The HLSL clip() function is described here.
I intend to use this for alpha cutoff, in OpenGL. Would the equivalent in GLSL simply be
if (gl_FragColor.a < cutoff)
{
    discard;
}
Or is there some more efficient equivalent?
OpenGL has no such function. And it doesn't need one.
Or is there some more efficient equivalent?
The question assumes that this conditional statement is less efficient than calling HLSL's clip function. It may well be more efficient (though even then, this is a total micro-optimization). clip checks whether the value is less than 0 and, if it is, discards the fragment. But you're not testing against zero; you're testing against cutoff, which probably isn't 0. So you would have to call clip like this (in GLSL style): clip(gl_FragColor.a - cutoff)
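If you really want HLSL-style call sites, a one-line helper reproduces clip's semantics (just a sketch; GLSL has no built-in of this name, so this is an ordinary user function):

void clip(float x) { if (x < 0.0) discard; }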
If clip is not directly supported by the hardware, then your call is equivalent to if (gl_FragColor.a - cutoff < 0) discard;. That's a math operation and a conditional test, which is slower than just a conditional test. And if clip is directly supported, the driver will almost certainly rearrange your conditional to use it that way.
The only way the conditional would be slower than clip is if the hardware had specific support for clip and the driver were too stupid to turn if (gl_FragColor.a < cutoff) discard; into clip(gl_FragColor.a - cutoff). If the driver is that stupid, if it lacks that basic peephole optimization, then you've got bigger performance problems than this to deal with.
In short: don't worry about it.
I have been reading up on shadow mapping, and found the following tutorial:
http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-16-shadow-mapping/
It makes sense to me up until the point where the author starts discussing the "shadow acne" artifact, which they explain with a diagram alone, with no words.
I am still having a lot of trouble understanding what actually causes shadow acne and why adding a bias fixes it.
It seems that the resolution of the shadow map has no effect on acne. What is it then? Maybe float precision, or is it something else?
Yes, it is a precision issue. Not really a float problem, just finite precision.
In theory the shadow map stores "distance to closest object from light". But in practice it stores "distance±eps from light".
Then when testing, you have your fragment's distance to the same light, but again, in practice it is d2 ± eps2. The catch is that eps varies differently when interpolating for shadow-map rendering than for shading. So when you compare d ± eps < d2 ± eps2, you might get the wrong result even when d2 == d, because eps != eps2. But if you compare d ± eps < d2 + max(eps) + max(eps2) ± eps2, you will be fine.
In this example d2 == d. That is called self-shadowing, and it can easily be fixed with the above bias, or in ray tracing by simply not testing against yourself.
It gets much trickier with different objects, and when eps and eps2 are vastly different. One way to deal with that is to control eps (http://developer.download.nvidia.com/SDK/10.5/opengl/src/cascaded_shadow_maps/doc/cascaded_shadow_maps.pdf). Or one can just take a lot more samples.
To try to answer the question directly: the core issue is that shadow mapping compares distances as if they were ideal. But those distances are quantized, not ideal, and while quantized values are usually fine, here we are comparing values quantized in two different spaces.
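In code, the fix lands in the depth comparison. Here is a minimal sketch of a biased shadow-map test; the names (shadowMap, shadowCoord) and the constant 0.005 are illustrative, not from the tutorial, and shadowCoord is assumed to already be the fragment's light-space position divided by w:

#version 130
uniform sampler2D shadowMap;
in vec3 shadowCoord; // fragment position in light space, perspective-divided
out vec4 fragColor;

void main() {
    float closestDepth = texture(shadowMap, shadowCoord.xy).r; // stored: d ± eps
    float currentDepth = shadowCoord.z;                        // computed: d2 ± eps2
    float bias = 0.005;                                        // stands in for max(eps) + max(eps2)
    float lit = currentDepth - bias <= closestDepth ? 1.0 : 0.0;
    fragColor = vec4(vec3(lit), 1.0);
}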
I've recently written my first couple of GLSL programs for Processing (a visual language similar to Java that can load shaders); they generate fractals. In the loop that handles the fractal code, I have an escape conditional that breaks if a point would tend to infinity.
It works fine, and it is similar to how I would generally write the code outside GLSL. However, someone told me that both paths are calculated every time a conditional is executed. I've had a hard time finding out exactly how much of a penalty conditionals incur in GLSL.
Edit: To the best of my understanding, in non-GLSL code, when an if is encountered a path is predicted. If the "correct" path was predicted, everything is great. If the "wrong" path was predicted, the "bad" work is discarded and execution continues along the "correct" path. The penalty might be, say, 3 (or whatever number) instructions. I want to know whether there is some such fixed instruction penalty, or whether both paths are calculated all the way through.
Here is the code if the explanation is not clear enough:
// Mandelbrot Set code
int i = 0;
float zr = x;
float zi = y;
for (; i < maxIterations; i++) {
    float sqZr = zr*zr;
    float sqZi = zi*zi;
    float twoZri = 2.0*zr*zi;
    zr = sqZr-sqZi+x;
    zi = twoZri+y;
    if (sqZr+sqZi > 16.0) break;
}
On old GPUs, both sides of an if() clause were executed and the correct result chosen at the end. On newer ones, this is only the case if the compiler thinks it would be more efficient. if() clauses are not free; the generic rule of thumb I have used for some time is "if() costs 14 clock cycles," though the latest GPUs may be cheaper.
Why is this so? Because GPUs are stream processors, they want to have identical data-loading profiles for all pixels (especially for gradient values like texture colors or values from vertex registers). The principle of SIMD -- even when the devices are not strictly SIMD -- is usually the way to get the most performance from such devices.
When in doubt, see if you can use one of the NVIDIA perf analysis tools on your code, or just try writing the code (it's short!) a few different ways and comparing your performance for your specific GPU.
(BTW Processing is not Java-like: it's Java)
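If you want to see how much the branch actually costs in your fractal loop, one experiment is a branch-free body that freezes z once it escapes instead of breaking out. This is only a sketch (x, y, and maxIterations as in your snippet, with maxIterations assumed to be a compile-time constant), not a guaranteed win: every fragment now pays for all maxIterations iterations, so it helps only if divergence was the bottleneck.

float zr = x;
float zi = y;
float n = 0.0; // escape-time count, replaces the break-out value of i
for (int i = 0; i < maxIterations; i++) {
    float sqZr = zr*zr;
    float sqZi = zi*zi;
    float twoZri = 2.0*zr*zi;
    float inside = step(sqZr + sqZi, 16.0); // 1.0 until the point escapes, then 0.0
    zr = mix(zr, sqZr - sqZi + x, inside);  // update z only while still inside
    zi = mix(zi, twoZri + y, inside);
    n += inside;
}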
My 9600GT hates me.
Fragment shader:
#version 130
uint aa[33] = uint[33](
    0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,
    0,0,0
);
void main() {
    int i=0;
    int a=26;
    for (i=0; i<a; i++) aa[i]=aa[i+1];
    gl_FragColor=vec4(1.0,0.0,0.0,1.0);
}
If a=25, the program runs at 3000 fps.
If a=26, the program runs at 20 fps.
If the size of aa is <= 32, the issue doesn't appear; it occurs only when the size of aa is > 32.
The viewport size is 1000x1000.
The threshold value of a varies with how the array is accessed inside the loop (aa[i]=aa[i+1]+aa[i-1] gives a different threshold).
I know gl_FragColor is deprecated. But that's not the issue.
My guess is that GLSL doesn't automatically unroll the loop if a > 25 and the size of aa is > 32. Why? The reason it depends on the size of the array is unknown to mankind.
A quite similar behavior explained here:
http://www.gamedev.net/topic/519511-glsl-for-loops/
Unrolling the loop manually does solve the issue (3000 fps), even if the size of aa is > 32:
aa[0]=aa[1];
aa[1]=aa[2];
aa[2]=aa[3];
aa[3]=aa[4];
aa[4]=aa[5];
aa[5]=aa[6];
aa[6]=aa[7];
aa[7]=aa[8];
aa[8]=aa[9];
aa[9]=aa[10];
aa[10]=aa[11];
aa[11]=aa[12];
aa[12]=aa[13];
aa[13]=aa[14];
aa[14]=aa[15];
aa[15]=aa[16];
aa[16]=aa[17];
aa[17]=aa[18];
aa[18]=aa[19];
aa[19]=aa[20];
aa[20]=aa[21];
aa[21]=aa[22];
aa[22]=aa[23];
aa[23]=aa[24];
aa[24]=aa[25];
aa[25]=aa[26];
aa[26]=aa[27];
aa[27]=aa[28];
aa[28]=aa[29];
aa[29]=aa[30];
aa[30]=aa[31];
aa[31]=aa[32];
// (stopping here: aa has indices 0..32, so reading aa[33] would be out of bounds)
I am just putting in an answer summarizing the comments here so this does not show up as unanswered anymore.
"#pragma optionNV (unroll all)"
fixes the immediate issue on nvidia.
In general, though, GLSL compiler behavior is very implementation dependent. The drop-off at exactly 32 is easily explained by a compiler heuristic like "don't unroll loops longer than 32". The huge speed difference might also come from an unrolled loop using constant indices, while a dynamic loop requires addressable array memory. Another possibility is that, once the loop is unrolled, dead-code elimination and constant folding kick in and reduce the entire loop to nothing.
The most portable fix really is manual unrolling, or better yet, manual constant folding: it is always questionable to compute values in a fragment shader that could be computed outside it. Some drivers may catch such cases, but it is better not to rely on that.
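For the NVIDIA-only route mentioned above, the pragma goes at the top of the shader source. A sketch with the question's shader (optionNV pragmas are vendor specific; other drivers will ignore or reject them):

#version 130
#pragma optionNV (unroll all)

uint aa[33] = uint[33](
    0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,
    0,0,0
);

void main() {
    int a = 26;
    for (int i = 0; i < a; i++) aa[i] = aa[i+1];
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}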
Every time I drunk-browse SO, I see an unanswered fwidth question.
And it makes me wonder what it was actually designed to do.
Reading the docs it is:
abs(dFdx(p)) + abs(dFdy(p))
So it is not classic mip selection, which is max(dx, dy).
Is it for alternative mip selection? But I fail to find a case where abs(dx) + abs(dy) would be better.
There must be some siggraph paper or common algorithm I am completely missing that uses that function. And it must be really popular because it made it into GLSL.
The only thing I can think of is some 2d post filter I am missing.
But what?
I am sure somebody here knows, and once you see it, it's obvious.
So: What algorithm uses abs(dx) + abs(dy)?
You're actually quite on the money with the 2D filtering suggestion. Any filter which relies on some sort of metric for the rate of change between a pixel and its neighbors could benefit from this function.
Examples would be anti-aliasing, edge detection, anisotropic filtering. I'm sure there are more examples one could think of.
It seems from your question and comments that you expect there to be a mind-blowing reason for this function to be included in GLSL. I would just say that it's a useful function to have. Perhaps someone with more in-depth knowledge about the actual internals of this function could provide more detail on what happens behind the scenes (i.e. if there is any performance improvement over a handwritten equivalent with dFdx and dFdy).
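As one concrete filtering example: a very common anti-aliasing idiom builds a one-pixel-wide soft edge out of fwidth(). Sketch only; dist and threshold stand for whatever value and cutoff your shader computes (for example, a distance-field edge):

float w = fwidth(dist); // roughly how much dist changes across one pixel
float alpha = smoothstep(threshold - w, threshold + w, dist); // soften over ~1 pixel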
This is the total derivative of the function: DF = dF/dx*dx + dF/dy*dy. See the similarity?
fwidth >= DF; in words, fwidth is the maximum possible change in a fragment variable F between the current fragment and any of its neighboring pixels, i.e. the 8 surrounding pixels in the 3x3 neighborhood.
Currently I am learning how to create shaders in GLSL for a game engine I am working on, and I have a question about the language that puzzles me. I have learned that in Shader Model versions lower than 3.0 you cannot use uniform variables in the condition of a loop. For example, the following code would not work on shader models older than 3.0.
for (int i = 0; i < uNumLights; i++)
{
...............
}
But isn't it possible to replace this with a loop with a fixed number of iterations, containing a conditional statement that breaks out of the loop once i, in this case, reaches uNumLights? For example:
for (int i = 0; i < MAX_LIGHTS; i++)
{
    if (i >= uNumLights)
        break;
    ..............
}
Aren't these equivalent? Should the latter work in older versions of GLSL? And if so, isn't this more efficient and easier to implement than other techniques I have read about, like using a different version of the shader for each number of lights?
I know this might be a silly question, but I am a beginner and I cannot find a reason why this shouldn't work.
GLSL can be confusing insofar as for() suggests there must be conditional branching, even when there isn't, because the hardware cannot do it at all (and the same applies to if()).
What really happens on pre-SM3 hardware is that the HAL inside your OpenGL implementation completely unrolls your loop, so there is no jump any more. This also explains why it has difficulties doing so with non-constants.
While technically possible with non-constants anyway, the implementation would have to recompile the shader every time you change that uniform, and it might run into the maximum instruction count if you were allowed to supply any haphazard number.
And that is a problem, because... what then? If you supply a too-big constant, you get a "too many instructions" compiler error when you build the shader. But if you supply a silly number in a uniform, and the HAL thus has to produce new code and runs into this limit, what can OpenGL do?
You most probably validated your program after compiling and linking, and you most probably queried the shader info log, and OpenGL kept telling you that everything was fine. That is, in a way, a binding promise; it cannot just decide otherwise all of a sudden. Therefore it must make sure this situation cannot arise, and the only workable solution is to not allow uniforms in conditions on hardware generations that don't support dynamic branching.
Otherwise, there would have to be some form of validation inside glUniform that rejects bad values. But since that would depend on a successful (or unsuccessful) shader recompilation, it would have to run synchronously, which makes it a no-go. Also, consider that GL_ARB_uniform_buffer_object is exposed on some SM2 hardware (for example GeForce FX), which means you could throw a buffer object with unpredictable content at OpenGL and still expect it to work somehow! The implementation would have to scan the buffer's memory for invalid values after you unmap it, which is insane.
Similar to a loop, an if() statement does not branch on SM2 hardware, even though it looks like it does. Instead, the hardware calculates both branches and does a conditional move.
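To illustrate that conditional move: the compiler effectively turns a small branch into arithmetic selection, roughly like the hand-written equivalent below (illustrative only; useRed, red, and blue are made-up names, and the real instruction selection is up to the driver):

// what you write:
if (useRed) color = red; else color = blue;
// what SM2-class hardware effectively executes:
color = mix(blue, red, float(useRed)); // evaluate both, then select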
(I'm assuming you are talking about pixel shaders).
The second variant is only going to work on a GPU that supports Shader Model 3 or higher, because dynamic branching (such as putting the variable uNumLights into an if condition) is not supported on GPUs below Shader Model 3 either.
Here you can compare what is and isn't supported between different shader models.
There is a fun workaround I just figured out. It seems stupid and I can't promise it's a healthy choice, but it appears to work for me right now:
Set your for loop to the maximum you allow, and put a condition inside the loop to skip over the heavy routines once the count goes beyond your uniform value.
uniform int iterations;
for (int i = 0; i < 10; i++) {
    if (i < iterations) {
        // do your thing...
    }
}