What's the precision of built-in functions in shading language? - hlsl

Precision conversion on mobile devices may cause a performance drop. I want to minimize conversions in our shaders (GLSL ES, HLSL and SPIR-V), but I'm confused by the precision of built-in functions.
Consider the following snippets:

// #1
mediump float a, b;
mediump float c = max(a, b);

// #2
mediump float a, b;
float c = max(a, b);

// #3
mediump float a, b;
mediump float c = sin(max(a, b));

// #4
mediump float2 uv;
uniform mediump sampler2D tex;
mediump float c = texture2D(tex, uv);
What conversions would happen? If a built-in function's return precision depends on its parameters, snippets #1, #3 and #4 should involve no conversion. Is that right?

I don't think you can always predict conversions, because shader compilers will optimise to use the lowest precision and the fewest conversions possible. But I suspect you have the wrong mindset and are micro-optimising something that doesn't matter much. On typical implementations, precision conversions should be very cheap. What you should be concerned about is unnecessarily using high precision for your inputs, outputs and other variables; that has a much bigger impact.
The simplest and most effective approach is to put precision mediump float; at the top of your shader, so precision defaults to mediump, then test on devices that support reduced precision (i.e. mobile GPUs) and see whether anything breaks. You can then selectively add highp where necessary.
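A minimal sketch of that setup (the variable names are illustrative; assumes a GLSL ES fragment shader):

```glsl
precision mediump float;   // everything below defaults to mediump

uniform sampler2D tex;     // samplers already default to lowp
varying vec2 vUV;          // mediump via the default above

#ifdef GL_FRAGMENT_PRECISION_HIGH
varying highp vec2 vDetailUV;  // promote selectively, only where mediump breaks
#endif

void main() {
    gl_FragColor = texture2D(tex, vUV);
}
```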

Related

Array issue in Vulkan Uniform [duplicate]

The vec3 type is a very nice type. It only takes up 3 floats, and I have data that only needs 3 floats. And I want to use one in a structure in a UBO and/or SSBO:
layout(std140) uniform UBO
{
    vec4 data1;
    vec3 data2;
    float data3;
};
layout(std430) buffer SSBO
{
    vec4 data1;
    vec3 data2;
    float data3;
};
Then, in my C or C++ code, I can do this to create matching data structures:
struct UBO
{
    vector4 data1;
    vector3 data2;
    float data3;
};
struct SSBO
{
    vector4 data1;
    vector3 data2;
    float data3;
};
Is this a good idea?
NO! Never do this!
When declaring UBOs/SSBOs, pretend that all 3-element vector types don't exist. This includes column-major matrices with 3 rows and row-major matrices with 3 columns. Pretend that the only types are scalars and 2- and 4-element vectors (and matrices). You will save yourself a great deal of grief if you do so.
If you want the effect of a vec3 + a float, then you should pack it manually:
layout(std140) uniform UBO
{
    vec4 data1;
    vec4 data2and3;
};
Yes, you'll have to use data2and3.w to get the other value. Deal with it.
If you want arrays of vec3s, then make them arrays of vec4s. Same goes for matrices that use 3-element vectors. Just banish the entire concept of 3-element vectors from your SSBOs/UBOs; you'll be much better off in the long run.
There are two reasons why you should avoid vec3:
It won't do what C/C++ does
If you use std140 layout, you will probably want to define data structures in C or C++ that match the definition in GLSL, making it easy to mix and match between the two. std140 layout makes this at least possible in most cases, but its layout rules don't match the usual layout rules of C and C++ compilers when it comes to vec3s.
Consider the following C++ definitions for a vec3 type:
struct vec3a { float a[3]; };
struct vec3f { float x, y, z; };
Both of these are perfectly legitimate types. Their sizeof and layout will match the size and layout that std140 requires, but they do not match the alignment behavior that std140 imposes.
Consider this:
//GLSL
layout(std140) uniform Block
{
    vec3 a;
    vec3 b;
} block;
//C++
struct Block_a
{
    vec3a a;
    vec3a b;
};
struct Block_f
{
    vec3f a;
    vec3f b;
};
On most C++ compilers, sizeof for both Block_a and Block_f will be 24, which means that offsetof(Block_a, b) (and likewise offsetof(Block_f, b)) will be 12.
In std140 layout, however, a vec3 is always aligned to 4 words (16 bytes). Therefore, Block.b will have an offset of 16.
Now, you could try to fix that by using C++11's alignas functionality (or C11's similar _Alignas feature):
struct alignas(16) vec3a_16 { float a[3]; };
struct alignas(16) vec3f_16 { float x, y, z; };
struct Block_a
{
    vec3a_16 a;
    vec3a_16 b;
};
struct Block_f
{
    vec3f_16 a;
    vec3f_16 b;
};
If the compiler supports 16-byte alignment, this will work. Or at least, it will work in the case of Block_a and Block_f.
But it won't work in this case:
//GLSL
layout(std140) uniform Block2
{
    vec3 a;
    float b;
} block2;
//C++
struct Block2_a
{
    vec3a_16 a;
    float b;
};
struct Block2_f
{
    vec3f_16 a;
    float b;
};
By the rules of std140, each vec3 must start on a 16-byte boundary, but it only consumes 12 bytes of storage. And since a float can start on a 4-byte boundary, a vec3 followed by a float takes up 16 bytes in total.
But the rules of C++ alignment don't allow such a thing: if a type is aligned to an X-byte boundary, then using that type will consume a multiple of X bytes.
So matching std140's layout requires that you pick a type based on exactly where it is used. If the vec3 is followed by a float, you have to use vec3a; if it's followed by some type that needs more than 4-byte alignment, you have to use vec3a_16.
Or you can just not use vec3s in your shaders and avoid all this added complexity.
Note that a vec2 based on alignas(8) will not have this problem. Nor will C/C++ structs and arrays that use the proper alignment specifier (though arrays of smaller types have their own issues). This problem only occurs when using a naked vec3.
Implementation support is fuzzy
Even if you do everything right, implementations have been known to implement vec3's oddball layout rules incorrectly. Some implementations effectively impose C++ alignment rules on GLSL: if you use a vec3, they treat it like C++ would treat a 16-byte-aligned type. On these implementations, a vec3 followed by a float will behave like a vec4 followed by a float.
Yes, it's the implementers' fault. But since you can't fix the implementation, you have to work around it. And the most reasonable way to do that is to just avoid vec3 altogether.
Note that, for Vulkan (and OpenGL using SPIR-V), the SDK's GLSL compiler gets this right, so you don't need to worry about it there.


GLSL(330) modulo returns unexpected value

I am currently working with GLSL 330 and came across some odd behavior of the mod() function.
I'm working under Windows 8 with a Radeon HD 6470M. I cannot recreate this behavior on my desktop PC, which uses Windows 7 and a GeForce GTX 260.
Here is my test code:
float testvalf = -126;
vec2 testval = vec2(-126, -126);
float modtest1 = mod(testvalf, 63.0); //returns 63
float modtest2 = mod(testval.x, 63.0); //returns 63
float modtest3 = mod(-126, 63.0); //returns 0
Edit:
Here are some more test results, following IceCool's suggestion below.
int y = 63;
int inttestval = -126;
ivec2 intvectest = ivec2(-126, -126);
float floattestval = -125.9;
float modtest4 = mod(inttestval, 63); //returns 63
float modtest5 = mod(intvectest, 63); //returns vec2(63.0, 63.0)
float modtest6 = mod(intvectest.x, 63); //returns 63
float modtest7 = mod(floor(floattestval), 63); //returns 63
float modtest8 = mod(inttestval, y); //returns 63
float modtest9 = mod(-126, y); //returns 63
I updated my drivers and tested again: same results. Once again, not reproducible on the desktop.
According to the GLSL docs on mod, the possible parameter combinations are (genType, float) and (genType, genType) (no double, since we're below 4.0). Also, the return type is forced to float, but that shouldn't matter for this problem.
I don't know if you did it intentionally, but -126 is an int, not a float, and the code might not be doing what you expect.
By the way, about the modulo: notice that two different overloads are called.
The first two lines call:
float mod(float, float);
The last line calls:
int mod(int, float);
If I'm right, mod is calculated like this:
genType mod(genType x, float y) {
    return x - y * floor(x / y);
}
Now note that if x/y evaluates to -2.0 it will return 0, but if it evaluates to -2.0000001 then 63.0 will be returned. That difference is entirely possible between int/float and float/float division.
So the reason might simply be that you are mixing ints and floats.
I think I have found the answer.
One thing I was wrong about: GLSL's genType is not a generic type like a C++ template parameter.
genType is shorthand for float, vec2, vec3, and vec4 (see link; ctrl+f "genType").
By the way, the genType naming convention is:
genType - floats
genDType - doubles
genIType - ints
genBType - bools
Which means that genType mod(genType, float) implies there is no function like int mod(int, float).
All the code above has been calling float mod(float, float); thankfully there are implicit casts for function arguments, so mod(int, int) works too, but float mod(float, float) is what actually gets called.
Just as a proof:
int x = mod(-126, 63);
Doesn't compile: error C7011: implicit cast from "float" to "int"
The call itself is fine; it fails to compile only because mod returns a float, so this works:
float x = mod(-126, 63);
Therefore float mod(float, float) is called.
So we are back at the original problem:
float division is inaccurate
int to float cast is inaccurate
It shouldn't be a problem on most GPUs, as floats are considered equal if the difference between them is less than 10^-5 (this may vary with hardware, but it is the case for my GPU), so floor(-2.0000001) is -2. Highp floats are far more accurate than this.
Therefore either you are not using highp floats (precision highp float; should fix it then) or your GPU has stricter limit for float equality, or some of the functions are returning less accurate value.
If all else fails try:
#extension BlackMagic : enable
Maybe some driver setting is forcing default float precision to be mediump.
If this happens, all your defined variables will be mediump, however, numbers typed in the code will still remain highp.
Consider this code:
precision mediump float;
float x = 0.4121551, y = 0.4121552;
x == y; // true
0.4121551 == 0.4121552; // false, as highp they still differ.
So mod(-126, 63.0) could still be precise enough to return the correct value, since it is working with highp floats; but if you pass it a variable (as in all the other cases), which will only be mediump, the function won't have enough precision to calculate the correct value. And if you look at your tests, that is what's happening:
All the calls that take at least one variable are not precise enough.
The only call that takes two literal numbers returns the correct value.

What keywords GLSL introduce to C?

So we have in C:
auto break case char const continue default do
double else entry enum extern float for goto
if int long register return short signed sizeof
static struct switch typedef union unsigned void volatile while
What new keywords does the OpenGL (ES) Shading Language provide?
I am new to GLSL and I want to create a syntax-highlighting utility for ease of use.
Do the math functions built into GLSL count as keywords?
New ones (excluding the ones you listed above) according to the latest spec document:
attribute uniform varying
layout
centroid flat smooth noperspective
patch sample
subroutine
in out inout
invariant
discard
mat2 mat3 mat4 dmat2 dmat3 dmat4
mat2x2 mat2x3 mat2x4 dmat2x2 dmat2x3 dmat2x4
mat3x2 mat3x3 mat3x4 dmat3x2 dmat3x3 dmat3x4
mat4x2 mat4x3 mat4x4 dmat4x2 dmat4x3 dmat4x4
vec2 vec3 vec4 ivec2 ivec3 ivec4 bvec2 bvec3 bvec4 dvec2 dvec3 dvec4
uvec2 uvec3 uvec4
lowp mediump highp precision
sampler1D sampler2D sampler3D samplerCube
sampler1DShadow sampler2DShadow samplerCubeShadow
sampler1DArray sampler2DArray
sampler1DArrayShadow sampler2DArrayShadow
isampler1D isampler2D isampler3D isamplerCube
isampler1DArray isampler2DArray
usampler1D usampler2D usampler3D usamplerCube
usampler1DArray usampler2DArray
sampler2DRect sampler2DRectShadow isampler2DRect usampler2DRect
samplerBuffer isamplerBuffer usamplerBuffer
sampler2DMS isampler2DMS usampler2DMS
sampler2DMSArray isampler2DMSArray usampler2DMSArray
samplerCubeArray samplerCubeArrayShadow isamplerCubeArray usamplerCubeArray
Reserved for future use (these will currently cause a compile error):
common partition active
asm
class union enum typedef template this packed
goto
inline noinline volatile public static extern external interface
long short half fixed unsigned superp
input output
hvec2 hvec3 hvec4 fvec2 fvec3 fvec4
sampler3DRect
filter
image1D image2D image3D imageCube
iimage1D iimage2D iimage3D iimageCube
uimage1D uimage2D uimage3D uimageCube
image1DArray image2DArray
iimage1DArray iimage2DArray uimage1DArray uimage2DArray
image1DShadow image2DShadow
image1DArrayShadow image2DArrayShadow
imageBuffer iimageBuffer uimageBuffer
sizeof cast
namespace using
row_major
You probably want to get the OpenGL ES GLSL language specification. §3.6 lists the keywords (plus a number of reserved words that aren't keywords, but you're not supposed to use anyway, so they probably merit some sort of color coding as well).
Edit: Oops, I grabbed the wrong link there. My apologies. The current specs are:
OpenGL 4.1 GLSL
OpenGL ES 2.0 GLSL
Besides the above answers, there are also reserved identifiers, e.g. in the GLSL ES 3.0 spec:
Identifiers starting with gl_ are reserved for use by OpenGL ES, and may not be declared in a shader as either a variable or a function. It is an error to redeclare a variable, including those starting “gl_”.
And other things beyond these are reserved:
In addition, all identifiers containing two consecutive underscores (__) are reserved for use by underlying software layers. Defining such a name in a shader does not itself result in an error, but may result in unintended behaviors that stem from having multiple definitions of the same name.
This is relevant for syntax coloring and correctness checking.
Similarly, macros starting with GL_, and a bunch of things with __.

In OpenGL ES 2.0 / GLSL, where do you need precision specifiers?

Does the variable that you're stuffing values into dictate what precision you're working with, to the right of the equals sign?
For example, is there any difference, of meaning, to the precision specifier here:
gl_FragColor = lowp vec4(1);
Here's another example:
lowp float floaty = 1. * 2.;
floaty = lowp 1. * lowp 2.;
And if you take some floats, and create a vector or matrix from them, will that vector or matrix take on the precision of the values you stuff it with, or will those values transform into another precision level?
I think optimizing this would best answer the question:
dot(gl_LightSource[0].position.xyz, gl_NormalMatrix * gl_Normal)
I mean, does it need to go this far, if you want it as fast as possible, or is some of it useless?
lowp dot(lowp gl_LightSource[0].position.xyz, lowp gl_NormalMatrix * lowp gl_Normal)
I know you can define the default precision for float, and that this supposedly is used for vectors and matrices afterwards. Assume for the purpose of education, that we had defined this previously:
precision highp float;
You don't need precision specifiers on constants/literals, since those are compile-time evaluated to whatever type they are being assigned to.
In vertex shaders, the following precisions are declared by default (§4.5.3, Default Precision Qualifiers):
precision highp float;
precision highp int;
precision lowp sampler2D;
precision lowp samplerCube;
And in fragment shaders you get:
precision mediump int;
precision lowp sampler2D;
precision lowp samplerCube;
This means that if you declare a float in a fragment shader, you have to say whether it is a lowp or a mediump. The default float/int precisions also extend to matrices/vectors.
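Concretely (a minimal sketch; variable names are illustrative): with no default float precision in a fragment shader, every float declaration needs a qualifier, either per declaration or via a default:

```glsl
precision mediump float;  // required: fragment shaders have no float default

varying lowp vec4 vColor;  // per-declaration qualifier overrides the default
varying vec2 vUV;          // picks up the mediump default

void main() {
    gl_FragColor = vColor;
}
```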
highp is only supported on systems that have the GL_FRAGMENT_PRECISION_HIGH macro defined to 1; on the rest you'll get a compiler error. (4.5.4 Available Precision Qualifiers)
The rule for precision in an expression is that operands are cast automatically to the precision of the assignment/parameter they are bound to. So for your dot product, it would use the precision of the input types by default, and the additional lowp qualifiers are unnecessary (and syntactically invalid). If you want to down-cast a value to a lower precision, the only way to do it is to explicitly assign it to a lower-precision variable.
These answers are all from the Khronos GLSL spec, which you can find here (relevant sections are 4.5.2 and 4.5.3): https://www.khronos.org/registry/OpenGL/specs/es/2.0/GLSL_ES_Specification_1.00.pdf
In more recent versions of the spec, those sections are 4.7.3 and 4.7.4. Nothing has changed, except adding a default precision highp atomic_uint; to both lists. See https://www.khronos.org/registry/OpenGL/specs/es/3.2/GLSL_ES_Specification_3.20.pdf