Shader's function parameters performance - opengl

I'm trying to understand how passing parameters is implemented in shader languages.
I've read several articles and documentation, but I still have some doubts. In particular, I'm trying to understand how it differs from a C++ function call, with particular emphasis on performance.
There are slight differences between HLSL, Cg and GLSL, but I guess the underlying implementation is quite similar.
What I've understood so far:
Unless otherwise specified, a function parameter is always passed by value (is this true even for matrices?)
Passing by value in this context doesn't have the same implications as in C++. Recursion is not supported, so the stack isn't used; most functions are inlined and arguments are put directly into registers.
Functions are often inlined by default (HLSL), or at least the inline keyword is always respected by the compiler (Cg)
Are the considerations above right?
Now two specific questions:
Passing a matrix as function parameter
inline float4 DoSomething(in Mat4x4 mat, in float3 vec)
{
...
}
Considering the function above, in C++ that would be nasty, and it would definitely be better to use a reference: const Mat4x4&.
What about shaders? Is this a bad approach? I read that, for example, the inout qualifier could be used to pass a matrix by reference, but it implies that the matrix may be modified by the called function.
Do the number and type of arguments have any implications? For example, is it better to use functions with a limited set of arguments? Or to avoid passing matrices?
Is the inout modifier a valid way to improve performance here? If so, does anyone know how a typical compiler implements this?
Are there any differences between HLSL and GLSL on this?
Does anyone have hints on this?

According to the spec, values are always copied. For in parameters, they are copied at call time, for out parameters at return time, and for inout parameters at both call and return time.
In the language of the spec (GLSL 4.50, section 6.1.1 "Function Calling Conventions"):
All arguments are evaluated at call time, exactly once, in order, from left to right. Evaluation of an in parameter results in a value that is copied to the formal parameter. Evaluation of an out parameter results in an l-value that is used to copy out a value when the function returns. Evaluation of an inout parameter results in both a value and an l-value; the value is copied to the formal parameter at call time and the lvalue is used to copy out a value when the function returns.
An implementation is of course free to optimize anything it wants as long as the result is the same as it would be with the documented behavior. But I don't think you can expect it to work in any specific way.
For example, it wouldn't be safe to pass all inout parameters by reference. Say you had this code:
vec4 Foo(inout mat4 mat1, inout mat4 mat2) {
    mat1 = mat4(0.0);
    mat2 = mat4(1.0);
    return mat1 * vec4(1.0);
}
mat4 myMat;
vec4 res = Foo(myMat, myMat);
The correct result for this is a vector containing all 0.0 components.
If the arguments were passed by reference, mat1 and mat2 inside Foo() would alias the same matrix. This means that the assignment to mat2 also changes the value of mat1, and the result is a vector with all 1.0 components. Which would be wrong.
This is of course a very artificial example, but the optimization has to be selective to work correctly in all cases.
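The difference between the two strategies can be reproduced outside a shader. The following is a minimal C++ sketch, not GLSL and not how any particular driver works: Vec4, Mat4, Diagonal, Mul, FooBody, FooCopyInOut and FooByReference are all hypothetical stand-ins written for this illustration. It emulates the copy-in/copy-out behavior the spec describes and compares it with naive pass-by-reference when the same matrix is passed twice.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins for GLSL's vec4 and mat4 (column-major, like GLSL).
using Vec4 = std::array<float, 4>;
struct Mat4 {
    std::array<Vec4, 4> col{};  // zero-initialized
};

// GLSL's mat4(s) builds a matrix with s on the diagonal and 0 elsewhere.
Mat4 Diagonal(float s) {
    Mat4 m;
    for (int i = 0; i < 4; ++i) m.col[i][i] = s;
    return m;
}

// Matrix * column vector.
Vec4 Mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row)
            r[row] += m.col[c][row] * v[c];
    return r;
}

// The body of Foo, operating on whatever storage it is handed.
Vec4 FooBody(Mat4& mat1, Mat4& mat2) {
    mat1 = Diagonal(0.0f);
    mat2 = Diagonal(1.0f);
    return Mul(mat1, Vec4{1.0f, 1.0f, 1.0f, 1.0f});
}

// Emulates inout as the spec describes it: copy in at call, copy out at return.
Vec4 FooCopyInOut(Mat4& arg1, Mat4& arg2) {
    Mat4 mat1 = arg1;  // copy in
    Mat4 mat2 = arg2;  // copy in
    Vec4 result = FooBody(mat1, mat2);
    arg1 = mat1;       // copy out (this order is just one possible choice)
    arg2 = mat2;
    return result;
}

// Naive pass-by-reference: the two formals alias when the caller passes one matrix twice.
Vec4 FooByReference(Mat4& arg1, Mat4& arg2) {
    return FooBody(arg1, arg2);
}
```

Calling FooCopyInOut(m, m) yields a vector of zeros, while FooByReference(m, m) yields a vector of ones, because the second assignment overwrites the matrix that the multiplication then reads.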

Your first bullet point does not work when you consider arguments qualified using inout.
The real issue is what you do with the parameter inside the function. If you modify a parameter qualified with in, then it cannot be "passed by reference" and a copy has to be made. On modern hardware this probably is not a big deal, but Shader Model 2.0 was pretty limited in the number of temporary registers, and I ran into these kinds of issues more than once when GLSL and Cg first came out.
For reference, consider the following GLSL code:
vec4 DoSomething (mat4 mat, vec3 vec)
{
    // Pretty straightforward, no temporary registers are required to pass arguments.
    return vec4 (mat [0] + vec4 (vec, 0.0));
}
vec4 DoSomethingCopy (mat4 mat, vec3 vec)
{
    mat [0][0] = 0.0; // This requires the compiler to make a local copy of mat
    return vec4 (mat [0] + vec4 (vec, 0.0));
}
vec4 DoSomethingInOut (inout mat4 mat, in vec3 vec)
{
    mat [0][0] = 0.0; // No copy required, but the original mat is modified
    return vec4 (mat [0] + vec4 (vec, 0.0));
}
I cannot really comment on performance; my only bad experiences had to do with hitting actual hardware limits on older GPUs. Of course, you should assume that any time something has to be copied, it is going to negatively impact performance.

All shader functions are inlined (recursive functions are forbidden). The concept of a reference or pointer doesn't apply here either. The only case where extra code is generated is when you write to an input parameter. However, if the original registers aren't used anymore, the compiler will probably reuse them, and the copy (mov operation) won't be needed.
Bottom line: function invocation is free.

Related

Vulkan Array of Specialization Constants

Is it possible to have an array of specialization constants such that the GLSL code looks similar to the following:
layout(constant_id = 0) const vec2 arr[2] = vec2[] (
vec2(2.0f, 2.0f),
vec2(4.0f, 4.0f)
);
or, alternatively:
layout(constant_id = 0) const float arr[4] = float[] (
2.0f, 2.0f,
4.0f, 4.0f
);
As far as I have read, there is no limit to the number of specialization constants that can be used, so it feels strange that this wouldn't be possible. But when I attempt the above, the SPIR-V compiler tells me that 'constant_id' can only be applied to a scalar. Currently I am using a uniform buffer to provide the data, but I would like to eliminate the backing buffer and the need to bind it before drawing, as well as allow the system to optimize the code during pipeline creation, if that's possible.
The shading languages (both Vulkan-GLSL and SPIR-V) make something of a distinction between the definition of a specialization constant within the shader and the interface for specializing those constants. But they go about this process in different ways.
In both languages, the external interface to a specialization constant only works on scalar values. That is, though you can set multiple constants to values, the constants you're setting are each a single scalar.
SPIR-V allows you to declare a specialization constant which is a composite (array/vector/matrix). However, the components of this composite must be either specialization constants or constant values. If those components are scalar specialization constants, you can OpDecorate them with an ID, which the external code will access.
Vulkan (and OpenGL) GLSL go about this slightly differently from raw SPIR-V. In GLSL, a const-qualified value with a constant_id is a specialization constant. These must be scalars.
However, you can also have a const-qualified value that is initialized by values that are either constant expressions or specialization constants. You don't qualify these with a constant_id, but you build them from things that are so qualified:
layout(constant_id = 18) const int scX = 1;
layout(constant_id = 19) const int scZ = 1;
const vec3 scVec = vec3(scX, 1, scZ); // partially specialized vector
const-qualified values that are initialized from specialization constants are called "partially specialized". When this GLSL is converted into SPIR-V, these are converted into OpSpecConstantComposite values.
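On the host side, the scalar-only interface means one map entry per scalar. The sketch below is an assumption-laden illustration rather than real Vulkan code: MapEntry is a local stand-in that mirrors the constantID, offset and size fields of VkSpecializationMapEntry so the example compiles without the Vulkan headers, and MakeScalarEntries and its parameters are hypothetical names. It lays out consecutive scalars (for example, the four floats from the question) in one data blob and gives each its own ID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in mirroring the fields of VkSpecializationMapEntry (assumption: real
// code would use the Vulkan type and pass the entries via VkSpecializationInfo).
struct MapEntry {
    std::uint32_t constantID;  // matches constant_id in the shader
    std::uint32_t offset;      // byte offset of this scalar in the data blob
    std::size_t size;          // byte size of this one scalar
};

// Builds one entry per scalar, with IDs firstId, firstId + 1, ... and tightly
// packed offsets. All names here are hypothetical.
std::vector<MapEntry> MakeScalarEntries(std::uint32_t firstId,
                                        std::size_t count,
                                        std::size_t scalarSize) {
    std::vector<MapEntry> entries;
    entries.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        entries.push_back(MapEntry{
            firstId + static_cast<std::uint32_t>(i),
            static_cast<std::uint32_t>(i * scalarSize),
            scalarSize});
    }
    return entries;
}
```

With MakeScalarEntries(0, 4, sizeof(float)), the shader would declare four scalar constants with constant_id 0 through 3 and build its composite value from them, in the same way scVec is built above.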

Clarification of GLSL Function Calling Conventions

I recently encountered some confusion while using a GLSL function which modified (and copied out) one of its input parameters. Let's suppose this is the function:
float v(inout uint idx) {
return 3.14 * idx++;
}
Now let's use that function in a potentially ambiguous way:
uint idx = 0;
const vec4 values = vec4(v(idx), v(idx), v(idx), v(idx));
We might reasonably assume that after the call to the vec4 constructor returns, our vector values should equal {0.00, 3.14, 6.28, 9.42} and idx should equal 4. However, it occurred to me to wonder whether the order of evaluation of function arguments in GLSL is well defined, and if so whether the above assumed ordering is correct. Alternatively, could this result in (implementation dependent) undefined behavior?
So of course I consulted the GLSL spec (v4.6, rev3, §6.1.1, p116, "Function Calling Conventions"), which has the following to say:
All arguments are evaluated at call time, exactly once, in order, from left to right.
So far so good. But then farther down the page:
The order in which output parameters are copied back to the caller is undefined.
I'm not entirely clear on the significance of this second statement.
Does it mean that for the function float doWork(inout uint v1, inout uint v2) {...} that the order in which v1 and v2 are copied back is undefined? This would matter if you did something like passing the same local variable in place of both parameters.
Alternatively, does it instead mean that in the earlier example, the order in which the variable idx is updated is undefined, and as such the ordering of values is also undefined?
Or perhaps both of these cases are undefined? That is, perhaps all copy-back operations on the entire line of code happen in an unordered manner?
It goes without saying that using multiple variables to hold the values prior to the vec4 constructor call would trivially avoid this question entirely, but that's not the point. Rather, I'd like to know how this part of the standard was meant to be interpreted and whether or not my first example would result in idx containing an undefined value.
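For completeness, the workaround mentioned above can be sketched as follows. This is a C++ analogue, not GLSL, and it says nothing about how the spec should be read: the reference parameter merely stands in for inout, and FillSequentially is a hypothetical helper. Because each call sits in its own statement, every update to idx is complete before the next call starts.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// C++ analogue of the GLSL function; the reference stands in for `inout`.
float v(unsigned& idx) {
    return 3.14f * static_cast<float>(idx++);
}

// One call per statement, so the order of the updates to idx is unambiguous.
std::array<float, 4> FillSequentially(unsigned& idx) {
    const float a = v(idx);
    const float b = v(idx);
    const float c = v(idx);
    const float d = v(idx);
    return {a, b, c, d};
}
```

Starting from idx = 0, this produces the four multiples of 3.14 in order and leaves idx at 4.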

Eigen::Ref in pass-by-pointer

Similar to question Pointer vs Reference difference when passing Eigen objects as arguments
Let's say we have foo1 and matrix mat2by2:
void foo1(MatrixXd& container){
//...container matrix is modified here
}
and
Matrix2d mat2by2;
mat2by2 << 1,2,
           3,4;
After reading http://eigen.tuxfamily.org/dox/TopicFunctionTakingEigenTypes.html, it seems like a better alternative to foo1 may be:
void foo2(Ref<MatrixXd> container){
//...container matrix is modified here
}
If foo2's parameter is being passed as a reference,
what would be the equivalent of pass-by-pointer using the Eigen::Ref class?
void foo(Eigen::MatrixXd* container){
//...container matrix is modified here
}
I think the basic idea is not to use pointers or references directly. Eigen uses template expressions to represent calculations. This means the type changes depending on the expression used to calculate the matrix, and expressions are potentially carried around unevaluated.
If necessary Ref will evaluate the template expression into a temporary object matching the memory layout you requested to pass as an argument. If the memory layout of your argument matches the memory layout required by your parameter, Ref will act as a transparent reference.
Borrowing directly from the documentation: Your input parameters should be declared constant, while non-const parameters can be used as output parameters.
void cov(const Ref<const MatrixXf> x, Ref<MatrixXf> C)
{
...
C = ...; // Your return value here
}
If you read from and write to a matrix, the parameter should also obviously be non-const.
For optional parameters you could use a pointer to a Ref.
Edit: The documentation does note that you can use constant references directly to pass parameters. This only works because the compiler is happy to convert temporary objects to const-references. It will not work for pointers.

Why do MSVC optimizations break SSE code when function arguments are const refs to temporaries or temporaries copied by value?

Ran into this yesterday, I will try to give clear and simple examples which fail for me with MSVC12 (VS2013, 120) and MSVC14 (VS2015, 140). Everything is implicitly /arch:SSE+ with x64.
I will reduce the issue to a simple matrix transpose example using the predefined _MM_TRANSPOSE4_PS macro for illustration purposes. This one is implemented in terms of shuffles, rather than moving L/H 8-byte blocks around.
float4x4 Transpose(const float4x4& m) {
matrix4x4 n = LoadMatrix(m);
_MM_TRANSPOSE4_PS(n.row[0], n.row[1], n.row[2], n.row[3]);
return StoreMatrix(n);
}
The matrix4x4 is merely a POD struct containing four __m128 members, everything is tidily aligned on a 16-byte boundary, even though it is somewhat implicit:
__declspec(align(16)) struct matrix4x4 {
__m128 row[4];
};
All of this fails on /O1, /O2 and /Ox:
// Doesn't work.
float4x4 resultsPlx = Transpose( GiveMeATemporary() );
// Changing Transpose to take float4x4, or copy a temporary
float4x4 Transpose(float4x4 m) { ... }
// Trying again, doesn't work.
float4x4 resultsPlx = Transpose( GiveMeATemporary() );
Curiously enough, this works:
// A constant reference to an rvalue, a temporary
const float4x4& temporary = GiveMeATemporary();
float4x4 resultsPlx = Transpose(temporary);
Same goes for pointer-based transfers, which is logical as the underlying mechanisms are the same. The relevant part of the C++11 specification is §12.2/5:
The second context is when a reference is bound to a temporary. The
temporary to which the reference is bound or the temporary that is the
complete object to a subobject of which the temporary is bound
persists for the lifetime of the reference except as specified below.
A temporary bound to a reference member in a constructor’s
ctor-initializer (§12.6.2 [class.base.init]) persists until the
constructor exits. A temporary bound to a reference parameter in a
function call (§5.2.2 [expr.call]) persists until the completion of
the full expression containing the call.
This implies it should survive until the calling environment goes out of scope, which is far after the function returns. So, what gives? In all other cases, the variables get "optimized away", with the following exception:
Access violation reading location 0xFFFFFFFFFFFFFFFF
While the solution is obvious (prevent the user from passing temporaries directly by using pointer-based transfers, like some other libraries do), I had hoped to make it a little more elegant, without &s clogging the view.
You can add (non-virtual) member functions to a struct without really affecting the layout. So add a destructor that prints "I'm here %p" when the structure is destroyed, and print "I'm there" in your function. (Include the this address so you can make sense of other temporary copies being used.)
Then you can observe the lifetime in the optimized code. See if that is your problem: I doubt that a bad lifetime is the real cause, because the place where the object lived is still a valid address in your stack frame.
Furthermore, changing the bits where a float is supposed to live might at worst give you a not-a-number or special value, and the vector processing does not throw or fault in that case, but puts a flag value as the result for that bad element. There is no pointer, so why is it dereferencing −1?
I think the misfire is caused by something more interesting.
Run it in the debugger and see what instruction causes that.
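The destructor-tracing suggestion can be sketched in portable C++ as follows. This is a minimal illustration, not the asker's code: Traced is a hypothetical stand-in for float4x4 that records its lifetime events in a log instead of printing, and Use and TraceDirectCall are made-up names. On a conforming compiler, the temporary bound to the reference parameter should be destroyed only after the call returns, at the end of the full expression.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for float4x4; it only records when it is created and destroyed.
struct Traced {
    static std::vector<std::string>& log() {
        static std::vector<std::string> events;
        return events;
    }
    Traced() { log().push_back("ctor"); }
    Traced(const Traced&) { log().push_back("copy"); }
    ~Traced() { log().push_back("dtor"); }
};

Traced GiveMeATemporary() { return Traced(); }

// Takes the temporary by const reference, like the original Transpose.
int Use(const Traced&) {
    Traced::log().push_back("in Use");
    return 0;
}

// Passes a temporary directly and returns the recorded event order.
std::vector<std::string> TraceDirectCall() {
    Traced::log().clear();
    const int r = Use(GiveMeATemporary());  // temporary lives until the end of this statement
    (void)r;
    Traced::log().push_back("after call");
    return Traced::log();
}
```

If the log from an optimized build shows "dtor" before "in Use", the temporary is being destroyed too early; if the order is "in Use", "dtor", "after call", the lifetime is fine and the fault lies elsewhere.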

Can you pass a matrix by reference in a GLSL shader?

How do you pass by reference in a GLSL shader?
You can mark a parameter as inout in the function signature, and that will make the parameter effectively "pass by reference".
For example,
void doSomething( vec3 trans, inout mat4 mat )
Here mat is "passed by reference", trans is passed by value.
mat must be writable (i.e. not a uniform).
All parameters are “pass by value” by default. You can change this behavior using these “parameter qualifiers”:
in: “pass by value”; if the parameter’s value is changed in the function, the actual parameter from the calling statement is unchanged.
out: “pass by reference”; the parameter is not initialized when the function is called; any changes in the parameter’s value changes the actual parameter from the calling statement.
inout: the parameter’s value is initialized by the calling statement and any changes made by the function change the actual parameter from the calling statement.
So if you don't want to make a copy, you should use out