Is it possible to have an array of specialization constants, so that the GLSL code looks similar to the following:
layout(constant_id = 0) const vec2 arr[2] = vec2[] (
vec2(2.0f, 2.0f),
vec2(4.0f, 4.0f)
);
or, alternatively:
layout(constant_id = 0) const float arr[4] = float[] (
2.0f, 2.0f,
4.0f, 4.0f
);
As far as I have read, there is no limit to the number of specialization constants that can be used, so it feels strange that this wouldn't be possible. But when I attempt the above, the SPIR-V compiler tells me that 'constant_id' can only be applied to a scalar. Currently I am using a uniform buffer to provide the data, but I would like to eliminate the backing buffer and the need to bind it before drawing, and also allow the system to optimize the code during pipeline creation if possible.
The shading languages (both Vulkan-GLSL and SPIR-V) make something of a distinction between the definition of a specialization constant within the shader and the interface for specializing those constants. But they go about this process in different ways.
In both languages, the external interface to a specialization constant only works on scalar values. That is, though you can set multiple constants to values, the constants you're setting are each a single scalar.
SPIR-V allows you to declare a specialization constant which is a composite (array/vector/matrix). However, the components of this composite must be either specialization constants or constant values. If those components are scalar specialization constants, you can OpDecorate them with an ID, which the external code will access.
Vulkan (and OpenGL) GLSL go about this slightly differently from raw SPIR-V. In GLSL, a const-qualified value with a constant_id is a specialization constant. These must be scalars.
However, you can also have a const-qualified value that is initialized by values that are either constant expressions or specialization constants. You don't qualify these with a constant_id, but you build them from things that are so qualified:
layout(constant_id = 18) const int scX = 1;
layout(constant_id = 19) const int scZ = 1;
const vec3 scVec = vec3(scX, 1, scZ); // partially specialized vector
const-qualified values that are initialized from specialization constants are called "partially specialized". When this GLSL is converted into SPIR-V, these are converted into OpSpecConstantComposite values.
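Applied to the question's float arr[4], this means declaring one scalar specialization constant per element and building the array from them; on the C++ side each element then needs its own map entry. The sketch below assumes your glslang version accepts an array composed this way, just like the vec3 above. SpecMapEntry and packFloatSpecConstants are hypothetical names; SpecMapEntry merely mirrors the fields of VkSpecializationMapEntry so the example compiles without the Vulkan headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical GLSL: each element is its own scalar specialization constant,
// and the array is a "partially specialized" const built from them.
const char* const kShaderSnippet = R"(
layout(constant_id = 0) const float a0 = 2.0;
layout(constant_id = 1) const float a1 = 2.0;
layout(constant_id = 2) const float a2 = 4.0;
layout(constant_id = 3) const float a3 = 4.0;
const float arr[4] = float[](a0, a1, a2, a3);
)";

// Stand-in with the same fields as VkSpecializationMapEntry
// (constantID, offset, size); real code would use the Vulkan type.
struct SpecMapEntry {
    std::uint32_t constantID;
    std::uint32_t offset;
    std::size_t size;
};

// Packs one float per constant_id (starting at firstId) into a byte buffer
// and produces one map entry per float.
void packFloatSpecConstants(const std::vector<float>& values, std::uint32_t firstId,
                            std::vector<SpecMapEntry>& entries,
                            std::vector<unsigned char>& data) {
    entries.clear();
    data.resize(values.size() * sizeof(float));
    for (std::size_t i = 0; i < values.size(); ++i) {
        const std::uint32_t offset = static_cast<std::uint32_t>(i * sizeof(float));
        std::memcpy(data.data() + offset, &values[i], sizeof(float));
        entries.push_back({firstId + static_cast<std::uint32_t>(i), offset, sizeof(float)});
    }
}
```

In a real pipeline, entries.size(), entries.data(), data.size() and data.data() would go into the mapEntryCount, pMapEntries, dataSize and pData fields of VkSpecializationInfo.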
I wonder whether I can have something like this:
layout (location = attr.POSITION) in vec3 position;
where, for example, attr is a constant structure:
const struct Attr
{
int POSITION;
} attr = Attr(0);
I already tried it, but it complains:
Shader status invalid: 0(34) : error C0000: syntax error, unexpected integer constant, expecting identifier or template identifier or type identifier at token ""
Or, if there is no way to do this with structs, can I use something else to declare a literal for an input qualifier such as attr.POSITION?
GLSL has no such thing as a const struct declaration. It does however have compile time constant values:
const int position_loc = 0;
The rules for constant expressions say that a const-qualified variable which is initialized with a constant expression is itself a constant expression.
And there ain't no rule that says that the type of such a const-qualified variable must be a basic type:
struct Attr
{
int position;
};
const Attr attr = {1};
Since attr is initialized with an initialization list containing constant expressions, attr is itself a constant expression. Which means that attr.position is a constant expression too, one of integral type.
And such a compile-time integral constant expression can be used in layout qualifiers, but only if you're using GLSL 4.40 or ARB_enhanced_layouts:
layout(location = attr.position) in vec3 position;
Before that version, you'd be required to use an actual literal. Which means the best you could do would be a #define:
#define position_loc 1
layout(location = position_loc) in vec3 position;
Now personally, I would never rely on such integral-constant-expression-within-struct gymnastics. Few people rely on them, so driver code rarely gets tested in this fashion. So the likelihood of encountering a driver bug is fairly large. The #define method is far more likely to work in practice.
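If you go the #define route, one way to keep the C++ side (for example, the index you pass to glVertexAttribPointer) and the shader in agreement is to inject the #define from the host code before compiling. This is only a sketch; kPositionLoc and injectLocationDefine are hypothetical names. The define is inserted after the #version line because #version must precede everything in the shader except comments and white space.

```cpp
#include <cstddef>
#include <string>

// Hypothetical location shared by the C++ code and the shader.
constexpr int kPositionLoc = 1;

// Inserts "#define position_loc N" right after the #version line,
// or at the very top if the source has no #version directive.
std::string injectLocationDefine(const std::string& source, int loc) {
    const std::string define = "#define position_loc " + std::to_string(loc) + "\n";
    const std::size_t versionPos = source.find("#version");
    if (versionPos == std::string::npos) return define + source;
    const std::size_t lineEnd = source.find('\n', versionPos);
    if (lineEnd == std::string::npos) return source + "\n" + define;
    return source.substr(0, lineEnd + 1) + define + source.substr(lineEnd + 1);
}
```

The resulting string is what you would hand to glShaderSource, and the same kPositionLoc can be used for the attribute setup on the C++ side.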
What variable types are compatible with OpenGL's glGetFloat() or glGetFloatv()?
P.S. This is in C++.
The basic type you want to use is GLfloat. This matches the type in the function prototype. GLfloat is a 32-bit floating-point type; in practice it is almost always the same as float, but that is not guaranteed.
For cases where glGetFloatv() returns a single value, you can simply use the address of a GLfloat variable. For example:
GLfloat val;
glGetFloatv(GL_DEPTH_CLEAR_VALUE, &val);
For cases that return multiple values, you can either use an array:
GLfloat vals[4];
glGetFloatv(GL_COLOR_CLEAR_VALUE, vals);
Or, to make it more C++, a vector:
std::vector<GLfloat> vals(4);
glGetFloatv(GL_COLOR_CLEAR_VALUE, &vals[0]);
Or, even nicer in C++11:
std::vector<GLfloat> vals(4);
glGetFloatv(GL_COLOR_CLEAR_VALUE, vals.data());
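If you want to treat float and GLfloat as interchangeable, you can at least check the assumption at compile time. The helper below is a hypothetical sketch (getFloats is not an OpenGL function): it returns a std::array filled through any callable shaped like glGetFloatv, so it can be tested without a GL context.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// float is not guaranteed to be 32 bits, so verify it before assuming
// it matches GLfloat.
static_assert(sizeof(float) == 4 && std::numeric_limits<float>::is_iec559,
              "float is not a 32-bit IEEE 754 type on this platform");

// Fills N floats through a function with glGetFloatv's shape,
// void(Enum pname, float* out), and returns them by value.
template <std::size_t N, typename Enum, typename GetFn>
std::array<float, N> getFloats(GetFn get, Enum pname) {
    std::array<float, N> vals{};
    get(pname, vals.data());
    return vals;
}
```

With a GL context, usage would look like auto clear = getFloats<4>(glGetFloatv, GL_COLOR_CLEAR_VALUE);. Note that N must match the number of values the query actually writes.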
I'm trying to understand how passing parameters is implemented in shader languages.
I've read several articles and documentation, but I still have some doubts. In particular, I'm trying to understand the differences from a C++ function call, with a particular emphasis on performance.
There are slight differences between HLSL, Cg and GLSL, but I guess the underlying implementation is quite similar.
What I've understood so far:
Unless otherwise specified, a function parameter is always passed by value (is this true even for matrices?)
Passing by value in this context doesn't have the same implications as in C++. Recursion isn't supported, so no stack is used; most functions are inlined and their arguments are placed directly in registers.
Functions are often inlined by default (HLSL), or at least the inline keyword is always respected by the compiler (Cg).
Are the considerations above right?
Now two specific questions:
Passing a matrix as function parameter
inline float4 DoSomething(in Mat4x4 mat, in float3 vec)
{
...
}
Considering the function above, in C++ this would be nasty, and it would definitely be better to use a reference: const Mat4x4&.
What about shaders? Is this a bad approach? I read that, for example, the inout qualifier could be used to pass a matrix by reference, but that actually implies the matrix may be modified by the called function.
Does the number (and type) of arguments have any implication? For example, is it better to use functions with a limited set of arguments? Or to avoid passing matrices?
Is the inout modifier a valid way to improve performance here? If so, does anyone know how a typical compiler implements this?
Are there any differences between HLSL and GLSL on this?
Does anyone have hints on this?
According to the spec, values are always copied. For in parameters, they are copied at call time; for out parameters, at return time; and for inout parameters, at both call and return time.
In the language of the spec (GLSL 4.50, section 6.1.1 "Function Calling Conventions"):
All arguments are evaluated at call time, exactly once, in order, from left to right. Evaluation of an in parameter results in a value that is copied to the formal parameter. Evaluation of an out parameter results in an l-value that is used to copy out a value when the function returns. Evaluation of an inout parameter results in both a value and an l-value; the value is copied to the formal parameter at call time and the lvalue is used to copy out a value when the function returns.
An implementation is of course free to optimize anything it wants, as long as the result is the same as it would be with the documented behavior. But I don't think you can expect it to work in any specific way.
For example, it wouldn't be safe to pass all inout parameters by reference. Say you had this code:
vec4 Foo(inout mat4 mat1, inout mat4 mat2) {
mat1 = mat4(0.0);
mat2 = mat4(1.0);
return mat1 * vec4(1.0);
}
mat4 myMat;
vec4 res = Foo(myMat, myMat);
The correct result for this is a vector containing all 0.0 components.
If the arguments were passed by reference, mat1 and mat2 inside Foo() would alias the same matrix. This means that the assignment to mat2 also changes the value of mat1, and the result is a vector with all 1.0 components. Which would be wrong.
This is of course a very artificial example, but the optimization has to be selective to work correctly in all cases.
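To make the aliasing argument concrete, here is a C++ model of Foo() using hypothetical Mat4/Vec4 stand-ins for GLSL's column-major mat4 and vec4. fooCopyInOut follows the spec's copy-in/copy-out rule, while fooByReference is the unsafe "optimization". Called with the same matrix for both arguments, the first returns all 0.0 components and the second all 1.0.

```cpp
#include <array>

// Minimal stand-ins for GLSL's vec4 and column-major mat4.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;

// Like GLSL's mat4(s): s on the diagonal, 0 elsewhere.
Mat4 diag(float s) {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = s;
    return m;
}

// m * v for a column-major matrix.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) r[row] += m[col][row] * v[col];
    return r;
}

// Spec semantics for inout: copy in at call time, copy out at return.
Vec4 fooCopyInOut(Mat4& arg1, Mat4& arg2) {
    Mat4 mat1 = arg1, mat2 = arg2;  // copy in
    mat1 = diag(0.0f);
    mat2 = diag(1.0f);
    const Vec4 result = mul(mat1, Vec4{1, 1, 1, 1});
    arg1 = mat1;  // copy out (this sketch goes left to right;
    arg2 = mat2;  // the returned value does not depend on the order)
    return result;
}

// Naive by-reference version: mat1 and mat2 may alias the same object.
Vec4 fooByReference(Mat4& mat1, Mat4& mat2) {
    mat1 = diag(0.0f);
    mat2 = diag(1.0f);  // if aliased, this also overwrites mat1
    return mul(mat1, Vec4{1, 1, 1, 1});
}
```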
Your first bullet point does not work when you consider arguments qualified using inout.
The real issue is what you do with the parameter inside the function: if you modify a parameter qualified with in, then it cannot be "passed by reference", and a copy has to be made. On modern hardware this is probably not a big deal, but Shader Model 2.0 was quite limited in the number of temporary registers, and I ran into these kinds of issues more than once when GLSL and Cg first came out.
For reference, consider the following GLSL code:
vec4 DoSomething (mat4 mat, vec3 vec)
{
// Pretty straightforward, no temporary registers are required to pass arguments.
return vec4 (mat [0] + vec4 (vec, 0.0));
}
vec4 DoSomethingCopy (mat4 mat, vec3 vec)
{
mat [0][0] = 0.0; // This requires the compiler to make a local copy of mat
return vec4 (mat [0] + vec4 (vec, 0.0));
}
vec4 DoSomethingInOut (inout mat4 mat, in vec3 vec)
{
mat [0][0] = 0.0; // No copy required, but the original mat is modified
return vec4 (mat [0] + vec4 (vec, 0.0));
}
I cannot really comment on performance; my only bad experiences had to do with hitting actual hardware limits on older GPUs. Of course, you should assume that any time something has to be copied, it is going to negatively impact performance.
All shader functions are inlined (recursive functions are forbidden). The concept of a reference/pointer does not apply here either. The only case where extra code is generated is when you write to an input parameter. However, if the original registers aren't used anymore, the compiler will probably reuse them, and the copy (mov operation) won't be needed.
Bottom line: function invocation is free.
As I understand it, calculation at compile time means that at run time, instead of calls to constexpr functions, there will be const values (by definition, because they have already been calculated).
That applies to functions (they are already calculated, so a call is just like a variable holding the result), to variables (just like a static const variable), and likewise to classes.
One advantage of constexpr functions I can see: if in ANSI C, for example, I had to have 5 defines, maybe logically related, now I can write one such function and use it instead of the 5 defines, with the ability to write logic that manipulates the set of constexpr function return values. As a result I have the same set of 5 values, but now they are described logically by that function.
But I feel I'm misunderstanding something, because I see examples like this:
class vec3 {
union {
struct {
float _x, _y, _z;
};
float _v[3];
};
public:
constexpr vec3(): _x(0), _y(0), _z(0) {}
constexpr vec3(float x, float y, float z): _x(x), _y(y), _z(z) {} // (1)
constexpr vec3(const vec3&) = default; // (2)
vec3 &operator=(const vec3&) = default; // (3)
constexpr float x() const { return _x; } // (4)
constexpr float y() const { return _y; }
constexpr float z() const { return _z; }
constexpr const float *v() const { return _v; } // (5)
float *v() { return _v; }
};
(1) I can understand making the constructor without parameters constexpr. Yes, it is some const state that can be calculated at compile time and used later, maybe faster. But why the constructor with parameters? Object body will be on Stack or on Heap, we don't know which parameters we will put there, what for is it?
(2, 3) Same here, what for is it? How can we calculate at compile time copying of unknown object?
(4, 5) I wonder what for it can be used. Just for object which state was calculated at compile time? Or is it that reading a value from some object (created at run time) on the heap or the stack may be expensive, and this somehow speeds it up?
You are missing the fact that the use of constexpr in member functions and constructors does not mean that they are always executed in compile-time. They are simply done so when possible.
A few examples for clarity:
constexpr vec3 myGlobalVec{1,2,3}; // uses vec3(x,y,z) in compile time
constexpr vec3 copyOfGlobalVec = myGlobalVec; // uses copy constructor in compile time
void foo(int n) {
vec3 my(n,n,n); // uses the same constructor, but invoked in runtime
// ...
}
The same applies to the remaining constructors: the key point is that with constexpr-qualified functions and constructors, if the full arguments list contains constant expressions (and if the object at hand is a constant expression), then the result will also be a constant expression.
The advantage of having constexpr getters is that attributes of constant expressions can be obtained for use in cases where only constant expressions are allowed. A float cannot be used as a template parameter, but assuming the vector was made of ints, you could do this:
SomeClass<myGlobalVec.x()> anObject;
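Putting these pieces together, here is a compilable sketch with a hypothetical integer variant, ivec3, which drops the anonymous-struct union (a compiler extension) so it is plain standard C++11. The same constructor and getter serve both compile-time and run-time uses.

```cpp
#include <array>

// Hypothetical int version of the question's vec3.
class ivec3 {
    int _x, _y, _z;
public:
    constexpr ivec3() : _x(0), _y(0), _z(0) {}
    constexpr ivec3(int x, int y, int z) : _x(x), _y(y), _z(z) {}
    constexpr int x() const { return _x; }
    constexpr int y() const { return _y; }
    constexpr int z() const { return _z; }
};

constexpr ivec3 myGlobalVec{1, 2, 3};           // constructor evaluated at compile time
constexpr ivec3 copyOfGlobalVec = myGlobalVec;  // implicit copy constructor, also constexpr

// Getters on a constant expression yield constant expressions:
static_assert(copyOfGlobalVec.y() == 2, "checked by the compiler");
std::array<int, myGlobalVec.z()> buffer{};      // array bound: 3

template <int N> struct SomeClass { static constexpr int value = N; };
SomeClass<myGlobalVec.x()> anObject;            // template argument: 1

// Same constructor and getters, but with a run-time argument.
int sumAtRunTime(int n) {
    ivec3 v(n, n, n);
    return v.x() + v.y() + v.z();
}
```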
1: A constexpr constructor that takes parameters can be called at compile time if the parameters are known at compile time. This may be useful to encapsulate state, the same reason we use objects instead of primitive values at run time.
2, 3: Similarly, one can copy a constexpr object at compile time if the original source object is also a compile-time value.
4, 5: constexpr accessors can be called both at run time and at compile time. Run time performance is not directly relevant, except that constexpr methods are implicitly inline and otherwise well-behaved (not containing some undefined behavior) so may be more amenable to optimization.
Object body will be on Stack or on Heap, (...)
This is at best irrelevant, and at worst wrong. In the C++ abstract machine the objects will be in some storage location. In a real machine, the objects will be wherever the compiler deems appropriate, and that wherever may even be nowhere.
we don't know which parameters we will put there, what for is it?
If we initialise an object using that constructor with constant expressions for arguments, the initialisation expression will also be a constant expression. Constant expressions are special because they are the only kinds of expressions that can be used in certain particular contexts, like array sizes, or template arguments. Other than counting as a constant expression there is no difference between this and a normal initialisation without constexpr.
How can we calculate at compile time copying of unknown object?
The same as before: if we initialise a copy using a constant expression as the source, the copy will also be a constant expression. Note that constexpr is only about things counting as constant expressions or not. It isn't about when they are evaluated.
I wonder what for it can be used. Just for object which state was calculated at compile time?
See above :) The primary purpose is to allow use of those objects in contexts that require constant expressions.
Suppose I have the following simple struct:
struct Vector3
{
double x;
double y;
double z;
};
and I create a list of vertices:
std::vector<Vector3> verticesList;
In addition to this I need to use a third-party library. The library has a function with the following signature:
typedef double Real3[3];
extern void createMesh(const Real3* vertices, const size_t verticesCount);
What is the best way to convert verticesList into something which could be passed into createMesh() as the vertices parameter?
At the moment I use the following approach:
static const size_t MAX_VERTICES = 1024;
if (verticesList.size() > MAX_VERTICES)
throw std::runtime_error("Number of vertices is too big");
Real3 rawVertices[MAX_VERTICES];
for (size_t vertexInd = 0; vertexInd < verticesList.size(); ++vertexInd)
{
const Vector3& vertex = verticesList[vertexInd];
rawVertices[vertexInd][0] = vertex.x;
rawVertices[vertexInd][1] = vertex.y;
rawVertices[vertexInd][2] = vertex.z;
}
createMesh(rawVertices, verticesList.size());
But surely it is not the best way to solve the issue.
That is one proper way of doing it. There are also some other ways...
The type Vector3 is layout compatible with the type Real3, which implies that you can force-cast a pointer to one type into a pointer to the other:
createMesh( reinterpret_cast<Real3*>(&verticesList[0]), verticesList.size() );
Another alternative, as Rook mentions, is to remove the loop by using memcpy, since the types are POD:
Real3 rawVertices[MAX_VERTICES];
std::memcpy( rawVertices, &verticesList[0],
verticesList.size()*sizeof verticesList[0] );
This is more concise, and probably more efficient, but it still is copying the whole container.
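If you keep a copy, you can at least drop the fixed MAX_VERTICES buffer by allocating exactly as many Real3 entries as needed. This is a sketch; toRawVertices is a hypothetical helper. The static_assert only catches unexpected padding in Vector3; it does not by itself make the reinterpret_cast formally guaranteed.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Vector3 { double x; double y; double z; };
typedef double Real3[3];

// Sanity check for the cast-based approach: no padding in Vector3.
static_assert(sizeof(Vector3) == sizeof(Real3), "unexpected padding in Vector3");

// Copies into a heap buffer sized to the input, so there is no upper limit.
std::unique_ptr<Real3[]> toRawVertices(const std::vector<Vector3>& verts) {
    std::unique_ptr<Real3[]> raw(new Real3[verts.size()]);
    for (std::size_t i = 0; i < verts.size(); ++i) {
        raw[i][0] = verts[i].x;
        raw[i][1] = verts[i].y;
        raw[i][2] = verts[i].z;
    }
    return raw;
}
```

Usage: auto raw = toRawVertices(verticesList); createMesh(raw.get(), verticesList.size());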
I believe that the standard does guarantee this behavior (at least C++11): two layout-compatible standard-layout types have the same memory layout (duh?), and §9.2p19 states:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
This guarantee technically means something slightly different from what I claimed before: reinterpret_cast<double*>(&verticesList[0]) points to verticesList[0].x. But it also implies that converting that double* to a Real3 pointer through reinterpret_cast will be fine.