I am trying to generate up my projection and transform matrix functions inside my vertex shader, e.g. defining my transform, rotation, and perspective matrix functions in terms of GLSL. I am doing this in order to increase readability of my program by bypassing all the loading/importing etc. of matrices into the shader, apart from camera position, rotation and FOV.
My only concern is that the matrix is being calculated each shader call or each vertex calculation.
Which, if either of the two, is what actually happens in the shader?
Is it better to deal with the clutter and import the matrix from my program, or is my short-cut of creating the matrix in-shader acceptable/recommended?
*update with code*
#version 400
in vec4 position;
uniform vec3 camPos;
uniform vec3 camRot;
mat4 calcMatrix(
vec3 pos,
vec3 rot,
) {
float foo=1;
float bar=0;
return mat4(pos.x,pos.y,pos.z,0,
rot.x,rot.y,rot.z,0,
foo,bar,foo,bar,
0,0,0,1);
}
void main()
{
gl_Position = calcMatrix(camPos, camRot) * position;
}
versus:
#version 400
in vec4 position;
uniform mat4 viewMatrix;
void main()
{
gl_Position = viewMatrix * position;
}
Which method is recommended?
Whats wrong with doing
float[16] matrix;
calculate_transform(matrix, args);
glUniformMatrix4fv(mvp, 1, false, matrix);
Or even
set_matrix_uniform_using(mvp, args);
which then does what the previous bit of code does.
If you are worried about clutter then extract a function and give it a good name.
To do this in the shader there are several consequences: you would need multiple varaibles to express what the single matrix expresses, leading to clutter at shader load and uniform upload, shader debugging is much more difficult than making sure your own cope does what it needs to do. If you hardcode the movement code you cannot replace it with a free moving camera without changing the shader.
All that doesn't even touch on performance costs. The GPU is much better at loading a matrix from uniform memory and multiplying a it with a vector than it is at doing the trig function needed for the frustum and rotate.
If you need a different matrix for each vertex, well, fairly do it in the shader. I can't imagine a case where that's needed.
Otherwise, it's much more faster to pass the matrix as a uniform. Don't overload the GPU computing again and again the same matrix.
Related
since dFdx must cover every single case,
I think it must be implemented like this pseudo code :
vecN dFdx( vecN )
{
wait_for_other_to_reach_here();
return calculate_difference();
}
but look, if we pass single varying variable, it is very simple. since varying does interpolate linealry.
example fragment shader :
in vec3 v_vertex;
void main()
{
// it must be same result for all fragments in one triangle
vec3 dx = dFdx( v_vertex );
vec2 dy = dFdy( v_vertex );
vec3 normal = normalize( cross( dx , dy ) );
....
....
}
First of all, varyings don't interpolate linearily, unless you use the noperspective or flat interpolation qualifiers. Interpolation will be non-linear in screen space, due to the perspective correction (unless you use an orthogonal projection).
Second, why do you think a wait operation occurs at all? The fragment shaders are run in parallel, and GPUs make sure to always run fragment shaders at least on 2x2 pixel quads, in complete lockstep in the same warp/wavefront/SIMD-group, even if that would include pixels outside of the primitive. This means that the GPU can always calculate the derivates, without ever having to wait for neighbor fragments to catch up. Modern GL will even tell you if a fragment shader invocation is only a helper invocation outside of the primitive via the gl_HelperInvocation input.
In a vertex shader how can a function within the shader be made to access a specific attribute array value after buffering its vertex data to a VBO?
In the shader below the cmp() function is supposed to compare a uniform variable with vertex i.
#version 150 core
in vec2 vertices;
in vec3 color;
out vec3 Color;
uniform mat4 projection;
uniform mat4 view;
uniform mat4 model;
uniform vec2 cmp_vertex; // Vertex to compare
out int isEqual; // Output variable for cmp()
// Comparator
vec2 cmp(){
int i = 3;
return (cmp_vertex == vertices[i]);
}
void main() {
Color = color;
gl_Position = projection * view * model * vec4(vertices, 0.0, 1.0);
isEqual = cmp();
}
Also, can cmp() be modified so that it does the comparison in parallel?
Based on the naming in your shader code, and the wording of your question, it looks like you misunderstood the concept of vertex shaders.
The vertex shader is invoked once for each vertex. So when your vertex shader code executes, it always operates on a single vertex. This means that the name of your in variable is misleading:
in vec2 vertices;
This variable gives you the position of the one and only vertex your shader is working on. So it would probably be clearer if you used a name in singular form:
in vec2 vertex;
Once you realize that you're operating on a single vertex, the rest becomes easy. For the comparison:
bool cmp() {
return (cmp_vertex == vertex);
}
Vertex shaders are typically already invoked in parallel, meaning that many instances can execute at the same time, each one on its own vertex. So there is no need for parallelism within a single shader instance.
You'll probably have more issues achieving what you're after. But I hope that this gets you at least over the initial hurdle.
For example, the following out variable is problematic:
out int isEqual;
out variables of the vertex shader have matching in variables in the fragment shader. By default, the value written by the vertex shader is linearly interpolated across triangles, and the fragment shader gets the interpolated values. This is not supported for variables of type int. They only support flat interpolation:
flat out int isEqual;
But this will probably not give you what you're after, since the value you see in the fragment shader will always be the same across an entire triangle.
The following glsl code appears in my fragment shader. The struct definition causes no problems, but my attempt to use it as the type of a uniform array causes an "invalid operation" error, which is not particularly helpful.
struct InstanceData
{
vec3 rotation;
vec3 scale;
mat4 position;
};
layout (std140) uniform InstanceData instances[100];
How do I structure this code correctly so that it compiles without error, and is thus ready for me to populate with data? Note that I am using core profile version 4.5.
Edit: It seems to be something to do with the use of layout (std140). Removal of that part allows the code to compile, though don't I need that to ensure that the glsl compiler packs the struct data in a predictable way?
Edit: Still not working. My entire vertex shader code looks like this:
#version 450
layout(location=0) in vec4 in_Position;
layout(location=1) in vec4 in_Color;
out vec4 ex_Color;
flat out int ex_Instance;
uniform mat4 modelMatrix;
uniform mat4 viewMatrix;
uniform mat4 projectionMatrix;
// ------ preliminary addition of uniform block to be used soon ------
struct layout (std140) InstanceData
{
vec3 rotation;
vec3 scale;
mat4 position;
};
layout (std140, binding = 0) uniform InstanceData
{
InstanceData instances[100];
};
// -------------------------------------------------------------------
void main(void)
{
gl_Position = (projectionMatrix * viewMatrix * modelMatrix) * in_Position;
ex_Color = in_Color;
}
Note that I haven't written any code externally to populate the uniform buffer yet, nor, as you can see above, have I adjusted my code to make use of the data either. I just want it to compile and work initially as-is (i.e. declared but not used), at which point I will add additional code to start making use of it. There's no point going even that far if the program doesn't like the declaration to begin with.
Edit: Finally figured out the problem by using the shader info log, like so:
GLint infoLogLength;
glGetShaderiv(id, GL_INFO_LOG_LENGTH, &infoLogLength);
GLchar* strInfoLog = new GLchar[infoLogLength + 1];
glGetShaderInfoLog(id, infoLogLength, NULL, strInfoLog);
In a nutshell, as Anton said, I was using layout incorrectly, and my uniform block had the same name as my struct, creating all kinds of confusion for the compiler.
To do this portably, you should use a Uniform Buffer Object.
Presently, your struct uses 3+3+16=22 floating-point components and you are trying to build an array of 100 of these. OpenGL implementations are only required to support 1024 floating-point uniform components in any stage, and your array requires 2200.
Uniform Buffer Objects will allow you to store up to 64 KiB (minimum) of data; well exceeding the limitation above. However, you need to be mindful of data alignment when using UBOs and that is what the layout (std140) qualifier you tried to use is for.
struct InstanceData
{
vec3 rotation;
vec3 scale;
mat4 position;
};
// Uniform block named InstanceBlock, follows std140 alignment rules
layout (std140, binding = 0) uniform InstanceBlock {
InstanceData instances [100];
};
The struct above is not correctly aligned for std140, you will need to be careful when using it.
This is how the data is actually laid out:
struct InstanceData
{
vec3 rotation; // 0,1,2
float padding03; // 3
vec3 scale; // 4,5,6
float padding07; // 7
mat4 position; // 8-23
} // Size: 24 * sizeof (float)
vec3 types are treated the same as vec4 in GLSL and mat4 is effectively an array of 4 vec4, so that means they all need to begin on 4-float boundaries. GLSL automatically inserts padding to satisfy these alignment rules; the changes I made above are to show you the correct way to write this data structure in C. You have to account for 2 floats worth of implicit padding in your structure.
Regarding your edit, you do not have to worry about how GLSL packs a struct until you start using buffer objects.
Without using a Uniform Buffer Object, you have to use functions like glUniform3f (...) to set uniforms. Those functions do not directly expose the data structure to you, so packing does not matter. To set the values for an instance N you would call glUniform3f (...) using the location of "instances [N].rotation" and "instances [N].scale" and glUniformMatrix4fv (...) on "instances [N].position".
That would require 300 API calls to initialize all 100 instances of your struct, so you can see why UBOs are more practical (even ignoring the above mentioned 1024 limitation).
I'm having a bit of a structural problem with a shader of mine. Basically I want to be able to handle multiple lights of potentionally different types, but I'm unsure what the best way of implementing this would be. So far I've been using uniform blocks:
layout (std140) uniform LightSourceBlock
{
int type;
vec3 position;
vec4 color;
// Spotlights / Point Lights
float dist;
// Spotlights
vec3 direction;
float cutoffOuter;
float cutoffInner;
float attenuation;
} LightSources[12];
It works, but there are several problems with this:
A light can be one of 3 types (spotlight, point light, directional light), which require different attributes (Which aren't neccessarily required by all types)
Every light needs a sampler2DShadow (samplerCubeShadow for point lights), which can't be used in uniform blocks.
The way I'm doing it works, but surely there must be a better way of handling something like this? How is this usually done?
Does the order and/or size of shader in/out variables make any difference in memory use or performance? For example, are these:
// vert example:
out vec4 colorRadius;
// tess control example:
out vec4 colorRadius[];
// frag example:
in smooth vec4 colorRadius;
equivalent to these:
// vert example:
out vec3 color;
out float radius;
// tess control example:
out vec3 color[];
out float radius[];
// frag example:
in smooth vec3 color;
in smooth float radius;
Is there any additional cost with the second form or will the compiler pack them together in memory and treat them exactly the same?
The compiler could pack these things together. But it doesn't have to, and there's little evidence that compilers commonly do this. So the top version will at least be no slower than the bottom version.
At the same time, this is more of a micro-optimization. So unless you know that this is a bottleneck, just let it go. It's best to write clear, easily understood code and optimize it when you know where your problems are, than to optimize it not knowing if it's going to be a concern.