Will dFdx optimize for a single varying? - opengl

Since dFdx must cover every possible case, I think it must be implemented like this pseudo-code:
vecN dFdx( vecN )
{
wait_for_other_to_reach_here();
return calculate_difference();
}
But if we pass a single varying variable, it should be very simple, since a varying interpolates linearly.
Example fragment shader:
in vec3 v_vertex;
void main()
{
// this should give the same result for all fragments in one triangle
vec3 dx = dFdx( v_vertex );
vec3 dy = dFdy( v_vertex );
vec3 normal = normalize( cross( dx , dy ) );
....
....
}

First of all, varyings don't interpolate linearly unless you use the noperspective or flat interpolation qualifiers. Interpolation will be non-linear in screen space due to perspective correction (unless you use an orthographic projection).
Second, why do you think a wait operation occurs at all? Fragment shaders run in parallel, and GPUs always run them on at least 2x2 pixel quads, in complete lockstep within the same warp/wavefront/SIMD group, even if that includes pixels outside of the primitive. This means the GPU can always calculate the derivatives without ever having to wait for neighbouring fragments to catch up. Modern GL will even tell you, via the gl_HelperInvocation input, whether a fragment shader invocation is only a helper invocation outside of the primitive.
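To make that concrete, here is a minimal fragment shader sketch (my own illustration, not from the question; it assumes GLSL 4.50 for gl_HelperInvocation, and the extra noperspective input is hypothetical). Derivatives need no explicit synchronization, and only the noperspective input actually gives a dFdx that is constant across the whole triangle:
#version 450
in vec3 v_vertex;                    // default: perspective-correct, dFdx varies across the triangle
noperspective in vec3 v_vertexLin;   // hypothetical extra varying: linear in screen space
out vec4 fragColor;
void main()
{
    // No wait is needed: the 2x2 quad neighbours run in lockstep,
    // padded with helper invocations outside the primitive if necessary.
    vec3 dx = dFdx(v_vertex);
    vec3 dy = dFdy(v_vertex);
    vec3 normal = normalize(cross(dx, dy));

    // This derivative really is the same for every fragment of the triangle:
    vec3 dxLin = dFdx(v_vertexLin);

    if (gl_HelperInvocation)
    {
        // this invocation runs only to supply derivative data; its writes are discarded
    }

    fragColor = vec4(normal * 0.5 + 0.5, 1.0);
}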

How to generate geometry to link neighbour nodes in a geometry shader with OpenCL/GL interop?

I am working on a 3D mesh that I store in an array: each element of the array is a 4D point in homogeneous coordinates (x, y, z, w). I use OpenCL to do some calculations on these data, which I later want to visualise, therefore I set up an OpenCL/GL interop context. I have created a shared buffer between OpenCL and OpenGL by using the clCreateFromGLBuffer function on a GL_ARRAY_BUFFER:
...
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, size_of_data, data, GL_DYNAMIC_DRAW);
vbo_buff = clCreateFromGLBuffer(ctx, CL_MEM_READ_WRITE, vbo, &err);
...
In the vertex shader, I access data this way:
layout (location = 0) in vec4 data;
out VS_OUT
{
vec4 vertex;
} vs_out;
void main(void)
{
vs_out.vertex = data;
}
Then in the geometry shader I do something like this:
layout (points) in;
layout (triangle_strip, max_vertices = MAX_VERT) out;
in VS_OUT
{
vec4 vertex;
} gs_in[];
void main()
{
gl_Position = gs_in[0].vertex;
EmitVertex();
...etc...
}
This gives me the ability to generate geometry based on the position of each point stored in the data array.
This way, the geometry I can generate is only based on the current point being processed by the geometry shader: e.g. I am able to construct a small cube (voxel) around each point.
Now I would like to be able to access the positions of other points in the data array within the geometry shader: e.g. I would like to be able to retrieve the coordinates of another point (indexed by another shared buffer of arbitrary length) besides the one currently being processed, in order to draw a line connecting them.
The problem I have is that in the geometry shader gs_in[0].vertex gives me the position of each point, but I don't know which one it is at the time (which index?). Moreover, I don't know how to access the positions of other points besides that one at the same time.
In hypothetical pseudo-code, I would like to be able to do something like this:
point_A = gs_in[0].vertex[index_A];
point_B = gs_in[0].vertex[index_B];
draw_line_between_A_and_B(point_A, point_B);
It is not clear to me whether this is possible or not, or how to achieve it within a geometry shader. I would like to stick to this approach because the calculations I do in the OpenCL kernels implement a cellular automaton, hence it is convenient for me to organise my code (neutrino) in terms of central nodes and related neighbours.
All suggestions are welcome.
Thanks.
but I don't know which one it is at the time (which index?)
See gl_PrimitiveIDIn.
I don't know how to access the positions of other points besides that one at the same time.
You can bind the same source buffer twice: as a vertex source and as a GL_TEXTURE_BUFFER. If your OpenGL implementation supports it, you'll then be able to read from it there.
Unlike Direct3D, in GL support for the feature is optional: the spec says GL_MAX_GEOMETRY_SHADER_STORAGE_BLOCKS (the corresponding limit for the shader storage route) can be zero.
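A minimal geometry shader sketch of the texture-buffer route (my own illustration: the host-side glTexBuffer setup is not shown, and the points and neighbourIndex samplers plus the neighbour-index buffer are hypothetical), using gl_PrimitiveIDIn to know which point is currently being processed:
#version 330 core
layout (points) in;
layout (line_strip, max_vertices = 2) out;

uniform samplerBuffer points;          // the same VBO, bound a second time as GL_TEXTURE_BUFFER
uniform isamplerBuffer neighbourIndex; // hypothetical buffer with one neighbour index per point

in VS_OUT
{
    vec4 vertex;
} gs_in[];

void main()
{
    int self  = gl_PrimitiveIDIn;                   // which input point this is
    int other = texelFetch(neighbourIndex, self).r; // index of the point to connect to

    gl_Position = gs_in[0].vertex;                  // point A: the current point
    EmitVertex();
    gl_Position = texelFetch(points, other);        // point B: fetched by index
    EmitVertex();
    EndPrimitive();
}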
"This gives me the ability of generating geometry based on the position of each point the stored in the data array."
No it does not. The input to the geometry shader are not all the vertex attributes in the buffer. Let me quote the Geometry Shader wiki page:
Geometry shaders take a primitive as input; each primitive is composed of some number of vertices, as defined by the input primitive type in the shader.
A primitive is a single point, a line, or a triangle. For instance, if the primitive type is GL_POINTS, then the size of the input array is 1 and you can only access the vertex attributes of that point; if the primitive type is GL_TRIANGLES, then the size of the input array is 3 and you can only access the vertex attributes of the 3 vertices (corners) which form the triangle.
If you want to access more data, then you have to use a Shader Storage Buffer Object (or a texture).
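For completeness, a minimal sketch of the Shader Storage Buffer Object route mentioned above (my own illustration; it requires OpenGL 4.3+ and a non-zero GL_MAX_GEOMETRY_SHADER_STORAGE_BLOCKS, and the block name, binding and otherIndex uniform are hypothetical):
#version 430 core
layout (points) in;
layout (line_strip, max_vertices = 2) out;

// The whole shared (x, y, z, w) array, bound as a shader storage buffer.
layout (std430, binding = 0) buffer AllPoints
{
    vec4 point[];
};

uniform int otherIndex; // hypothetical index of the point to connect to

in VS_OUT
{
    vec4 vertex;
} gs_in[];

void main()
{
    gl_Position = gs_in[0].vertex;   // the current point
    EmitVertex();
    gl_Position = point[otherIndex]; // any element of the shared array can be read directly
    EmitVertex();
    EndPrimitive();
}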

OpenGL In-shader Vertex Matrix Creation

I am trying to generate my projection and transform matrices inside my vertex shader, i.e. defining my transform, rotation, and perspective matrix functions in GLSL. I am doing this in order to increase the readability of my program by bypassing all the loading/importing etc. of matrices into the shader, apart from camera position, rotation and FOV.
My only concern is that the matrix would then be calculated on every shader call, or for every vertex.
Which, if either of the two, is what actually happens in the shader?
Is it better to deal with the clutter and import the matrix from my program, or is my short-cut of creating the matrix in-shader acceptable/recommended?
*update with code*
#version 400
in vec4 position;
uniform vec3 camPos;
uniform vec3 camRot;
mat4 calcMatrix(
vec3 pos,
vec3 rot
) {
float foo=1;
float bar=0;
return mat4(pos.x,pos.y,pos.z,0,
rot.x,rot.y,rot.z,0,
foo,bar,foo,bar,
0,0,0,1);
}
void main()
{
gl_Position = calcMatrix(camPos, camRot) * position;
}
versus:
#version 400
in vec4 position;
uniform mat4 viewMatrix;
void main()
{
gl_Position = viewMatrix * position;
}
Which method is recommended?
What's wrong with doing
float matrix[16];
calculate_transform(matrix, args);
glUniformMatrix4fv(mvp, 1, GL_FALSE, matrix);
Or even
set_matrix_uniform_using(mvp, args);
which then does what the previous bit of code does.
If you are worried about clutter then extract a function and give it a good name.
Doing this in the shader has several consequences: you would need multiple variables to express what the single matrix expresses, leading to clutter at shader load and uniform upload; and shader debugging is much more difficult than making sure your own code does what it needs to do. If you hardcode the movement code, you cannot replace it with a free-moving camera without changing the shader.
All that doesn't even touch on performance costs. The GPU is much better at loading a matrix from uniform memory and multiplying it with a vector than it is at doing the trig needed for the frustum and rotation.
If you need a different matrix for each vertex then, fair enough, do it in the shader, though I can't imagine a case where that's needed.
Otherwise, it's much faster to pass the matrix as a uniform. Don't overload the GPU by computing the same matrix again and again.
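To make the cost argument concrete, here is a rough sketch of what a genuine in-shader camera matrix would involve (my own illustration, not the code above: a single rotation axis only, with a hypothetical calcView function). The sin and cos below are re-evaluated for every vertex, whereas the uniform version computes them once per frame on the CPU:
#version 400
in vec4 position;
uniform vec3 camPos;
uniform vec3 camRot;

// Rotation about the Y axis combined with a translation by -camPos.
// A real camera would need all three axes plus a projection, i.e. even more per-vertex trig.
mat4 calcView(vec3 pos, vec3 rot)
{
    float c = cos(rot.y);
    float s = sin(rot.y);
    mat4 R = mat4(  c, 0.0,  -s, 0.0,   // column 0
                  0.0, 1.0, 0.0, 0.0,   // column 1
                    s, 0.0,   c, 0.0,   // column 2
                  0.0, 0.0, 0.0, 1.0);  // column 3
    mat4 T = mat4(1.0);
    T[3] = vec4(-pos, 1.0);             // translation in the last column
    return R * T;
}

void main()
{
    gl_Position = calcView(camPos, camRot) * position;
}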

OpenGL - instanced attributes

I use oglplus - it's a C++ wrapper for OpenGL.
I have a problem with defining instanced data for my particle renderer - positions work fine, but something goes wrong when I want to instance a bunch of ints from the same VBO.
I am going to skip some of the implementation details so as not to make this problem more complicated. Assume that I bind the VAO and VBO before the operations described below.
I have an array of structs (called "Particle") that I upload like this:
glBufferData(GL_ARRAY_BUFFER, sizeof(Particle) * numInstances, newData, GL_DYNAMIC_DRAW);
Definition of the struct:
struct Particle
{
float3 position;
//some more attributes, 9 floats in total
//(...)
int fluidID;
};
I use a helper function to define the OpenGL attributes like this:
void addInstancedAttrib(const InstancedAttribDescriptor& attribDesc, GLSLProgram& program, int offset=0)
{
//binding and some implementation details
//(...)
oglplus::VertexArrayAttrib attrib(program, attribDesc.getName().c_str());
attrib.Pointer(attribDesc.getPerVertVals(), attribDesc.getType(), false, sizeof(Particle), (void*)offset);
attrib.Divisor(1);
attrib.Enable();
}
I add attributes for positions and fluidids like this:
InstancedAttribDescriptor posDesc(3, "InstanceTranslation", oglplus::DataType::Float);
this->instancedData.addInstancedAttrib(posDesc, this->program);
InstancedAttribDescriptor fluidDesc(1, "FluidID", oglplus::DataType::Int);
this->instancedData.addInstancedAttrib(fluidDesc, this->program, (int)offsetof(Particle,fluidID));
Vertex shader code:
uniform vec3 FluidColors[2];
in vec3 InstanceTranslation;
in vec3 VertexPosition;
in vec3 n;
in int FluidID;
out float lightIntensity;
out vec3 sphereColor;
void main()
{
//some typical MVP transformations
//(...)
sphereColor = FluidColors[FluidID];
gl_Position = projection * vertexPosEye;
}
This code as a whole produces the following result: the particles are arranged in the way I wanted them to be, which means that the "InstanceTranslation" attribute is set up correctly. The group of particles on the left has a FluidID value of 0 and the group on the right a value of 1. The second group has correct positions but indexes improperly into the FluidColors array.
What I know:
It's not a problem with the way I set up the FluidColors uniform. If I hard-code the color selection in the shader like this:
sphereColor = FluidID == 0 ? FluidColors[0] : FluidColors[1];
I get the expected colors.
OpenGL returns GL_NO_ERROR from glGetError so there's no problem with the enums/values I provide
It's not a problem with the offsetof macro. I tried using hard-coded values and they didn't work either.
It's not a compatibility issue with GLint, I use simple 32bit Ints (checked this with sizeof(int))
I need to use FluidID as an instanced attribute that indexes into the color array because otherwise, if I were to set the color for a particle group as a simple vec3 uniform, I'd have to batch the same particle types (with the same FluidID) together first, which means sorting them, and that would be too costly an operation.
To me, this seems to be an issue of how you set up the fluidID attribute pointer. Since you use the type int in the shader, you must use glVertexAttribIPointer() to set up the attribute pointer. Attributes you set up with the normal glVertexAttribPointer() function work only for float-based attribute types. They accept integer input, but the data will be converted to float when the shader accesses them.
In oglplus, you apparently have to use VertexArrayAttrib::IPointer() instead of VertexArrayAttrib::Pointer() if you want to work with integer attributes.

What happens if Vertex Attributes not match Vertex Shader Input

As far as I know, if the vertex buffer has an attribute that the shader does not use, there is no problem.
But what happens in OpenGL if the vertex buffer does not have an attribute that the vertex shader uses?
I know that in DirectX 11 nothing is drawn if an attribute needed by the shader is not provided in the vertex buffer.
Example
vb only has: position
vertex shader:
attribute vec3 position;
attribute vec4 color;
varying vec4 out_color;
void main()
{
gl_Position = vec4(position, 1.0);
out_color = color;
}
pixel shader:
varying vec4 out_color;
void main()
{
gl_FragColor = out_color;
}
What is the pixel color after the shaders executed?
There are two scenarios:
If the attribute array is enabled (i.e. glEnableVertexAttribArray() was called for the attribute), but you didn't make a glVertexAttribPointer() call that specifies the data to be in a valid VBO, bad things can happen. I believe it can be implementation dependent what exactly the outcome is. For example, the draw call could crash, or there could be garbage rendering. The best thing I can find in the spec, which still sounds somewhat vague to me, is:
Most, but not all GL commands operating on buffer objects will detect attempts to read from or write to a location in a bound buffer object at an offset less than zero, or greater than or equal to the buffer’s size. When such an attempt is detected, a GL error will be generated. Any command which does not detect these attempts, and performs such an invalid read or write, has undefined results, and may result in GL interruption or termination.
If the attribute array is not enabled, the current attribute value is used for all vertices. This is the value set with glVertexAttrib4fv() and similar calls. If no such call was made, the default for the current attribute value is (0.0, 0.0, 0.0, 1.0). In the example above, color would therefore be (0.0, 0.0, 0.0, 1.0) for every vertex, so the output would be opaque black.

GLSL shader in/out variable packing

Does the order and/or size of shader in/out variables make any difference in memory use or performance? For example, are these:
// vert example:
out vec4 colorRadius;
// tess control example:
out vec4 colorRadius[];
// frag example:
smooth in vec4 colorRadius;
equivalent to these:
// vert example:
out vec3 color;
out float radius;
// tess control example:
out vec3 color[];
out float radius[];
// frag example:
smooth in vec3 color;
smooth in float radius;
Is there any additional cost with the second form or will the compiler pack them together in memory and treat them exactly the same?
The compiler could pack these things together. But it doesn't have to, and there's little evidence that compilers commonly do this. So the top version will at least be no slower than the bottom version.
At the same time, this is more of a micro-optimization. So unless you know that this is a bottleneck, just let it go. It's better to write clear, easily understood code and optimize it once you know where your problems are than to optimize it without knowing whether it will ever be a concern.