Shader School answers, branching lesson (box) - glsl

I'm finding it a bit tricky to reach the 'right' answers on shader school to learn GLSL.
One of them is lesson 4 - branching as the title, box as the GLSL filename. It mentions using lessThan() and similar component-wise comparison operations for vectors, but without any working examples. I'm not finding any using Google either, so if anyone has working examples I'd appreciate it.
The task is to check whether a vec2 point p lies between two vec2 bounds, lo and hi. This is what I have so far:
bool inBox(highp vec2 lo, highp vec2 hi, highp vec2 p) {
//Test if the point p is inside the box bounded by [lo, hi]
return all(bvec2(lessThan(p, hi), greaterThan(p, lo)));
}
#pragma glslify: export(inBox)
But the only solution that seems to work is:
bool inBox(highp vec2 lo, highp vec2 hi, highp vec2 p) {
//Test if the point p is inside the box bounded by [lo, hi]
return all(bvec2( all(lessThan(p, hi)), all(greaterThan(p, lo)) ));
}
#pragma glslify: export(inBox)
And I don't understand why, since I don't see why lessThan() or greaterThan() needs to be wrapped in an additional all().
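For reference, here is my reading of why the extra all() calls are needed (based on the GLSL spec, not on the lesson itself), written out as a standalone sketch: lessThan(p, hi) already returns a bvec2, one bool per component, so bvec2(lessThan(p, hi), greaterThan(p, lo)) tries to build a 2-component vector from four booleans; the constructor is filled entirely by the first argument and the second argument is never consumed, which GLSL rejects. Each comparison first has to be collapsed to a single bool with all():
bool inBox(highp vec2 lo, highp vec2 hi, highp vec2 p) {
    bvec2 belowHi = lessThan(p, hi);    // per-component result, e.g. bvec2(true, false)
    bvec2 aboveLo = greaterThan(p, lo); // per-component result as well
    // all() collapses each bvec2 to a single bool; only then can the two
    // results be packed into one bvec2 and collapsed again.
    return all(bvec2(all(belowHi), all(aboveLo)));
}
This is effectively what the working one-liner does.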

Related

Stuck trying to optimize complex GLSL fragment shader

So first off, let me say that while the code works perfectly well from a visual point of view, it runs into very steep performance issues that get progressively worse as you add more lights. In its current form it's good as a proof of concept, or a tech demo, but is otherwise unusable.
Long story short, I'm writing a RimWorld-style game with real-time top-down 2D lighting. The way I implemented rendering is a three-layered technique, as follows:
First I render occlusions to a single-channel R8 occlusion texture mapped to a framebuffer. This part is lightning fast and doesn't slow down with more lights, so it's not part of the problem:
Then I invoke my lighting shader by drawing a huge rectangle over my lightmap texture mapped to another framebuffer. The light data is stored in an array in a UBO, and the shader uses the occlusion map in its calculations. This is where the slowdown happens:
And lastly, the lightmap texture is multiplied and added to the regular world renderer, this also isn't affected by the number of lights, so it's not part of the problem:
The problem is thus in the lightmap shader. The first iteration had many branches which froze my graphics driver right away when I first tried it, but after removing most of them I get a solid 144 fps at 1440p with 3 lights, and ~58 fps at 1440p with 20 lights. An improvement, but it scales very poorly. The shader code is as follows, with additional annotations:
#version 460 core
// per-light data
struct Light
{
vec4 location;
vec4 rangeAndstartColor;
};
const int MaxLightsCount = 16; // I've also tried 8 and 32, there was no real difference
layout(std140) uniform ubo_lights
{
Light lights[MaxLightsCount];
};
uniform sampler2D occlusionSampler; // the occlusion texture sampler
in vec2 fs_tex0; // the uv position in the large rectangle
in vec2 fs_window_size; // the window size to transform world coords to view coords and back
out vec4 color;
void main()
{
vec3 resultColor = vec3(0.0);
const vec2 size = fs_window_size;
const vec2 pos = (size - vec2(1.0)) * fs_tex0;
// process every light individually and add the resulting colors together
// this should be branchless, is there any way to check?
for(int idx = 0; idx < MaxLightsCount; ++idx)
{
const float range = lights[idx].rangeAndstartColor.x;
const vec2 lightPosition = lights[idx].location.xy;
const float dist = length(lightPosition - pos); // distance from current fragment to current light
// early abort, the next part is expensive
// this branch HAS to be important, right? otherwise it will check crazy long lines against occlusions
if(dist > range)
continue;
const vec3 startColor = lights[idx].rangeAndstartColor.yzw;
// walk between pos and lightPosition to find occlusions
// standard line DDA algorithm
vec2 tempPos = pos;
int lineSteps = int(ceil(max(abs(lightPosition.x - pos.x), abs(lightPosition.y - pos.y))));
const vec2 lineInc = (lightPosition - pos) / lineSteps;
// can I get rid of this loop somehow? I need to check each position between
// my fragment and the light position for occlusions, and this is the best I
// came up with
float lightStrength = 1.0;
while(lineSteps --> 0)
{
const vec2 nextPos = tempPos + lineInc;
const vec2 occlusionSamplerUV = tempPos / size;
lightStrength *= 1.0 - texture(occlusionSampler, vec2(occlusionSamplerUV.x, 1 - occlusionSamplerUV.y)).x;
tempPos = nextPos;
}
// the contribution of this light to the fragment color is based on
// its square distance from the light, and the occlusions between them
// implemented as multiplications
const float strength = max(0, range - dist) / range * lightStrength;
resultColor += startColor * strength * strength;
}
color = vec4(resultColor, 1.0);
}
I call this shader as many times as I need, since the results are additive. It works with large batches of lights or one by one. Performance-wise, I didn't notice any real change trying different batch numbers, which is perhaps a bit odd.
So my question is: is there a better way to look up any (boolean) occlusions between my fragment position and light position in the occlusion texture, without iterating through every pixel by hand? Could renderbuffers perhaps help here (from what I've read they're for reading data back to system memory, but I need the data in another shader)?
And perhaps, is there a better algorithm for what I'm doing here?
I can think of a couple routes for optimization:
Exact: apply a distance transform to the occlusion map: this will give you the distance to the nearest occluder at each pixel. After that you can safely step by that distance within the loop, instead of doing baby steps, which will drastically reduce the number of steps in open regions (see the sketch at the end of this answer).
There is a very simple CPU-side algorithm to compute a distance transform, and it may suit you if your occluders are static. If your scene changes every frame, however, you'll need to search the literature for GPU-side algorithms, which seem to be more complicated.
Inexact: resort to soft shadows -- it might be a compromise you are willing to make, and even seen as an artistic choice. If you are OK with that, you can create a mipmap from your occlusion map, and then progressively increase the step and sample lower levels as you go farther from the point you are shading.
You can go further and build an emitters map (into the same 4-channel map as the occlusion). Then your entire shading pass will be independent of the number of lights. This is the 2D equivalent of voxel cone tracing GI.
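To make the exact route a bit more concrete, here is a rough sketch of what the inner walk could become once a distance transform is available. Everything here is an assumption layered on top of your shader: the distanceSampler texture (storing, per texel, the distance in pixels to the nearest occluder), the 64-step cap and the one-pixel minimum step are placeholders to adapt, and occlusionSampler is the uniform you already have:
uniform sampler2D distanceSampler; // assumed: distance (in pixels) to the nearest occluder

float lightStrengthAlong(vec2 pos, vec2 lightPosition, vec2 size)
{
    float total = length(lightPosition - pos);
    vec2 dir = (lightPosition - pos) / max(total, 0.0001);
    float travelled = 0.0;
    float lightStrength = 1.0;
    // 2D sphere tracing: jump ahead by the distance to the nearest occluder,
    // so open regions cost a handful of iterations instead of one per pixel
    for (int i = 0; i < 64 && travelled < total; ++i)
    {
        vec2 uv = (pos + dir * travelled) / size;
        float d = texture(distanceSampler, vec2(uv.x, 1.0 - uv.y)).x;
        if (d < 1.0) // an occluder is less than a pixel away: attenuate, then step past it
            lightStrength *= 1.0 - texture(occlusionSampler, vec2(uv.x, 1.0 - uv.y)).x;
        travelled += max(d, 1.0); // never advance by less than one pixel
    }
    return lightStrength;
}
Your per-light loop in main() would then call this instead of the DDA walk.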

When do I need GL_EXT_nonuniform_qualifier?

I want to compile the following code into SPIR-V
#version 450 core
#define BATCH_ID (PushConstants.Indices.x >> 16)
#define MATERIAL_ID (PushConstants.Indices.x & 0xFFFF)
layout (push_constant) uniform constants {
ivec2 Indices;
} PushConstants;
layout (constant_id = 1) const int MATERIAL_SIZE = 32;
in Vertex_Fragment {
layout(location = 0) vec4 VertexColor;
layout(location = 1) vec2 TexCoord;
} inData;
struct ParameterFrequence_3 {
int ColorMap;
};
layout (set = 3, binding = 0, std140) uniform ParameterFrequence_3 {
ParameterFrequence_3[MATERIAL_SIZE] data;
} Frequence_3;
layout (location = 0) out vec4 out_Color;
layout (set = 2, binding = 0) uniform sampler2D[] Sampler2DResources;
void main(void) {
vec4 color = vec4(1.0);
color *= texture(Sampler2DResources[Frequence_3.data[MATERIAL_ID].ColorMap], inData.TexCoord);
color *= inData.VertexColor;
out_Color = color;
}
(The code is generated by a program I am developing which is why the code might look a little strange, but it should make the problem clear)
When trying to do so, I am told
error: 'variable index' : required extension not requested: GL_EXT_nonuniform_qualifier
(for the third-to-last line, where the texture lookup happens)
After following a lot of discussion about how "dynamically uniform" is specified, and how the shading language spec basically says the scope is defined by the API while neither OpenGL nor Vulkan really does so (maybe that has changed), I am confused about why I get that error.
Initially I wanted to use instanced vertex attributes for the indices; those, however, are not dynamically uniform, which is what I thought the PushConstants would be.
So when PushConstants are constant during the draw call (which is the maximum scope required for being dynamically uniform), how can the above shader end up in any dynamically non-uniform state?
Edit: Does it have to do with the fact that the buffer backing the storage for the "ColorMap" could be aliased by another buffer through which the content might be modified during the invocation? Or is there a way to tell the compiler this is "restricted" storage so it knows it is constant?
Thanks
It is 3 am over here; I should just go to sleep.
The chances that anyone ends up having the same problem are small, but I'd still rather answer it myself than delete it:
I simply had to add a specialization constant to set the size of the sampler2D array; now it works without requiring any extension.
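In code, the change was essentially the following (the constant_id value and default size below are placeholders, not the exact values from my generator):
// size the sampler array with a specialization constant instead of leaving it unsized
layout (constant_id = 2) const int SAMPLER_COUNT = 16;
layout (set = 2, binding = 0) uniform sampler2D Sampler2DResources[SAMPLER_COUNT];
I believe the unsized sampler2D[] was being treated as a runtime-sized descriptor array, which is why indexing it with a variable triggered the GL_EXT_nonuniform_qualifier requirement; once the array has an explicit size, that no longer applies.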
Good night

Calculate surface normals from depth image using neighboring pixels cross product

As the title says, I want to calculate the surface normals of a given depth image by using the cross product of neighboring pixels. I would like to use OpenCV for that and avoid using PCL; however, I do not really understand the procedure, since my knowledge of the subject is quite limited. Therefore, I would be grateful if someone could provide some hints. To mention here that I do not have any other information except the depth image and the corresponding RGB image, so no K camera matrix information.
Thus, lets say that we have the following depth image:
and I want to find the normal vector at a corresponding point with a corresponding depth value like in the following image:
How can I do that using the cross product of the neighbouring pixels? I do not mind if the normals are not highly accurate.
Thanks.
Update:
OK, I was trying to follow @timday's answer and port his code to OpenCV, with the following code:
Mat depth = <my_depth_image> of type CV_32FC1
Mat normals(depth.size(), CV_32FC3);
// skip the border pixels so the +/-1 neighbour accesses stay inside the image
for(int x = 1; x < depth.rows - 1; ++x)
{
for(int y = 1; y < depth.cols - 1; ++y)
{
float dzdx = (depth.at<float>(x+1, y) - depth.at<float>(x-1, y)) / 2.0;
float dzdy = (depth.at<float>(x, y+1) - depth.at<float>(x, y-1)) / 2.0;
Vec3f d(-dzdx, -dzdy, 1.0f);
Vec3f n = normalize(d);
normals.at<Vec3f>(x, y) = n;
}
}
imshow("depth", depth / 255);
imshow("normals", normals);
I am getting the correct result shown below (I had to replace double with float and Vecd with Vecf; I do not know why that would make any difference, though):
You don't really need to use the cross product for this, but see below.
Consider your range image is a function z(x,y).
The normal to the surface is in the direction (-dz/dx,-dz/dy,1). (Where by dz/dx I mean the differential: the rate of change of z with x). And then normals are conventionally normalized to unit length.
Incidentally, if you're wondering where that (-dz/dx,-dz/dy,1) comes from... if you take the 2 orthogonal tangent vectors in the plane parallel to the x and y axes, those are (1,0,dzdx) and (0,1,dzdy). The normal is perpendicular to the tangents, so it should be (1,0,dzdx)X(0,1,dzdy) - where 'X' is the cross product - which is (-dzdx,-dzdy,1). So there's your cross-product-derived normal, but there's little need to compute it so explicitly in code when you can just use the resulting expression for the normal directly.
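(Writing that cross product out explicitly: with a = (1, 0, dzdx) and b = (0, 1, dzdy), a X b = (a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x) = (0*dzdy - dzdx*1, dzdx*0 - 1*dzdy, 1*1 - 0*0) = (-dzdx, -dzdy, 1).)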
Pseudocode to compute a unit-length normal at (x,y) would be something like
dzdx=(z(x+1,y)-z(x-1,y))/2.0;
dzdy=(z(x,y+1)-z(x,y-1))/2.0;
direction=(-dzdx,-dzdy,1.0)
magnitude=sqrt(direction.x**2 + direction.y**2 + direction.z**2)
normal=direction/magnitude
Depending on what you're trying to do, it might make more sense to replace the NaN values with just some large number.
Using that approach, from your range image, I can get this:
(I'm then using the normal directions calculated to do some simple shading; note the "steppy" appearance due to the range image's quantization; ideally you'd have higher precision than 8-bit for the real range data).
Sorry, not OpenCV or C++ code, but just for completeness: the complete code which produced that image (GLSL embedded in a Qt QML file; can be run with Qt5's qmlscene) is below. The pseudocode above can be found in the fragment shader's main() function:
import QtQuick 2.2
Image {
source: 'range.png' // The provided image
ShaderEffect {
anchors.fill: parent
blending: false
property real dx: 1.0/parent.width
property real dy: 1.0/parent.height
property variant src: parent
vertexShader: "
uniform highp mat4 qt_Matrix;
attribute highp vec4 qt_Vertex;
attribute highp vec2 qt_MultiTexCoord0;
varying highp vec2 coord;
void main() {
coord=qt_MultiTexCoord0;
gl_Position=qt_Matrix*qt_Vertex;
}"
fragmentShader: "
uniform highp float dx;
uniform highp float dy;
varying highp vec2 coord;
uniform sampler2D src;
void main() {
highp float dzdx=( texture2D(src,coord+vec2(dx,0.0)).x - texture2D(src,coord+vec2(-dx,0.0)).x )/(2.0*dx);
highp float dzdy=( texture2D(src,coord+vec2(0.0,dy)).x - texture2D(src,coord+vec2(0.0,-dy)).x )/(2.0*dy);
highp vec3 d=vec3(-dzdx,-dzdy,1.0);
highp vec3 n=normalize(d);
highp vec3 lightDirection=vec3(1.0,-2.0,3.0);
highp float shading=0.5+0.5*dot(n,normalize(lightDirection));
gl_FragColor=vec4(shading,shading,shading,1.0);
}"
}
}
I think the code (matrix calculation) is right:
import numpy as np

def normalization(data):
mo_chang =np.sqrt(np.multiply(data[:,:,0],data[:,:,0])+np.multiply(data[:,:,1],data[:,:,1])+np.multiply(data[:,:,2],data[:,:,2]))
mo_chang = np.dstack((mo_chang,mo_chang,mo_chang))
return data/mo_chang
x,y=np.meshgrid(np.arange(0,width),np.arange(0,height))
x=x.reshape([-1])
y=y.reshape([-1])
xyz=np.vstack((x,y,np.ones_like(x)))
pts_3d=np.dot(np.linalg.inv(K),xyz*img1_depth.reshape([-1]))
pts_3d_world=pts_3d.reshape((3,height,width))
f= pts_3d_world[:,1:height-1,2:width]-pts_3d_world[:,1:height-1,1:width-1]
t= pts_3d_world[:,2:height,1:width-1]-pts_3d_world[:,1:height-1,1:width-1]
normal_map=np.cross(f,t,axisa=0,axisb=0)
normal_map=normalization(normal_map)
normal_map=normal_map*0.5+0.5
alpha = np.full((height-2,width-2,1), (1.), dtype="float32")
normal_map=np.concatenate((normal_map,alpha),axis=2)
We should use the camera intrinsics, named 'K'. I think the values f and t are based on the 3D points in camera coordinates.
For the normal vector, (-1,-1,100) and (255,255,100) are the same color in 8-bit images, but they are totally different normals, so we should map the normal values to (0,1) with normal_map=normal_map*0.5+0.5.
Feel free to ask if anything is unclear.

Schlick geometric attenuation function in shader producing incorrect results

I have been searching online for a while now trying to figure out why the geometric attenuation term of my physically based shader (which I posted a question about not too long ago) is producing incorrect results, and I cannot seem to find an answer. The function I'm trying to implement can be found here: http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
This is my current iteration of the function.
vec3 Gsub(vec3 v) // Sub Function of G
{
float k = ((roughness + 1) * (roughness + 1)) / 8;
float fdotv = dot(fNormal, v);
return vec3((fdotv) / ((fdotv) * (1.0 - k) + k));
}
vec3 G(vec3 l, vec3 v, vec3 h) // Geometric Attenuation Term - Schlick Modified (k = a/2)
{
return Gsub(l) * Gsub(v);
}
This is the current result of the above in my application:
You can clearly see the strange artifacts on the left side, which should not be present.
One of the things I thought was an issue was my normals. I believe this is the issue, because whenever I put the same function into the Disney BRDF editor (http://www.disneyanimation.com/technology/brdf.html) I get correct results. I believe it is the normals because whenever I view the normals in Disney's application, I get this.
These normals differ from my normals, which -should- be correct:
I use the same model in both applications, and the normals are stored inside the model file. Can anyone give any insight into this?
Additionally I'd like to mention that these are the operations done on my normals:
Vertex Shader
mat3 normalMatrix = mat3(transpose(inverse(ModelView)));
inputNormal = normalize(normalMatrix * vNormal);
Fragment Shader
fNormal = normalize(inputNormal);
P.S. Please excuse the rushed code, I've been trying to get this to work for a while.

GLSL Channel Selection

I have a GLSL shader that reads from one of the channels (e.g. R) of an input texture and then writes to the same channel in an output texture. This channel has to be selected by the user.
What I can think of right now is to just use an int uniform and tons of if-statements:
uniform sampler2D uTexture;
uniform int uChannelId;
varying vec2 vUv;
void main() {
//read in data from texture
vec4 t = texture2D(uTexture, vUv);
float data;
if (uChannelId == 0) {
data = t.r;
} else if (uChannelId == 1) {
data = t.g;
} else if (uChannelId == 2) {
data = t.b;
} else {
data = t.a;
}
//process the data...
float result = data * 2.0; //for example
//write out
if (uChannelId == 0) {
gl_FragColor = vec4(result, t.g, t.b, t.a);
} else if (uChannelId == 1) {
gl_FragColor = vec4(t.r, result, t.b, t.a);
} else if (uChannelId == 2) {
gl_FragColor = vec4(t.r, t.g, result, t.a);
} else {
gl_FragColor = vec4(t.r, t.g, t.b, result);
}
}
Is there any way of doing something like a dictionary access such as t[uChannelId]?
Or perhaps I should have 4 different versions of the same shader, each of which processes a different channel, so that I can avoid all the if-statements?
What is the best way to do this?
EDIT: To be more specific, I am using WebGL (Three.js)
There is such a way, and it is as simple as you actually wrote it in the question. Just use t[channelId]. To quote the GLSL Spec (This is from Version 3.30, Section 5.5, but applies to other versions as well):
Array subscripting syntax can also be applied to vectors to provide numeric indexing. So in
vec4 pos;
pos[2] refers to the third element of pos and is equivalent to pos.z. This allows variable indexing into a vector, as well as a generic way of accessing components. Any integer expression can be used as the subscript. The first component is at index zero. Reading from or writing to a vector using a constant integral expression with a value that is negative or greater than or equal to the size of the vector is illegal. When indexing with non-constant expressions, behavior is undefined if the index is negative, or greater than or equal to the size of the vector.
Note that for the first part of your code, you use this to access a specific channel of a texture. You could also use the ARB_texture_swizzle functionality. In that case, you would just use a fixed channel, say r, for access in the shader, and then swizzle the actual texture channels so that whatever channel you want to access becomes r.
Update: as the target platform turned out to be WebGL, these suggestions are not available. However, a simple solution would be to use a vec4 uniform in place of uChannelId which is 1.0 for the selected component and 0.0 for all others. Say this variable is called uChannelSel. You could use data = dot(t, uChannelSel) in the first part and gl_FragColor = (vec4(1.0) - uChannelSel) * t + uChannelSel * result for the second part.
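An untested sketch of that last suggestion, just to make it concrete (uChannelSel is the mask uniform described above; everything else mirrors the code from the question):
uniform sampler2D uTexture;
uniform vec4 uChannelSel; // exactly one component is 1.0, the others 0.0
varying vec2 vUv;
void main() {
    vec4 t = texture2D(uTexture, vUv);
    // select the channel without branching
    float data = dot(t, uChannelSel);
    //process the data...
    float result = data * 2.0; //for example
    // write the result back into only the selected channel, keep the rest
    gl_FragColor = (vec4(1.0) - uChannelSel) * t + uChannelSel * result;
}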
As I'm sure you know, branching can be expensive in shaders. However, it sounds like it will always be the same channel in a pass (yes?), so you might maintain enough coherence to see good performance.
It's been a good while since I've used GLSL, but if you're using a newer version, maybe you could do some bit-shifting (<< or >>) magic: you would read the texture into an int instead of a vec4, then shift it by a number of bits depending on which channel you want to read.