GLSL artefacts when ray marching

In the following Shadertoy I illustrate an artefact that occurs when ray marching:
https://www.shadertoy.com/view/stdGDl
This is my "scene" (see code fragment below). It renders a primitive, tunnel_fragment, which is an SDF (Signed Distance Function), and uses modulo on the coordinates to calculate "infinite" repetitions of these fragments. It then also calculates which disk we are in (odd/even) to displace them.
I really don't understand why the disks (or rings -- see tunnel_fragment; if you remove a comment they become rings instead of disks) present these artefacts when the alternating movement in the x direction becomes large.
The artefacts don't appear when the disk structure moves to the right as a whole; they only appear when the disks alternate and the entire structure becomes more complex.
What am I doing wrong? It's really boggling me.
vec2 scene(in vec3 p)
{
    float thick = 0.1;
    vec3 cp = p;

    // Use modulo to simulate inf disks
    vec3 c = vec3(0, 0, 6.0 * thick);
    vec3 q = mod(cp + 0.5 * c, c) - 0.5 * c;

    // Find index of the disk
    vec3 disk = (cp + 0.5 * c) / c;
    float idx = floor(disk.z);

    // Do something simple with odd/even disks
    // Note: changing this shows the artefacts are always there
    if(mod(idx, 2.0) == 0.0) {
        q.x += sin(disk.z * t) * t * t;
    } else {
        q.x -= sin(disk.z * t) * t * t;
    }

    float d = tunnel_fragment(q, vec3(0.0), vec3(0.0, 0.0, 1.0), 2.0, thick, 0.2);
    return vec2(d, idx);
}

The problem is illustrated with this diagram:
When the current disk (based on modulo) is offset by more than the spacing between the disks, the distance you calculate is larger than the distance to the next disk. Consequently you risk over-stepping the next disk.
To solve this you need to either limit the offset (as said -- to no more than the spacing between the disks), or sample the odd and even disks separately and min() between them, as in the sketch below.
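Here is a minimal sketch (untested, reusing the question's tunnel_fragment and assuming t is the time uniform) of the second fix: the even and odd disks become two separate fields, each repeating with twice the spacing and displaced as a whole, then combined with min(). Each field is then a valid SDF on its own, so neither offset can cause over-stepping:

vec2 scene(in vec3 p)
{
    float thick = 0.1;
    float cz = 6.0 * thick;      // spacing between consecutive disks
    float period = 2.0 * cz;     // each field repeats every other disk
    float off = sin(t) * t * t;  // one displacement per parity (simplified)

    // Even disks: centred at z = 2k * cz, displaced in +x
    vec3 qe = p;
    qe.z = mod(p.z + 0.5 * period, period) - 0.5 * period;
    qe.x -= off;

    // Odd disks: centred at z = (2k + 1) * cz, displaced in -x
    vec3 qo = p;
    qo.z = mod(p.z - cz + 0.5 * period, period) - 0.5 * period;
    qo.x += off;

    float dE = tunnel_fragment(qe, vec3(0.0), vec3(0.0, 0.0, 1.0), 2.0, thick, 0.2);
    float dO = tunnel_fragment(qo, vec3(0.0), vec3(0.0, 0.0, 1.0), 2.0, thick, 0.2);

    // Closest of the two fields wins; the disk index is recovered from p.z
    float idx = floor((p.z + 0.5 * cz) / cz);
    return vec2(min(dE, dO), idx);
}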

Related

Stuck trying to optimize complex GLSL fragment shader

So first off, let me say that while the code works perfectly well from a visual point of view, it runs into very steep performance issues that get progressively worse as you add more lights. In its current form it's good as a proof of concept, or a tech demo, but is otherwise unusable.
Long story short, I'm writing a RimWorld-style game with real-time top-down 2D lighting. The way I implemented rendering is with a 3 layered technique as follows:
First I render occlusions to a single-channel R8 occlusion texture mapped to a framebuffer. This part is lightning fast and doesn't slow down with more lights, so it's not part of the problem.
Then I invoke my lighting shader by drawing a huge rectangle over my lightmap texture mapped to another framebuffer. The light data is stored in an array in a UBO, and the shader uses the occlusion mapping in its calculations. This is where the slowdown happens.
And lastly, the lightmap texture is multiplied and added to the regular world renderer; this also isn't affected by the number of lights, so it's not part of the problem.
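A hypothetical sketch of that last pass, just to make the pipeline concrete (worldSampler and lightmapSampler are assumed names, and the exact multiply/add weighting is a guess):

#version 460 core
uniform sampler2D worldSampler;    // the regular world render
uniform sampler2D lightmapSampler; // the accumulated lightmap
in vec2 fs_tex0;
out vec4 color;
void main()
{
    vec3 world = texture(worldSampler, fs_tex0).rgb;
    vec3 light = texture(lightmapSampler, fs_tex0).rgb;
    // multiply for shadowing, add a fraction of the light for glow
    color = vec4(world * light + 0.2 * light, 1.0);
}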
The problem is thus in the lightmap shader. The first iteration had many branches, which froze my graphics driver right away when I first tried it, but after removing most of them I get a solid 144 fps at 1440p with 3 lights, and ~58 fps at 1440p with 20 lights. An improvement, but it scales very poorly. The shader code is as follows, with additional annotations:
#version 460 core

// per-light data
struct Light
{
    vec4 location;
    vec4 rangeAndstartColor;
};

const int MaxLightsCount = 16; // I've also tried 8 and 32, there was no real difference

layout(std140) uniform ubo_lights
{
    Light lights[MaxLightsCount];
};

uniform sampler2D occlusionSampler; // the occlusion texture sampler

in vec2 fs_tex0;        // the uv position in the large rectangle
in vec2 fs_window_size; // the window size to transform world coords to view coords and back

out vec4 color;

void main()
{
    vec3 resultColor = vec3(0.0);
    const vec2 size = fs_window_size;
    const vec2 pos = (size - vec2(1.0)) * fs_tex0;

    // process every light individually and add the resulting colors together
    // this should be branchless, is there any way to check?
    for(int idx = 0; idx < MaxLightsCount; ++idx)
    {
        const float range = lights[idx].rangeAndstartColor.x;
        const vec2 lightPosition = lights[idx].location.xy;
        const float dist = length(lightPosition - pos); // distance from current fragment to current light

        // early abort, the next part is expensive
        // this branch HAS to be important, right? otherwise it will check crazy long lines against occlusions
        if(dist > range)
            continue;

        const vec3 startColor = lights[idx].rangeAndstartColor.yzw;

        // walk between pos and lightPosition to find occlusions
        // standard line DDA algorithm
        vec2 tempPos = pos;
        int lineSteps = int(ceil(abs(lightPosition.x - pos.x) > abs(lightPosition.y - pos.y) ? abs(lightPosition.x - pos.x) : abs(lightPosition.y - pos.y)));
        const vec2 lineInc = (lightPosition - pos) / lineSteps;

        // can I get rid of this loop somehow? I need to check each position between
        // my fragment and the light position for occlusions, and this is the best I
        // came up with
        float lightStrength = 1.0;
        while(lineSteps --> 0)
        {
            const vec2 nextPos = tempPos + lineInc;
            const vec2 occlusionSamplerUV = tempPos / size;
            lightStrength *= 1.0 - texture(occlusionSampler, vec2(occlusionSamplerUV.x, 1 - occlusionSamplerUV.y)).x;
            tempPos = nextPos;
        }

        // the contribution of this light to the fragment color is based on
        // its square distance from the light, and the occlusions between them
        // implemented as multiplications
        const float strength = max(0, range - dist) / range * lightStrength;
        resultColor += startColor * strength * strength;
    }
    color = vec4(resultColor, 1.0);
}
I call this shader as many times as I need, since the results are additive. It works with large batches of lights or one by one. Performance-wise, I didn't notice any real change trying different batch sizes, which is perhaps a bit odd.
So my question is: is there a better way to look up any (boolean) occlusions between my fragment position and the light position in the occlusion texture, without iterating through every pixel by hand? Could renderbuffers perhaps help here (from what I've read they're for reading data back to system memory, but I need the data in another shader)?
And perhaps, is there a better algorithm for what I'm doing here?
I can think of a couple of routes for optimization:
Exact: apply a distance transform to the occlusion map: this will give you the distance to the nearest occluder at each pixel. After that you can safely step by that distance within the loop, instead of taking baby steps. This will drastically reduce the number of steps in open regions.
There is a very simple CPU-side algorithm to compute a DT, and it may suit you if your occluders are static. If your scene changes every frame, however, you'll need to search the literature for GPU-side algorithms, which seem to be more complicated.
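A minimal sketch of what the inner loop could become, reusing the names from the question's shader; distanceSampler is a hypothetical texture holding, per pixel, the distance in pixels to the nearest occluder:

vec2 dir = (lightPosition - pos) / dist;  // unit step towards the light
float lightStrength = 1.0;
float travelled = 0.0;
for (int i = 0; i < 128 && travelled < dist; ++i)  // hard cap keeps the loop bounded
{
    vec2 uv = (pos + dir * travelled) / size;
    float freeDist = texture(distanceSampler, vec2(uv.x, 1.0 - uv.y)).x;
    if (freeDist < 0.5) { lightStrength = 0.0; break; }  // reached an occluder
    travelled += freeDist;  // safe to jump across the whole empty disc
}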
Inexact: resort to soft shadows -- this might be a compromise you are willing to make, and it could even be seen as an artistic choice. If you are OK with that, you can create a mipmap from your occlusion map, and then progressively increase the step size and sample coarser levels as you move farther from the point you are shading (see the sketch below).
You can go further and build an emitter map (into the same 4-channel map as the occlusion). Then your entire shading pass will be independent of the number of lights. This is the 2D equivalent of voxel-cone-traced GI.
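For the soft-shadow route, the march could look roughly like this (assuming mipmaps have been generated for occlusionSampler; the doubling factor is an arbitrary choice):

vec2 dir = (lightPosition - pos) / dist;
float lightStrength = 1.0;
float travelled = 0.0;
float stepSize = 1.0;  // in pixels
while (travelled < dist)
{
    vec2 uv = (pos + dir * travelled) / size;
    float lod = log2(stepSize);  // coarser mip level for larger steps
    lightStrength *= 1.0 - textureLod(occlusionSampler, vec2(uv.x, 1.0 - uv.y), lod).x;
    travelled += stepSize;
    stepSize *= 2.0;  // step grows with distance from the shaded point
}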

OpenGL Terrain System, small height difference between GPU and CPU

A quick summary:
I have a simple quadtree-based terrain rendering system that builds terrain patches, which then sample a heightmap in the vertex shader to determine the height of each vertex.
The exact same calculation is done on the CPU for object placement and so on.
Super straightforward, but after adding some systems to procedurally place objects I've discovered that they seem to be misplaced by just a small amount. To debug this I render a few crosses as single models over the terrain. The crosses (red, green, blue lines) represent the height read by the CPU, while the terrain mesh uses a shader to translate the vertices.
(I've also added a simple odd/even gap over each height value to rule out a simple offset issue. So those ugly cliffs are expected; the submerged crosses are the issue.)
I'm explicitly using GL_NEAREST to be able to display the "raw" height value:
As you can see the crosses are sometimes submerged under the terrain instead of representing its exact height.
The heightmap is just a simple array of floats on the CPU and on the GPU.
How the data is stored
A simple vector<float> which is uploaded into a GL_RGB32F GL_FLOAT buffer. The floats are not normalized and my terrain usually contains values between -100 and 500.
How is the data accessed in the shader
I've tried a few things to rule out errors; the initial version:
vec2 terrain_heightmap_uv(vec2 position, Heightmap heightmap)
{
    return (position + heightmap.world_offset) / heightmap.size;
}

float terrain_read_height(vec2 position, Heightmap heightmap)
{
    return textureLod(heightmap.heightmap, terrain_heightmap_uv(position, heightmap), 0).r;
}
Basics of the vertex shader (the full shader code is very long, so I've extracted the part that actually reads the height):
void main()
{
    vec4 world_position = a_model * vec4(a_position, 1.0);
    vec4 final_position = world_position;

    // snap vertex to grid
    final_position.x = floor(world_position.x / a_quad_grid) * a_quad_grid;
    final_position.z = floor(world_position.z / a_quad_grid) * a_quad_grid;
    final_position.y = terrain_read_height(final_position.xz, heightmap);

    gl_Position = projection * view * final_position;
}
To rule out the slightly different way the position is determined, I tested it using hardcoded values identical to how the C++ code reads the height:
return texelFetch(heightmap.heightmap, ivec2((position / 8) + vec2(1024, 1024)), 0).r;
Which gives the exact same result...
How is the data accessed in the application
In C++ the height is read like this:
inline float get_local_height_safe(uint32_t x, uint32_t y)
{
    // this macro simply clips x and y to the heightmap bounds
    // it does not interfere with the result
    BB_TERRAIN_HEIGHTMAP_BOUND_XY_TO_SAFE;

    uint32_t i = (y * _size1d) + x;
    return buffer->data[i];
}

inline float get_height_raw(glm::vec2 position)
{
    position = position + world_offset;
    uint32_t x = static_cast<int>(position.x);
    uint32_t y = static_cast<int>(position.y);
    return get_local_height_safe(x, y);
}

float BB::Terrain::get_height(const glm::vec3 position)
{
    return heightmap->get_height_raw({position.x / heightmap_unit_scale, position.z / heightmap_unit_scale});
}
What have I tried:
Comparing the Buffers
I've dumped the first few hundred values from the vector and compared them with the floating-point buffer uploaded to the GPU using Nvidia Nsight; they are equal, so no rounding/precision errors there.
Sampling method
I've tried texture, textureLod and texelFetch to rule out some issue there, they all give me the same result.
Rounding
The super strange thing: when I round all the height values, they are perfectly aligned, which just screams floating-point precision issues.
Position snapping
I've tried rounding, flooring and ceiling the position to ensure it always maps to the same texel. I also tried adding an epsilon offset to rule out a positional precision error (probably pointless, because the terrain is stable...)
Heightmap sizes
I've tried various heightmaps, also of different sizes.
Heightmap patterns
I've created a heightmap containing a pattern to ensure the position is not just offset.

How to prevent excessive SSAO at a distance

I am using SSAO very nearly as per John Chapman's tutorial here, in fact using Sascha Willems' Vulkan example.
One difference is that the fragment position is saved directly to a G-buffer along with the linear depth (so there are x, y, z, and w coordinates, w being the linear depth, calculated in the G-buffer shader). Depth is calculated like this:
float linearDepth(float depth)
{
    return (2.0f * ubo.nearPlane * ubo.farPlane) / (ubo.farPlane + ubo.nearPlane - depth * (ubo.farPlane - ubo.nearPlane));
}
My scene typically consists of a large, flat floor with a model in the centre. By large I mean a lot bigger than the far clip distance.
At high depth values (i.e. at the horizon in my example), the SSAO is generating occlusion where there should really be none - there's nothing out there except a completely flat surface.
Along with that occlusion comes some banding as well.
Any ideas for how to prevent these occlusions occurring?
I found this solution while I was writing the question, which works only because I have a flat floor.
I look up the normal at each kernel sample position and compare it to the current normal, discarding any sample whose dot product with it is close to 1. This means flat planes can't self-occlude.
Any comments on why I shouldn't do this, or better alternatives, would be very welcome!
It works for my current situation but if I happened to have non-flat geometry on the floor I'd be looking for a different solution.
vec3 normal = normalize(texture(samplerNormal, newUV).rgb * 2.0 - 1.0);
<snip>
for(int i = 0; i < SSAO_KERNEL_SIZE; i++)
{
    <snip>
    float sampleDepth = -texture(samplerPositionDepth, offset.xy).w;
    vec3 sampleNormal = normalize(texture(samplerNormal, offset.xy).rgb * 2.0 - 1.0);
    if(dot(sampleNormal, normal) > 0.99)
        continue;

casting gl_VertexID from int to float very slow

I am rendering an octree that contains points to an FBO.
I want a way to identify the points I am rendering.
To do so, I assign an ID to each of the octree nodes (a 16-bit integer), and I use gl_VertexID to identify a point within a node (no more than 65k points per node).
I output this to an RGBA texture, with the octree node identifier written to the rg color components and the vertex ID written to the ba color components.
vec4 getIdColor() {
    float r = mod(nodeID, 256.0) / 255.0;
    float g = (nodeID / 256.0) / 255.0;
    float b = mod(gl_VertexID, 256.0) / 255.0;
    float a = (gl_VertexID / 256.0) / 255.0;
    return vec4(r, g, b, a);
}
The problem is that the cast of gl_VertexID from int to float is really slow (I go from 60 FPS to 2-3 FPS when rendering 2 million points).
EDIT: I also have the exact same problem when just using gl_VertexID. If I remove the mods and just write
return vec4(gl_VertexID);
I get the same hit on the framerate. So the problem comes from gl_VertexID, not the mod.
Is there a workaround? (Also, what causes this?)
I found the problem. In the shader, I was using an if/else cascade (I know it's not good practice, but it was a test shader).
It seems I went over some cache size. Generating the shader code on the fly with only the sections whose conditions evaluate to true fixed the issue. It was both the number of conditions and the access to gl_VertexID that slowed the rendering down.

How to handle incorrect index calculation for discretized ray tracing?

The situation is as follows. I am trying to implement a linear voxel search in a GLSL shader for efficient voxel ray tracing. In other words, I have a 3D texture and I am ray tracing on it, but I want to traverse it such that I only ever check each voxel intersected by the ray once.
To this effect I have written a program with the following results:
Not efficient but correct:
The above image was obtained by advancing the ray by a small epsilon multiple times and sampling from the texture on each iteration, which produces the correct results but is very inefficient.
That would look like:
// fixed small steps: one texture sample every 0.01 units along the ray
for (int i = 0; i < MAX_STEPS; ++i)
{
    start += direction * 0.01;
    sample(start); // pseudocode: read the voxel texture here
}
To make it efficient I decided to instead implement the following lookup function:
float bound(float val)
{
    if(val >= 0.0)
        return voxel_size;
    return 0.0;
}

float planeIntersection(vec3 ray, vec3 origin, vec3 n, vec3 q)
{
    n = normalize(n);
    if(dot(ray, n) != 0.0)
        return (dot(q, n) - dot(n, origin)) / dot(ray, n);
    return -1.0;
}
vec3 get_voxel(vec3 start, vec3 direction)
{
    direction = normalize(direction);
    vec3 discretized_pos = ivec3(start * 1.f / voxel_size) * voxel_size;

    vec3 n_x = vec3(sign(direction.x), 0, 0);
    vec3 n_y = vec3(0, sign(direction.y), 0);
    vec3 n_z = vec3(0, 0, sign(direction.z));

    float bound_x = bound(direction.x);
    float bound_y = bound(direction.y);
    float bound_z = bound(direction.z);

    float t_x = planeIntersection(direction, start, n_x, discretized_pos + vec3(bound_x, 0, 0));
    float t_y = planeIntersection(direction, start, n_y, discretized_pos + vec3(0, bound_y, 0));
    float t_z = planeIntersection(direction, start, n_z, discretized_pos + vec3(0, 0, bound_z));

    if(t_x < 0)
        t_x = 1.f / 0.f;
    if(t_y < 0)
        t_y = 1.f / 0.f;
    if(t_z < 0)
        t_z = 1.f / 0.f;

    float t = min(t_x, t_y);
    t = min(t, t_z);
    return start + direction * t;
}
Which produces the following result:
Notice the triangle aliasing on the left side of some surfaces.
It seems this aliasing occurs because some coordinates are not being set to their correct voxel.
For example modifying the truncation part as follows:
vec3 discretized_pos = ivec3((start*1.f/(voxel_size)) - vec3(0.1)) * voxel_size;
Creates:
So it has fixed the issue for some surfaces and caused it for others.
I wanted to know if there is a way in which I can correct this truncation so that this error does not happen.
Update:
I have narrowed down the issue a bit. Observe the following image:
The numbers represent the order in which I expect the boxes to be visited.
As you can see, for some of the points the sampling of the fifth box seems to be omitted.
The following is the sampling code:
vec4 grabVoxel(vec3 pos)
{
    pos *= 1.f / base_voxel_size;
    pos.x /= (width - 1);
    pos.y /= (depth - 1);
    pos.z /= (height - 1);

    vec4 voxelVal = texture(voxel_map, pos);
    return voxelVal;
}
Yep, that was the +/- rounding I was talking about in my comments on your previous questions related to this. What you need to do is make the step equal to the grid size along one of the axes (and test 3 times: once for |dx|=1, then for |dy|=1, and lastly for |dz|=1).
You should also create a debug draw of a 2D slice through your map, to actually see where the hits for a single specific test ray occurred. Then, based on the direction of the ray in each axis, you set the rounding rules separately. Without this you are just blindly patching one case and corrupting the other two...
Now actually look at this (I linked it for you before, but you clearly did not read it):
Wolf and Doom ray casting techniques
especially pay attention to:
On the right it shows you how to compute the ray step (your epsilon). You simply scale the ray direction so that one of its coordinates is +/-1. For simplicity, start with a 2D slice through your map. The red dot is the ray start position; green is the ray step vector for vertical grid-line hits and red is for horizontal grid-line hits (z will be analogous).
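In code, that scaling is just this (names assumed; guarding against zero direction components is omitted):

vec3 stepX = dir / abs(dir.x); // x component is +/-1: one vertical grid line per step
vec3 stepY = dir / abs(dir.y); // y component is +/-1: one horizontal grid line per step
vec3 stepZ = dir / abs(dir.z); // analogous for the z slices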
Now you should add a 2D overview of your map through some visible height slice (like the image on the left) and add a dot or marker at each detected intersection, distinguishing between x, y and z hits by color. Do this for a single ray only (I use the center-of-view ray). First handle the view when you look in the X+ direction, then X-, and when done move on to Y and Z...
In my GLSL volumetric 3D back raytracer, which I also linked for you before, look at these lines:
if (dir.x<0.0) { p+=dir*(((floor(p.x*n)-_zero)*_n)-ray_pos.x)/dir.x; nnor=vec3(+1.0,0.0,0.0); }
if (dir.x>0.0) { p+=dir*((( ceil(p.x*n)+_zero)*_n)-ray_pos.x)/dir.x; nnor=vec3(-1.0,0.0,0.0); }
if (dir.y<0.0) { p+=dir*(((floor(p.y*n)-_zero)*_n)-ray_pos.y)/dir.y; nnor=vec3(0.0,+1.0,0.0); }
if (dir.y>0.0) { p+=dir*((( ceil(p.y*n)+_zero)*_n)-ray_pos.y)/dir.y; nnor=vec3(0.0,-1.0,0.0); }
if (dir.z<0.0) { p+=dir*(((floor(p.z*n)-_zero)*_n)-ray_pos.z)/dir.z; nnor=vec3(0.0,0.0,+1.0); }
if (dir.z>0.0) { p+=dir*((( ceil(p.z*n)+_zero)*_n)-ray_pos.z)/dir.z; nnor=vec3(0.0,0.0,-1.0); }
That is how I did it. As you can see, I use a different rounding/flooring rule for each of the 6 cases; this way you handle one case without corrupting the others. The rounding rule depends on a lot of things, like how your coordinate system is offset relative to (0,0,0), so it might be different in your code, but the if conditions should be the same. Also, as you can see, I handle this by offsetting the ray start position a bit instead of having these conditions inside the ray traversal loop castray.
That macro casts the ray and looks for intersections with the grid, and on top of that it z-sorts the intersections and uses the first valid one (that is what l and ll are for; no other conditions or combinations of ray results are needed). So my way of dealing with this is to cast a ray for each type of intersection (x, y, z), starting at the first intersection with the grid for that axis. You need to take the starting offset into account so that l and ll represent the intersection distance to the real start of the ray, not the offset one...
Also, it is a good idea to do this on the CPU side first and, once it is 100% working, port it to GLSL, as things like this are very hard to debug in GLSL.