GLSL NVIDIA square artifacts - C++

I have encountered a problem where a GLSL shader generates an incorrect image on the following GPUs:
GT 430
GT 770
GTX 570
GTX 760
But it works normally on these:
Intel HD Graphics 2500
Intel HD 4000
Intel 4400
GTX 740M
Radeon HD 6310M
Radeon HD 8850
The shader code is as follows:
bool PointProjectionInsideTriangle(vec3 p1, vec3 p2, vec3 p3, vec3 point)
{
    vec3 n = cross((p2 - p1), (p3 - p1));
    vec3 n1 = cross((p2 - p1), n);
    vec3 n2 = cross((p3 - p2), n);
    vec3 n3 = cross((p1 - p3), n);
    float proj1 = dot((point - p2), n1);
    float proj2 = dot((point - p3), n2);
    float proj3 = dot((point - p1), n3);
    if(proj1 > 0.0)
        return false;
    if(proj2 > 0.0)
        return false;
    if(proj3 > 0.0)
        return false;
    return true;
}
struct Intersection
{
    vec3 point;
    vec3 norm;
    bool valid;
};
Intersection GetRayTriangleIntersection(vec3 rayPoint, vec3 rayDir, vec3 p1, vec3 p2, vec3 p3)
{
    vec3 norm = normalize(cross(p1 - p2, p1 - p3));
    Intersection res;
    res.norm = norm;
    res.point = vec3(rayPoint.xy, 0.0);
    res.valid = PointProjectionInsideTriangle(p1, p2, p3, res.point);
    return res;
}
struct ColoredIntersection
{
    Intersection geomInt;
    vec4 color;
};
#define raysCount 15
void main(void)
{
    vec2 radius = (gl_FragCoord.xy / vec2(800.0, 600.0)) - vec2(0.5, 0.5);
    ColoredIntersection ints[raysCount];
    vec3 randomPoints[raysCount];
    int i, j;
    for(int i = 0; i < raysCount; i++)
    {
        float theta = 0.5 * float(i);
        float phi = 3.1415 / 2.0;
        float r = 1.0;
        randomPoints[i] = vec3(r * sin(phi) * cos(theta), r * sin(phi) * sin(theta), r * cos(phi));
        vec3 tangent = normalize(cross(vec3(0.0, 0.0, 1.0), randomPoints[i]));
        vec3 trianglePoint1 = randomPoints[i] * 2.0 + tangent * 0.2;
        vec3 trianglePoint2 = randomPoints[i] * 2.0 - tangent * 0.2;
        ints[i].geomInt = GetRayTriangleIntersection(vec3(radius, -10.0), vec3(0.0, 0.0, 1.0), vec3(0.0, 0.0, 0.0), trianglePoint1, trianglePoint2);
        if(ints[i].geomInt.valid)
        {
            float c = length(ints[i].geomInt.point);
            ints[i].color = vec4(c, c, c, 1.0);
        }
    }
    for(i = 0; i < raysCount; i++)
    {
        for(j = i + 1; j < raysCount; j++)
        {
            if(ints[i].geomInt.point.z < ints[i].geomInt.point.z - 10.0)
            {
                gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
                ColoredIntersection tmp = ints[j];
                ints[j] = ints[i];
                ints[i] = tmp;
            }
        }
    }
    vec4 resultColor = vec4(0.0, 0.0, 0.0, 0.0);
    for(i = 0; i < raysCount + 0; i++)
    {
        if(ints[i].geomInt.valid)
            resultColor += ints[i].color;
    }
    gl_FragColor = clamp(resultColor, 0.0, 1.0);
}
Upd: I have replaced the vector normalizations with built-in functions and added gl_FragColor clamping, just in case.
The code is a simplified version of an actual shader; the expected image is:
But what I get is:
Random rotations of the code remove the artifacts completely. For example, if I change the line
if(ints[i].geomInt.valid) //1
to
if(ints[i].geomInt.valid == true) //1
which apparently should not affect the logic in any way, or if I completely remove the double cycle that does nothing (marked as 2), the artifacts vanish. Please note that the double cycle does nothing at all, since the condition
if(ints[i].geomInt.point.z < ints[i].geomInt.point.z - 10.0)
{
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    return;
    ColoredIntersection tmp = ints[j];
    ints[j] = ints[i];
    ints[i] = tmp;
}
can never be satisfied (both sides use index i, not i and j), and there are no NaNs. This code does absolutely nothing, yet somehow produces artifacts.
You can test the shader and demo yourself using this project (full MSVS 2010 project + sources + compiled binary and a shader; uses the included SFML): https://dl.dropboxusercontent.com/u/25635148/ShaderTest.zip
I use SFML in this test project, but that is irrelevant, because the actual project in which I encountered this problem does not use this library.
What I want to know is why these artifacts appear and how to reliably avoid them.

I don't think anything is wrong with your shader. The OpenGL pipeline renders to a framebuffer. If you make use of that framebuffer before the rendering has completed, you will often get what you have seen. Please bear in mind that glDrawArrays and similar calls are asynchronous (the function returns before the GPU has finished drawing the vertices).
The most common cause of those square artefacts is using the resulting framebuffer as a texture which is then used for further rendering.
The OpenGL driver is supposed to keep track of dependencies and should know how to wait for them to be fulfilled.
If you are sharing a framebuffer across threads, however, all bets are off: you may then need to use something like a fence sync (glFenceSync) to ensure that one thread waits for rendering that is taking place on another thread.
As a workaround, you might find that calling glFinish, or even glReadPixels (reading a single pixel), sorts the issue out.
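For illustration, a minimal fence-based wait might look like this (a sketch, assuming an OpenGL 3.2+ context and that the sync object is visible to both contexts involved; glFinish is the blunter alternative):

// Producer thread: after submitting the rendering the other thread depends on
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush(); // make sure the fence command actually reaches the GPU

// Consumer thread: before using the shared framebuffer's contents
GLenum status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                 1000000000); // timeout in nanoseconds (1 s)
// status is GL_ALREADY_SIGNALED or GL_CONDITION_SATISFIED once it is safe to proceed
glDeleteSync(fence);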
Please also bear in mind that this problem is timing-related, and simplifying a shader might very well make the issue go away.

If anyone's still interested: I asked this question on numerous specialized sites, including opengl.org and devtalk.nvidia.com. I did not receive any concrete answer on what's wrong with my shader, just some suggestions for working around the problem, such as using if(condition == true) instead of if(condition) and keeping the algorithms as simple as possible. In the end I chose one of the simplest rotations of my code that gets rid of the problem: I just replaced
struct Intersection
{
    vec3 point;
    vec3 norm;
    bool valid;
};
with
struct Intersection
{
    bool valid;
    vec3 point;
    vec3 norm;
};
There were numerous other code rotations that made the artifacts disappear, but I chose this one because I was able to test it on most of the systems I had trouble with before.

I've seen this exact thing happen in GLSL when variables aren't initialized. For example, a vec3 will be (0,0,0) by default on some graphics cards, but on other graphics cards it will be a different value. Are you sure you're not using a variable without first assigning it a value? Specifically, you aren't initializing ColoredIntersection.color when Intersection.valid is false, but I think you are using it later.
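For example, a defensive version of the loop body from your shader would give color a value on every path (a sketch; only the added initialization differs from your original code):

ints[i].geomInt = GetRayTriangleIntersection(vec3(radius, -10.0), vec3(0.0, 0.0, 1.0),
                                             vec3(0.0, 0.0, 0.0), trianglePoint1, trianglePoint2);
ints[i].color = vec4(0.0); // initialize even when the intersection is invalid
if(ints[i].geomInt.valid)
{
    float c = length(ints[i].geomInt.point);
    ints[i].color = vec4(c, c, c, 1.0);
}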

Related

Simulating diffusion equation in GLSL

I'm trying to simulate diffusion in GLSL (not the Gray-Scott reaction-diffusion equation), and I seem to be having trouble getting it to work quite right. In all of my tests so far, diffusion stops at a certain point and reaches an equilibrium long before I expect it to.
My GLSL code:
#version 460
#define KERNEL_SIZE 9
float kernel[KERNEL_SIZE];
vec2 offset[KERNEL_SIZE];
uniform float width;
uniform float height;
uniform sampler2D current_concentration_tex;
uniform float diffusion_constant; // rate of diffusion of U
in vec2 vTexCoord0;
out vec4 fragColor;
void main(void)
{
    float w = 1.0/width;
    float h = 1.0/height;
    kernel[0] = 0.707106781;
    kernel[1] = 1.0;
    kernel[2] = 0.707106781;
    kernel[3] = 1.0;
    kernel[4] = -6.82842712;
    kernel[5] = 1.0;
    kernel[6] = 0.707106781;
    kernel[7] = 1.0;
    kernel[8] = 0.707106781;
    offset[0] = vec2( -w,  -h);
    offset[1] = vec2(0.0,  -h);
    offset[2] = vec2(  w,  -h);
    offset[3] = vec2( -w, 0.0);
    offset[4] = vec2(0.0, 0.0);
    offset[5] = vec2(  w, 0.0);
    offset[6] = vec2( -w,   h);
    offset[7] = vec2(0.0,   h);
    offset[8] = vec2(  w,   h);
    float chemical_density = texture( current_concentration_tex, vTexCoord0 ).r; // reference texture from last frame
    float laplacian = 0.0; // must be initialized before accumulating
    for( int i = 0; i < KERNEL_SIZE; i++ ){
        float tmp = texture( current_concentration_tex, vTexCoord0 + offset[i] ).r;
        laplacian += tmp * kernel[i];
    }
    float du = diffusion_constant * laplacian; // diffusion equation
    chemical_density += du; // diffuse; dt * du
    //chemical_density *= 0.9999; // decay
    fragColor = vec4( clamp(chemical_density, 0.0, 1.0 ), 0.0, 0.0, 1.0 );
}
If the diffusion simulation were working properly, I would expect to draw a "source" each frame and have the chemical slowly diffuse away from the source into the rest of the frame. However, diffusion seems to "stop" early and doesn't fade any further. For example, when I draw a constant ring in the center as my source:
I also tried it with a one-time initial condition of chemicals in the center, with no additional chemicals added -- again, I would expect it to fade to zero (evenly distributed across the entire frame), but instead it stops much earlier than I would expect:
Is there something wrong with my simulation code in GLSL? Or is this more of a numerical-methods issue? Would expanding from a 3x3 kernel to a larger one improve the situation?
(Moved from comment)
This is most probably a precision issue -- are you rendering it into an 8-bit target? You probably want, at least, a 32-bit float.
Since you use modern OpenGL, it is best to create your textures with glTextureStorage2D and specify GL_RGBA32F as the internalformat. There is no such thing as a "default 8-bit RGBA format" for a GL texture, unless you use a legacy API, a non-standard extension, or render directly to the screen.
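For instance, with direct state access (a sketch, assuming a GL 4.5 context; the variable names are illustrative):

GLuint tex;
glCreateTextures(GL_TEXTURE_2D, 1, &tex);
glTextureStorage2D(tex, 1, GL_RGBA32F, width, height); // one mip level, 32-bit float channels
glTextureParameteri(tex, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTextureParameteri(tex, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

With 8 bits per channel, the per-frame concentration change du is quantized to steps of 1/255; once du falls below half a step, the accumulation rounds to zero and the diffusion visibly "freezes", which matches the behavior you describe.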

GLSL compute shader flickering blocks/squares artifact

I'm trying to write a bare minimum GPU raycaster using compute shaders in OpenGL. I'm confident the raycasting itself is functional, as I've gotten clean outlines of bounding boxes via a ray-box intersection algorithm.
However, when attempting ray-triangle intersection, I get strange artifacts. My shader is programmed to simply test for a ray-triangle intersection, and color the pixel white if an intersection was found and black otherwise. Instead of the expected behavior, when the triangle should be visible onscreen, the screen is instead filled with black and white squares/blocks/tiles which flicker randomly like TV static. The squares are at most 8x8 pixels (the size of my compute shader blocks), although there are dots as small as single pixels as well. The white blocks generally lie in the expected area of my triangle, although sometimes they are spread out across the bottom of the screen as well.
Here is a video of the artifact. In my full shader the camera can be rotated around and the shape appears more triangle-like, but the flickering artifact is the key issue, and it still appears in this video, which I generated from the following minimal version of my shader code:
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

uvec2 DIMS = gl_NumWorkGroups.xy * gl_WorkGroupSize.xy;
uvec2 UV = gl_GlobalInvocationID.xy;
vec2 uvf = vec2(UV) / vec2(DIMS);

layout(location = 1, rgba8) uniform writeonly image2D brightnessOut;

struct Triangle
{
    vec3 v0;
    vec3 v1;
    vec3 v2;
};

struct Ray
{
    vec3 origin;
    vec3 direction;
    vec3 inv;
};

// Wikipedia Moller-Trumbore algorithm, GLSL-ified
bool ray_triangle_intersection(vec3 rayOrigin, vec3 rayVector,
    in Triangle inTriangle, out vec3 outIntersectionPoint)
{
    const float EPSILON = 0.0000001;
    vec3 vertex0 = inTriangle.v0;
    vec3 vertex1 = inTriangle.v1;
    vec3 vertex2 = inTriangle.v2;
    vec3 edge1 = vec3(0.0);
    vec3 edge2 = vec3(0.0);
    vec3 h = vec3(0.0);
    vec3 s = vec3(0.0);
    vec3 q = vec3(0.0);
    float a = 0.0, f = 0.0, u = 0.0, v = 0.0;
    edge1 = vertex1 - vertex0;
    edge2 = vertex2 - vertex0;
    h = cross(rayVector, edge2);
    a = dot(edge1, h);
    // Test if ray is parallel to this triangle.
    if (a > -EPSILON && a < EPSILON)
    {
        return false;
    }
    f = 1.0/a;
    s = rayOrigin - vertex0;
    u = f * dot(s, h);
    if (u < 0.0 || u > 1.0)
    {
        return false;
    }
    q = cross(s, edge1);
    v = f * dot(rayVector, q);
    if (v < 0.0 || u + v > 1.0)
    {
        return false;
    }
    // At this stage we can compute t to find out where the intersection point is on the line.
    float t = f * dot(edge2, q);
    if (t > EPSILON) // ray intersection
    {
        outIntersectionPoint = rayOrigin + rayVector * t;
        return true;
    }
    return false;
}

void main()
{
    // Generate rays by calculating the distance from the eye
    // point to the screen and combining it with the pixel indices
    // to produce a ray through this invocation's pixel
    const float HFOV = (3.14159265359/180.0)*45.0;
    const float WIDTH_PX = 1280.0;
    const float HEIGHT_PX = 720.0;
    float VIEW_PLANE_D = (WIDTH_PX/2.0)/tan(HFOV/2.0);
    vec2 rayXY = vec2(UV) - vec2(WIDTH_PX/2.0, HEIGHT_PX/2.0);

    // Rays have origin at (0, 0, 20) and generally point towards (0, 0, -1)
    Ray r;
    r.origin = vec3(0.0, 0.0, 20.0);
    r.direction = normalize(vec3(rayXY, -VIEW_PLANE_D));
    r.inv = 1.0 / r.direction;

    // Triangle in XY plane at Z=0
    Triangle debugTri;
    debugTri.v0 = vec3(-20.0, 0.0, 0.0);
    debugTri.v1 = vec3(20.0, 0.0, 0.0);
    debugTri.v0 = vec3(0.0, 40.0, 0.0);

    // Test triangle intersection; write 1.0 if hit, else 0.0
    vec3 hitPosDebug = vec3(0.0);
    bool hitDebug = ray_triangle_intersection(r.origin, r.direction, debugTri, hitPosDebug);

    imageStore(brightnessOut, ivec2(UV), vec4(vec3(float(hitDebug)), 1.0));
}
I render the image to a fullscreen triangle using a normal sampler2D and rasterized triangle UVs chosen to map to screen space.
None of this code should be time dependent. I've tried multiple ray-triangle algorithms from various sources, including both branching and branch-free versions, and all exhibit the same problem, which leads me to suspect some sort of memory-incoherency behavior I'm not familiar with, a driver issue, or a mistake I've made in configuring or dispatching my compute work (I dispatch 160x90x1 of my 8x8x1 blocks to cover my 1280x720 framebuffer texture).
I've found a few similar issues like this one on SE and the general internet, but they seem to almost exclusively be caused by using uninitialized variables, which I am not doing as far as I can tell. They mention that the pattern continues to move when viewed in the NSight debugger; while RenderDoc doesn't do that, the contents of the image do vary between draw calls even after the compute shader has finished. E.g. when inspecting the image in the compute draw call there is one pattern of artifacts, but when I scrub to the subsequent draw calls which use my image as input, the pattern in the image has changed despite nothing writing to the image.
I also found this post which seems very similar, but that one also seems to be caused by an uninitialized variable, which again I've been careful to avoid. I've also not been able to alleviate the issue by tweaking the code as they have done.
This post has a similar looking artifact which was a memory model problem, but I'm not using any shared memory.
I'm running the latest NVidia drivers (461.92) on a GTX 1070. I've tried inserting glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT); (as well as some of the other barrier types) after my compute shader dispatch which I believe is the correct barrier to use if using a sampler2D to draw a texture that was previously modified by an image load/store operation, but it doesn't seem to change anything.
I just tried re-running it with glMemoryBarrier(GL_ALL_BARRIER_BITS); both before and after my dispatch call, so synchronization doesn't seem to be the issue.
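For reference, the host-side ordering around the dispatch is roughly this (a sketch; the image unit and names are illustrative, not exact copies of my code):

glUseProgram(raycastProgram);
glBindImageTexture(1, brightnessTex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA8);
glDispatchCompute(160, 90, 1); // 8x8x1 blocks covering the 1280x720 image
// The result is consumed through a sampler2D, so texture fetch is the hazard:
glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT);
// ...then draw the fullscreen triangle that samples brightnessTex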
Odds are that the cause of the problem lies somewhere between my chair and keyboard, but this kind of problem is outside my usual shader-debugging abilities, as I'm relatively new to OpenGL. Any ideas would be appreciated! Thanks.
I've fixed the issue, and it was (unsurprisingly) simply a stupid mistake on my own part.
Observe the following lines from my code snippet (quoted from the listing above):
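debugTri.v0 = vec3(-20.0, 0.0, 0.0);
debugTri.v1 = vec3(20.0, 0.0, 0.0);
debugTri.v0 = vec3(0.0, 40.0, 0.0); // the second v0 should have been debugTri.v2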
Which leaves my v2 vertex quite uninitialized.
The moral of this story is that if you have a similar issue to the one I described above, and you swear up and down that you've initialized all your variables and it must be a driver bug or someone else's fault... quadruple-check your variables, you probably forgot to initialize one.

How to get a smooth result with RSM (Reflective Shadow Mapping)?

I'm trying to implement a Reflective Shadow Mapping program with Vulkan.
The problem is that I get a bad result:
As you can see, the result is not smooth.
In a first pass I render the position, normal, and flux from the light's point of view into three textures at a resolution of 512 × 512.
In a second pass, I compute the indirect illumination from the first-pass textures, following this paper (http://www.klayge.org/material/3_12/GI/rsm.pdf):
for(int i = 0; i < 151; i++)
{
    vec4 rsmProjCoords = projCoords + vec4(rsmDiskSampling[i] * 0.09, 0.0, 0.0);
    vec3 indirectLightPos = texture(rsmPosition, rsmProjCoords.xy).rgb;
    vec3 indirectLightNorm = texture(rsmNormal, rsmProjCoords.xy).rgb;
    vec3 indirectLightFlux = texture(rsmFlux, rsmProjCoords.xy).rgb;
    vec3 r = worldPos - indirectLightPos;
    float distP2 = dot(r, r);
    vec3 emission = indirectLightFlux * (max(0.0, dot(indirectLightNorm, r)) * max(0.0, dot(N, -r)));
    emission *= rsmDiskSampling[i].x * rsmDiskSampling[i].x / (distP2 * distP2);
    indirectRSM += emission;
}
The problem is fixed.
The main problem was the sampling: I was using linear filtering instead of nearest filtering:
samplerInfo.magFilter = VK_FILTER_NEAREST;
samplerInfo.minFilter = VK_FILTER_NEAREST;
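For context, the relevant part of the sampler creation might look like this (a sketch of a typical VkSamplerCreateInfo setup; only the two filter fields above are the actual fix):

VkSamplerCreateInfo samplerInfo = {};
samplerInfo.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
samplerInfo.magFilter = VK_FILTER_NEAREST; // do not blend neighbouring RSM texels
samplerInfo.minFilter = VK_FILTER_NEAREST;
samplerInfo.mipmapMode = VK_SAMPLER_MIPMAP_MODE_NEAREST;
samplerInfo.addressModeU = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
samplerInfo.addressModeV = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
samplerInfo.addressModeW = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;

VkSampler rsmSampler;
vkCreateSampler(device, &samplerInfo, nullptr, &rsmSampler);

Linear filtering blends position, normal, and flux values from unrelated VPLs at texel boundaries, which produces invalid samples; nearest filtering keeps each sample consistent.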
Other problems were the number of VPLs used and the distance between them.

Getting diffuse material right from CPU code to GPU shader

I'm trying to unroll this recursive function from Ray Tracing in One Weekend:
vec3 color(const ray& r, hitable *world)
{
    hit_record rec;
    if(world->hit(r, 0.0, MAXFLOAT, rec)){
        vec3 target = rec.p + rec.normal + random_in_unit_sphere();
        return 0.5*color(ray(rec.p, target-rec.p), world);
    }
    else{
        vec3 unit_direction = unit_vector(r.direction());
        float t = 0.5*(unit_direction.y() + 1.0);
        return (1.0-t)*vec3(1.0,1.0,1.0) + t*vec3(0.5,0.7,1.0);
    }
}
I understand that it sends a ray and bounces it around until it no longer hits anything.
So I have attempted to unroll this recursive function in a GLSL shader:
vec3 color(ray r, hitableList list)
{
    hitRecord rec;
    vec3 unitDirection;
    float t;
    while(hit(r, 0.0, FLT_MAX, rec, list))
    {
        vec3 target = rec.p + rec.normal;
        r = ray(rec.p, target - rec.p);
    }
    unitDirection = normalize(direction(r));
    t = 0.5 * (unitDirection.y + 1.);
    return (1. - t) * vec3(1.) + t * vec3(0.5, 0.7, 1.);
}
Normally it should output a diffuse material like this:
but I only get a reflective material like this:
Note: the material is highly reflective and can reflect the other spheres in the scene.
I have looked over the code, and something tells me my approach to this tail-recursive function is wrong. Also, I never apply the 0.5 factor from return 0.5 * color(...); I have no idea how to do that in the unrolled version.
UPDATE
Thanks to the answer from Jarod42, the 0.5 factor is now implemented; this solves the issue of the material not being properly exposed to light.
But the diffuse material is still not generated; I end up with a fully reflective metal material.
To use the factor of 0.5, you might do something like:
vec3 color(ray r, hitableList list)
{
    hitRecord rec;
    vec3 unitDirection;
    float t;
    float factor = 1.f;
    while(hit(r, 0.0, FLT_MAX, rec, list))
    {
        vec3 target = rec.p + rec.normal;
        r = ray(rec.p, target - rec.p);
        factor *= 0.5f;
    }
    unitDirection = normalize(direction(r));
    t = 0.5 * (unitDirection.y + 1.);
    return factor * ((1. - t) * vec3(1.) + t * vec3(0.5, 0.7, 1.));
}
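Note that the GLSL port also drops the random_in_unit_sphere() term from the original C++ (target = rec.p + rec.normal, versus rec.p + rec.normal + random_in_unit_sphere()), so every bounce leaves the surface deterministically along the normal, which is exactly what a mirror does. GLSL has no built-in random function, so one way to restore the diffuse scatter is a hash-based helper; a sketch (the hash and the seeding are illustrative, not from the original code):

// Classic fract/sin hash; good enough for a demo, not for serious sampling
float rand(vec2 co)
{
    return fract(sin(dot(co, vec2(12.9898, 78.233))) * 43758.5453);
}

// Pseudo-random point on the unit sphere; adding it to the normal gives
// the "true Lambertian" variant of the book's diffuse scatter
vec3 randomUnitVector(vec2 seed)
{
    float u = rand(seed);
    float v = rand(seed + vec2(1.0, 7.0));
    float theta = 6.28318530718 * u;  // azimuth in [0, 2*pi)
    float z = 2.0 * v - 1.0;          // cosine of the polar angle in [-1, 1]
    float rxy = sqrt(max(0.0, 1.0 - z * z));
    return vec3(rxy * cos(theta), rxy * sin(theta), z);
}

The loop then perturbs each bounce, with a seed that changes per bounce and per pixel:

float bounce = 0.0;
while(hit(r, 0.0, FLT_MAX, rec, list))
{
    vec3 target = rec.p + rec.normal + randomUnitVector(gl_FragCoord.xy * 0.017 + bounce);
    r = ray(rec.p, target - rec.p);
    factor *= 0.5;
    bounce += 1.0;
}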

Cascaded Shadow Mapping lookup decision / gl_FragCoord.z

I've implemented Cascaded Shadow Mapping as in the nvidia SDK (http://developer.download.nvidia.com/SDK/10.5/Samples/cascaded_shadow_maps.zip). However my lookup just doesn't seem to work.
Here's a picture depicting my current state: http://i.imgur.com/SCHDO.png
The problem is that I end up in the first split right away, even though I'm far away from it. As you can see, the other splits aren't even considered.
I thought the reason for this might be that the main engine uses a different projection matrix. It's different from the one I supply to the algorithm, but I also tried passing the same matrix to the shader and computing the position this way: gl_Position = matProj * gl_ModelViewMatrix * gl_Vertex
That really didn't change a thing though. I still ended up with only one split.
Here are my shaders:
[vertex]
varying vec3 vecLight;
varying vec3 vecEye;
varying vec3 vecNormal;
varying vec4 vecPos;
varying vec4 fragCoord;

void main(void)
{
    vecPos = gl_Vertex;
    vecNormal = normalize(gl_NormalMatrix * gl_Normal);
    vecLight = normalize(gl_LightSource[0].position.xyz);
    vecEye = normalize(-vecPos.xyz);
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}
[fragment] (just the shadow part)
vec4 getShadow()
{
    vec4 sm_coord_c = texmat_3 * vecPos;
    float shadow = texture2D(smap_3, sm_coord_c.xy).x;
    float s = (shadow < sm_coord_c.z) ? 0.0 : 1.0;
    vec4 shadow_c = vec4(1.0, 1.0, 1.0, 1.0) * s;
    if(gl_FragCoord.z < vecFarbound.x)
    {
        vec4 sm_coord_c = texmat_0 * vecPos;
        float shadow = texture2D(smap_0, sm_coord_c.xy).x;
        float s = (shadow < sm_coord_c.z) ? 0.0 : 1.0;
        shadow_c = vec4(0.7, 0.7, 1.0, 1.0) * s;
    }
    else if(gl_FragCoord.z < vecFarbound.y)
    {
        vec4 sm_coord_c = texmat_1 * vecPos;
        float shadow = texture2D(smap_1, sm_coord_c.xy).x;
        float s = (shadow < sm_coord_c.z) ? 0.0 : 1.0;
        shadow_c = vec4(0.7, 1.0, 0.7, 1.0) * s;
    }
    else if(gl_FragCoord.z < vecFarbound.z)
    {
        vec4 sm_coord_c = texmat_2 * vecPos;
        float shadow = texture2D(smap_2, sm_coord_c.xy).x;
        float s = (shadow < sm_coord_c.z) ? 0.0 : 1.0;
        shadow_c = vec4(1.0, 0.7, 0.7, 1.0) * s;
    }
    return shadow_c;
}
So for some reason, gl_FragCoord.z is smaller than vecFarbound.x no matter where I am in the scene. (Also notice the shadowed area to the far left; it grows the higher I move the camera and soon takes over the whole scene.)
I've checked the vecFarbound values, and they're similar to the ones in NVIDIA's code, so I assume I calculated them correctly.
Is there a way to check gl_FragCoord.z's value?
In my old CSM implementation I simply used the distance in camera space:
float tempDist = dot(EyePos.xyz, EyePos.xyz); // squared view-space distance
if (tempDist < split.x) ...
else if (tempDist < split.y) ...
...
This solution was a bit simpler for me to understand, and I got better control over the splits. When you use the depth value, there can be problems that come from gl_FragCoord.z being non-linear.
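(For reference, if you do want to compare against gl_FragCoord.z, you can linearize it first; a sketch, assuming near and far are uniforms holding the camera's clip planes for a standard perspective projection:)

float ndcZ = gl_FragCoord.z * 2.0 - 1.0; // window-space depth back to NDC [-1, 1]
float linearZ = 2.0 * near * far / (far + near - ndcZ * (far - near)); // view-space depth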
I suggest doing the split tests in view space first, and then (once that works) switching to gl_FragCoord.z.
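Applied to the shader above, the view-space test might look like this (a sketch; it assumes the vertex shader also passes the view-space position, e.g. vecEyePos = gl_ModelViewMatrix * gl_Vertex, and that vecFarbound then holds squared view-space distances rather than depth values):

float distSq = dot(vecEyePos.xyz, vecEyePos.xyz); // squared view-space distance
int split = 3;                                    // default to the last cascade
if (distSq < vecFarbound.x)
    split = 0;
else if (distSq < vecFarbound.y)
    split = 1;
else if (distSq < vecFarbound.z)
    split = 2;
// ...then sample texmat_N / smap_N for the chosen split exactly as before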