I'm developing a game engine and working on a deferred rendering pipeline. After finishing the second-pass (shading) shader, I started testing the pipeline on various other computers I have. Interestingly, on my older laptop I get strange artifacting in 4x8 pixel groups (example below). It looks like the shader is executing and ultimately producing the correct color, but only intermittently, in a seemingly random fashion.
This question is not a bug report or a solution request. I have fixed the issue with the code patch below. Rather, this thread is to gather a better understanding of why this happens, and to provide insight for anyone else affected by the same issue.
To describe the effect in more detail:
About 50% of the screen is covered by 4x8 groups of pixels that strongly tint the actual resulting color.
These 4x8 groups appear in random places on the screen each frame, causing a "static" effect.
Certain models tint to different colours. As you can see below, the reflective bunny is tinted blue while the refractive spheres are tinted yellow. This doesn't seem to be a G-buffer issue, however, as they both sample from the same texture, which I'm sure is correct (I can see it on screen at the same time).
Different objects' 4x8 blocks show the correct color at different rates. You can see the refractive bunny is mostly correct, but the reflective floor and refractive spheres are simply white and yellow.
The tint colors of the 4x8 blocks change wildly depending on what other programs are running on the GPU.
The image should look like this:
Pseudocode of the broken shader was something like
out vec3 FragColor; // out pixel of the fragment shader

void main() {
    for (int i = 0; i < NumberOfPointLights; i++) {
        // ... lighting calculation code ...
        FragColor += lighting;
    }
    for (int i = 0; i < NumberOfSpotLights; i++) {
        // ... lighting calculation code ...
        FragColor += lighting;
    }
    for (int i = 0; i < NumberOfDirectionalLights; i++) {
        // ... lighting calculation code ...
        FragColor += lighting;
    }
}
To fix the issue, I simply initialised a temporary variable to hold the output color, wrote to it during the lighting calculations, and then assigned it to the fragment output at the end, as follows:
out vec3 FragColor; // out pixel of the fragment shader

void main() {
    vec3 outcolor = vec3(0);
    for (int i = 0; i < NumberOfPointLights; i++) {
        // ... lighting calculation code ...
        outcolor += lighting;
    }
    for (int i = 0; i < NumberOfSpotLights; i++) {
        // ... lighting calculation code ...
        outcolor += lighting;
    }
    for (int i = 0; i < NumberOfDirectionalLights; i++) {
        // ... lighting calculation code ...
        outcolor += lighting;
    }
    FragColor = outcolor;
}
I was surprised this worked, as I assumed this behaviour was the default: that writing to the fragment output doesn't actually write to VRAM each time, only at the end. I was under the impression that a fragment output variable is only read after shader execution, hence why it's global.
Suspicions and Questions
From my research, I read that a 4x8 pixel group is the size of one "work-group" or "core" on an NVIDIA GPU (which I am using), while AMD uses 8x8 pixel work-groups. So something is causing random work-groups' output colors to be permanently affected until the work-group is reassigned to a different location on screen.
The fact that the colours change depending on what else is using the GPU tells me that either the GPU has a very complicated memory allocation scheme and is reading other programs' memory (which I doubt), or the shader is getting uninitialised memory every frame. But surely the same texture memory is overwritten each time?
Perhaps writing to the fragment output variable writes to VRAM each time, and writing to it too many times per work-group causes the work-group to bail, leaving mixed results behind. This would explain why a temporary/local variable works.
As I am always using += (read then write), the temporary variable's initialisation acts as an explicit instruction to start with black and add to it, while accumulating into the fragment output directly adds to whatever the last value for that pixel was. If this were the case though, why did it work correctly on a higher-end PC?
Other details
My old laptop uses a GT 540M with Optimus technology, alongside integrated Intel 3000 graphics (which isn't being used here).
My newer desktop PC uses a GTX 1070.
Both GPUs use very little VRAM while running the application, less than 100 MB.
The shader is compiled with #version 400 core.
This is a driver bug. There's nothing more to look into here.
Output variables can be both read from and written to; this is part of GLSL. So it seems the driver simply got that part of the implementation wrong.
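For completeness, an equivalent workaround (just a sketch, reusing the question's own names) is to give the output an explicit starting value before accumulating into it:

out vec3 FragColor; // out pixel of the fragment shader

void main() {
    FragColor = vec3(0.0); // explicit starting value, so += never reads an unwritten output
    // ... the same point/spot/directional lighting loops as before,
    //     each accumulating with FragColor += lighting; ...
}

This keeps the read-modify-write pattern but guarantees the first read of FragColor sees a defined value.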
Related
I am testing whether the GL_ARB_fragment_shader_interlock extension can execute the critical-section code for the same pixel position in the order the instances are rendered. I used instanced rendering to draw five translucent planes (instances ordered from farthest to nearest), and the result matches the fixed-pipeline blending result (without the extension the blending result is random). One problem remains, though: there is an extra line in the middle of each plane, which does not appear with fixed-pipeline blending. I found that the shared edge between adjacent triangles generates its fragments twice, but rasterization should ensure that adjacent triangles do not produce overlapping pixels. How can I solve this? I don't know where I went wrong; here is the code and result, please enlighten me!
The GLSL code:
#version 450 core
#extension GL_ARB_fragment_shader_interlock : require

//out vec4 Color_;
in vec4 v2f_Color;
layout(binding = 0, rgba16f) uniform image2D uColorTex;

void main()
{
    beginInvocationInterlockARB();
    vec4 color = imageLoad(uColorTex, ivec2(gl_FragCoord.xy));
    color = (1 - v2f_Color.a) * color + v2f_Color * v2f_Color.a;
    imageStore(uColorTex, ivec2(gl_FragCoord.xy), color);
    endInvocationInterlockARB();
    //Color_ = v2f_Color;
}
The result using the extension to read/write the image manually:
The result using fixed pipeline blending:
I am a bit suspicious of the ivec2(gl_FragCoord.xy). This is doing a conversion from floating-point to an integer, and it could be the case that it gets rounded in different directions between the different triangles. This might explain not only the overlap, but why there's a missing top-left pixel in one of the squares.
Judging by the spec, ivec2(gl_FragCoord.xy) should be equivalent to ivec2(trunc(gl_FragCoord.xy)), which really ought to be consistent, but maybe the implementation is bad...
You might want to try:
ivec2(round(gl_FragCoord.xy))
ivec2(round(gl_FragCoord.xy + 0.5))
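For clarity, here is how the first suggestion might slot into the shader above; the pixel coordinate is computed once, with the explicit round, and reused for both the load and the store (this is only a sketch of the idea, not a verified fix):

ivec2 pix = ivec2(round(gl_FragCoord.xy)); // or the +0.5 variant

beginInvocationInterlockARB();
vec4 color = imageLoad(uColorTex, pix);
color = (1 - v2f_Color.a) * color + v2f_Color * v2f_Color.a;
imageStore(uColorTex, pix, color);
endInvocationInterlockARB();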
I'm building a particle sim using OpenGL and am trying to benchmark the various methods of setting up the particles. Before I progress to instancing or transform feedback, I wanted to create independently moving points using only a standard VBO.
So far I have tried moving points by updating the buffer data using glBufferSubData, filling up an empty buffer. However, the frame rate drops as soon as you get close to 10,000 points. I need at least a few million for my purpose eventually.
At the moment I'm experimenting with transformation matrices and have the following:
In my shader:
in layout(location = 0) vec3 positions;
uniform mat4 transform[10];

void main() {
    int count = 0;
    count + 1;
    gl_Position = transform[count] * vec4(positions, 1.0);
}
and in my program draw loop:
self.matrices[i] = Matrix44.from_translation(Vector3([0.0,-self.t,0.0]))
glUniformMatrix4fv(self.transformMatrixNameloc,10,GL_FALSE,self.matrices[i])
Although this seems to be working, I cannot get the time (self.t) to reset/restart for each new particle drawn. I'm using this variable for testing, by the way, in place of a gravity force for now.
Any ideas?
Many thanks,
So I want to have multiple light sources in my scene. The basic idea is to simply have an array of a (uniform) struct that holds all the light properties you care about, such as position, color, direction, cutoff, and whatever else you want. My problem is how to represent which lights are on/off. I will list all the ways I can think of:
1. Have a uniform int per light struct to indicate whether it's on/off.
2. Make the number of light structs a multiple of 2, 3, or 4, so I can use that many bool vectors to indicate their status. For example, 16 lights = 4 x bvec4.
3. Instead of using flags and branching, always go through every single light, but with the off ones having their color set to (0, 0, 0, 0).
I'm leaning towards the last option, as it won't have branching... but I have also read that modern graphics cards are more okay with branching now.
None of your ideas is really good, because all of them require the shader to evaluate which light sources are active and which aren't. You should also differentiate between scene information (which lights exist in the scene) and the data needed for rendering (which lights are on and actually illuminate the scene). Scene information shouldn't be stored in a shader, since it is unnecessary there and will only slow the shader down.
A better way to handle multiple light sources in a scene and render with only the active ones could be as follows:
In every frame:
Evaluate on the CPU side which lights are on. Pass only the lights that are on to the shader uniforms, together with the total count of active lights. In pseudocode:
Shader:
uniform lightsources active_lightsources[MAX_LIGHTS];
uniform int light_count;
CPU:
i = 0
foreach (light in lightsources)
{
    if (light.state == ON)
    {
        set_uniform("active_lightsources[i]", light);
        i++
    }
}
set_uniform("light_count", i);
When illuminating a fragment do the following in the shader
Shader:
for (int i = 0; i < light_count; i++)
{
    // Calculate illumination from active_lightsources[i]
}
The major advantage of this approach is that you store fewer lights inside the shader and that the loop only iterates over light sources that are relevant for the current frame. The evaluation of which lights are relevant is done once per frame on the CPU instead of once per vertex/fragment in the shader.
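A concrete GLSL sketch of the shader side, with a hypothetical light struct (the field names and the Lambert term are illustrative, not from the question):

#define MAX_LIGHTS 16

struct Lightsource {
    vec3 position;
    vec3 color;
};

uniform Lightsource active_lightsources[MAX_LIGHTS];
uniform int light_count;

vec3 illuminate(vec3 fragPos, vec3 normal) {
    vec3 result = vec3(0.0);
    for (int i = 0; i < light_count; i++) {
        // simple Lambert term as a stand-in for the real lighting model
        vec3 L = normalize(active_lightsources[i].position - fragPos);
        result += max(dot(normal, L), 0.0) * active_lightsources[i].color;
    }
    return result;
}

Only the first light_count array entries are ever read, so lights that are off cost nothing in the shader.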
I want to write a very simple shader which is equivalent to (or faster than) the standard pipeline. However, even the simplest shader possible:
Vertex Shader
void main(void)
{
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}
Fragment Shader
uniform sampler2D Texture0;

void main(void)
{
    gl_FragColor = texture2D(Texture0, gl_TexCoord[0].xy);
}
Cuts my framerate in half in my game compared to the standard pipeline, and performs horribly if transparent images are displayed. I don't understand this, because the standard pipeline (glUseProgram(0)) does lighting and alpha blending, while this shader only draws flat textures. What makes it so slow?
It looks like this massive slowdown of custom shaders is a problem with old Intel Graphics chips, which seem to emulate the shaders on the CPU.
I tested the same program on recent hardware, and the frame drop with the custom shader activated is only about 2-3 percent.
EDIT: wrong theory. See new answer below
I think you might be bumping into overdraw.
I don't know what engine you are using your shader in, but if you have alpha blending on then you might end up overdrawing a lot.
Think about it this way:
If you have an 800x600 screen and a 2D quad covering the whole screen, that quad will generate 480,000 fragment shader invocations even though it has only 4 vertices.
Now, moving further, let's assume you have 10 such quads, one on top of another. If you don't sort your geometry front to back, or if you are using alpha blending with no depth test, then you will end up with 10 x 800 x 600 = 4,800,000 fragment invocations.
2D rendering is usually quite expensive in OpenGL due to overdraw, while 3D rejects many fragments through depth testing. Even though the shaders are more complicated, the number of fragment invocations is greatly reduced for 3D objects compared to 2D objects.
After a long investigation, it turned out the slowdown of the simple shader was caused by the shader being too simple.
In my case, the slowdown was caused by the text rendering engine, which made heavy use of glBitmap, which is very slow with texturing enabled (for reasons I cannot understand; the letters are tiny).
However, this did not affect the standard pipeline, because it honours glDisable(GL_LIGHTING) and glDisable(GL_TEXTURE_2D), which sidesteps the slowdown, whereas the simple shader ignored those states and thus did even more work than the standard pipeline. After introducing these two switches to the custom shader, it is as fast as the standard pipeline, plus I get the ability to add arbitrary effects without any performance impact!
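One possible way to mirror those states in the custom shader (the uniform name here is a placeholder, not the exact one from the project) is to gate the texture fetch behind a flag that the application sets from glIsEnabled(GL_TEXTURE_2D):

uniform sampler2D Texture0;
uniform bool useTexture; // mirrored from glIsEnabled(GL_TEXTURE_2D) on the CPU side

void main(void)
{
    vec4 color = gl_Color; // interpolated vertex colour, as in the fixed pipeline
    if (useTexture)
        color *= texture2D(Texture0, gl_TexCoord[0].xy);
    gl_FragColor = color;
}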
I've just moved my rendering code onto my laptop and am having issues with OpenGL and GLSL.
I have a vertex shader like this (simplified):
uniform float tile_size;

void main(void) {
    gl_PointSize = tile_size;
    // gl_PointSize = 12;
}
and a fragment shader which uses gl_PointCoord to read a texture and set the fragment colour.
In my C++ program I'm trying to bind tile_size as follows:
glEnable(GL_TEXTURE_2D);
glEnable(GL_POINT_SPRITE);
glEnable(GL_VERTEX_PROGRAM_POINT_SIZE);
GLint unif_tilesize = glGetUniformLocation(*shader program*, "tile_size");
glUniform1f(unif_tilesize, 12);
(Just to clarify: I've already set up a program and called glUseProgram; shown is just the snippet regarding this particular uniform.)
Set up like this, I get one-pixel points and have discovered that glGetUniformLocation is failing to find unif_tilesize (it returns -1).
If I swap the comments round in my vertex shader I get 12px point sprites fine.
Peculiarly, the exact same code works absolutely fine on my other computer. The OpenGL version on my laptop is 2.1.8304 and it's running an ATI Radeon X1200 (cf. an NVIDIA 8800 GT in my desktop), if that is relevant.
EDIT: I've changed the question title to better reflect the problem.
You forgot to call glUseProgram before setting the uniform.
So after another day of playing around I've come to a point where, although I haven't solved my original problem of not being able to bind a uniform to gl_PointSize, I have modified my existing point sprite renderer to work on my ATI card (an old x1200) and thought I'd share some of the things I'd learned.
I think that something about gl_PointSize is broken (at least on my card); in the vertex shader I was able to get 8px point sprites using gl_PointSize=8.0;, but using gl_PointSize=tile_size; gave me 1px sprites whatever I tried to bind to the uniform tile_size.
Luckily I don't need different sized tiles for each vertex so I called glPointSize(tile_size) in my main.cpp instead and this worked fine.
In order to get gl_PointCoord to work (i.e. return values other than (0,0)) in my fragment shader, I had to call glTexEnvf( GL_POINT_SPRITE_ARB, GL_COORD_REPLACE_ARB, GL_TRUE ); in my main.cpp.
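For reference, the fragment-shader side of that looks something like this (the sampler name is just a placeholder):

uniform sampler2D spriteTex; // placeholder name for the point-sprite texture

void main(void)
{
    // gl_PointCoord only returns useful values once GL_COORD_REPLACE is enabled
    gl_FragColor = texture2D(spriteTex, gl_PointCoord);
}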
There also persisted a ridiculous problem in which my varyings were being messed up somewhere between my vertex and fragment shaders. After a long game of 'guess what to type into Google to get relevant information', I found (and promptly lost) a forum where someone said that, in some cases, if you don't use gl_TexCoord[0] in at least one of your shaders, your varyings can get corrupted.
In order to fix that I added a line at the end of my fragment shader:
_coord = gl_TexCoord[0].xy;
where _coord is an otherwise unused vec2 (note that gl_TexCoord is not used anywhere else).
Without this line all my colours went blue and my texture lookup broke.