Using integers in GLSL fragment shaders - experiencing unexplained artifacts - opengl

I have a hexagonal grid that I want to texture. I want to use a single texture with 16 distinct subtextures arranged in a 4x4 grid. Each "node" in the grid has an image type, and I want to smoothly blend between them. My approach for implementing this is to render triangles in pairs, and encode the 4 image types on all vertices in the two faces, as well as a set of 4 weighting factors (which are the barycentric coordinates for the two tris). I can then use those two things to blend smoothly between any combination of image types.
Here is the fragment shader I'm using. The problems are arising from the use of int types, but I don't understand why. If i only use the first four sub-textures then i can change idx to be float and hardcode the Y-coord to be 0, and then it then works as i expect.
vec2 offset(int idx) {
vec2 v = vec2(idx % 4, idx / 4);
return v / 4.0;
}
void main(void) {
//
// divide the incoming UVs into one of 16 regions. The
// offset() function should take an integer from 0..15
// and return the offset to that region in the 4x4 map
//
vec2 uv = v_uv / 4.0;
//
// The four texture regions involved at
// this vertex are encoded in vec4 t_txt. The same
// values are stored at all vertices, so this doesn't
// vary across the triangle
//
int ia = int(v_txt.x);
int ib = int(v_txt.y);
int ic = int(v_txt.z);
int id = int(v_txt.w);
//
// Use those indices in the offset function to get the
// texture sample at that point
//
vec4 ca = texture2D(txt, uv + offset(ia));
vec4 cb = texture2D(txt, uv + offset(ib));
vec4 cc = texture2D(txt, uv + offset(ic));
vec4 cd = texture2D(txt, uv + offset(id));
//
// Merge them with the four factors stored in vec4 v_tfact.
// These vary for each vertex
//
fragcolour = ca * v_tfact.x
+ cb * v_tfact.y
+ cc * v_tfact.z
+ cd * v_tfact.w;
}
Here is what's happening:
(My "pair of triangles" are actually about 20 and you can see their structure in the artifacts, but the effect is the same)
This artifacting behaves a bit like z-fighting: moving the scene around makes it all shimmer and shift wildly.
Why doesn't this work as I expect?
One solution I can fall back on is to simply use a 1-dimensional texture map, with all 16 sub-images in a horizontal line, then i can switch everything to floating point since I won't need the modulo/integer-divide process to map idx->x,y, but this feels clumsy and I'd at least like to understand what's going on here.
Here is what it should look like, albeit with only 4 of the sub-images in use:

See OpenGL Shading Language 4.60 Specification - 5.4.1. Conversion and Scalar Constructors
When constructors are used to convert a floating-point type to an integer type, the fractional part of the floating-point value is dropped.
Hence int(v_txt.x) does not round v_txt.x, it truncates v_txt.x
You have to round the values to the nearest integer before constructing an integral value:
int ia = int(round(v_txt.x));
int ib = int(round(v_txt.y));
int ic = int(round(v_txt.z));
int id = int(round(v_txt.w));
Alternatively add 0.5 before constructing the integral value:
int ia = int(v_txt.x + 0.5);
int ib = int(v_txt.y + 0.5);
int ic = int(v_txt.z + 0.5);
int id = int(v_txt.w + 0.5);

Related

Attempt at making a Displacement Filter

So, basically, I'm trying to make a OBS Filter that displaces the pixels based on a lightmap/luminance map. I decided to learn how to make a filter by following this tutorial. But, in this tutorial, they don't explain much in terms of pixel displacement. So, I made a function that basically gets the brightness value of a texture I input and tested it by changing the pixel's alpha value with the red value of the texture:
float4 get_displacement(float2 position)
{
float2 pattern_uv = position / pattern_size;
float4 pattern_sample = pattern_texture.Sample(linear_wrap, pattern_uv / scale);
return pattern_sample;
}
float4 pixel_shader(pixel_data pixel) : TARGET
{
float4 source_sample = image.Sample(linear_wrap, pixel.uv);
if (pattern_size.x <= 0){
return source_sample;
}
float2 position = pixel.uv * float2(width, height);
float4 lightmap = get_displacement(position);
return float4(source_sample.rgb, lightmap.r);
return source_sample;
}
Which results to this (Note: The green is from a colour source that's behind the image to show the alpha value)
But, for some reason, when I try it with the vertex_shader, the function that decides where the pixel is rendered at, it seems to not work:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
pixel.uv = vertex.uv;
if (pattern_size.x <= 0){
pixel.pos = mul(float4(vertex.pos.xyz, 1.0), ViewProj);
return pixel;
}
float2 position = vertex.uv * float2(width, height);
float4 lightmap = get_displacement(position);
pixel.pos = mul(float4(vertex.pos.x + (lightmap.r * testRamp1), vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
(Note: testRamp1 is used as a value that I can change from a slider inside of OBS via some filter Properties)
The result that I'm expecting is something similar to this
To see if the issue was from me changing the XY position, I tested it using this function:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
pixel.uv = vertex.uv;
pixel.pos = mul(float4(vertex.pos.x + 100, vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
And it gave me an expected result.
I also changed the 100 with the testRamp1 value, and it works just the same based on the value of the slider.
So, I then tested if it was from the pixels needing to all move the same distance as each other. So, I change the function to this:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
pixel.uv = vertex.uv;
pixel.pos = mul(float4(vertex.pos.x + (vertex.uv.x * testRamp1), vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
Which then gives me either a squashed image when testRamp1 is set to a negative value, and it gives me a stretched image when it's set it to a positive value.
But as soon as I try to get the value of an image, may it be the pattern or from the source image, it no longer works(not even the filter parameters appear). For example, I used, this function to use the values of the source image:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
float4 source_sample = image.Sample(linear_wrap, vertex.uv);
pixel.uv = vertex.uv;
pixel.pos = mul(float4(vertex.pos.x + (source_sample.r * testRamp1), vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
At this point, I'm at a loss of words as to what could be causing this issue
First of all, the vertex shader is not what you want to use for this kind of effect. What you actually want to do, is to sample the image in the pixel shader, but offset the UV values slightly by your displacement before you pass them to the Sample function.
The primary reason, why you don't want to do this in the vertex shader, is, that the number of vertices is usually much smaller than the number of pixels - in the worst case, you only have 4 vertices in total (one for each corner of your screen), so the granuality of things you can do in the vertex shader is rather coarse. (Note: I'm not too familiar with OBS filters, and don't know how many vertices OBS dispatches, but certainly much less than the number of pixels you have on your screen).
Now, the reason why your vertex shader didn't work at all, is a bit more technical. In short, you can't use Sample in a vertex shader, you'd have to use SampleLevel or SampleGrad instead (note that these functions require more parameters). This is because Sample automatically calculates a UV gradient between adjacent pixels, to figure out the level of detail that is needed for your texture (whether or not it actually has multiple levels of detail). But the vertex shader operates on vertices, not on pixels, so the concept of an "adjacent pixel" doesn't make sense in a vertex shader - thus, the Sample method doesn't work.

Stuck trying to optimize complex GLSL fragment shader

So first off, let me say that while the code works perfectly well from a visual point of view, it runs into very steep performance issues that get progressively worse as you add more lights. In its current form it's good as a proof of concept, or a tech demo, but is otherwise unusable.
Long story short, I'm writing a RimWorld-style game with real-time top-down 2D lighting. The way I implemented rendering is with a 3 layered technique as follows:
First I render occlusions to a single-channel R8 occlusion texture mapped to a framebuffer. This part is lightning fast and doesn't slow down with more lights, so it's not part of the problem:
Then I invoke my lighting shader by drawing a huge rectangle over my lightmap texture mapped to another framebuffer. The light data is stored in an array in an UBO and it uses the occlusion mapping in its calculations. This is where the slowdown happens:
And lastly, the lightmap texture is multiplied and added to the regular world renderer, this also isn't affected by the number of lights, so it's not part of the problem:
The problem is thus in the lightmap shader. The first iteration had many branches which froze my graphics driver right away when I first tried it, but after removing most of them I get a solid 144 fps at 1440p with 3 lights, and ~58 fps at 1440p with 20 lights. An improvement, but it scales very poorly. The shader code is as follows, with additional annotations:
#version 460 core
// per-light data
struct Light
{
vec4 location;
vec4 rangeAndstartColor;
};
const int MaxLightsCount = 16; // I've also tried 8 and 32, there was no real difference
layout(std140) uniform ubo_lights
{
Light lights[MaxLightsCount];
};
uniform sampler2D occlusionSampler; // the occlusion texture sampler
in vec2 fs_tex0; // the uv position in the large rectangle
in vec2 fs_window_size; // the window size to transform world coords to view coords and back
out vec4 color;
void main()
{
vec3 resultColor = vec3(0.0);
const vec2 size = fs_window_size;
const vec2 pos = (size - vec2(1.0)) * fs_tex0;
// process every light individually and add the resulting colors together
// this should be branchless, is there any way to check?
for(int idx = 0; idx < MaxLightsCount; ++idx)
{
const float range = lights[idx].rangeAndstartColor.x;
const vec2 lightPosition = lights[idx].location.xy;
const float dist = length(lightPosition - pos); // distance from current fragment to current light
// early abort, the next part is expensive
// this branch HAS to be important, right? otherwise it will check crazy long lines against occlusions
if(dist > range)
continue;
const vec3 startColor = lights[idx].rangeAndstartColor.yzw;
// walk between pos and lightPosition to find occlusions
// standard line DDA algorithm
vec2 tempPos = pos;
int lineSteps = int(ceil(abs(lightPosition.x - pos.x) > abs(lightPosition.y - pos.y) ? abs(lightPosition.x - pos.x) : abs(lightPosition.y - pos.y)));
const vec2 lineInc = (lightPosition - pos) / lineSteps;
// can I get rid of this loop somehow? I need to check each position between
// my fragment and the light position for occlusions, and this is the best I
// came up with
float lightStrength = 1.0;
while(lineSteps --> 0)
{
const vec2 nextPos = tempPos + lineInc;
const vec2 occlusionSamplerUV = tempPos / size;
lightStrength *= 1.0 - texture(occlusionSampler, vec2(occlusionSamplerUV.x, 1 - occlusionSamplerUV.y)).x;
tempPos = nextPos;
}
// the contribution of this light to the fragment color is based on
// its square distance from the light, and the occlusions between them
// implemented as multiplications
const float strength = max(0, range - dist) / range * lightStrength;
resultColor += startColor * strength * strength;
}
color = vec4(resultColor, 1.0);
}
I call this shader as many times as I need, since the results are additive. It works with large batches of lights or one by one. Performance-wise, I didn't notice any real change trying different batch numbers, which is perhaps a bit odd.
So my question is, is there a better way to look up for any (boolean) occlusions between my fragment position and light position in the occlusion texture, without iterating through every pixel by hand? Could render buffers perhaps help here (from what I've read they're for reading data back to system memory, I need it in another shader though)?
And perhaps, is there a better algorithm for what I'm doing here?
I can think of a couple routes for optimization:
Exact: apply a distance transform on the occlusion map: this will give you the distance to the nearest occluder at each pixel. After that you can safely step by that distance within the loop, instead of doing baby steps. This will drastically reduce the number of steps in open regions.
There is a very simple CPU-side algorithm to compute a DT, and it may suit you if your occluders are static. If your scene changes every frame, however, you'll need to search the literature for GPU side algorithms, which seem to be more complicated.
Inexact: resort to soft shadows -- it might be a compromise you are willing to make, and even seen as an artistic choice. If you are OK with that, you can create a mipmap from your occlusion map, and then progressively increase the step and sample lower levels as you go farther from the point you are shading.
You can go further and build an emitters map (into the same 4-channel map as the occlusion). Then your entire shading pass will be independent of the number of lights. This is an equivalent of voxel cone tracing GI applied to 2D.

OpenGL Terrain System, small height difference between GPU and CPU

A quick summary:
I've a simple Quad tree based terrain rendering system that builds terrain patches which then sample a heightmap in the vertex shader to determine the height of each vertex.
The exact same calculation is done on the CPU for object placement and co.
Super straightforward, but now after adding some systems to procedurally place objects I've discovered that they seem to be misplaced by just a small amount. To debug this I render a few crosses as single models over the terrain. The crosses (red, green, blue lines) represent the height read from the CPU. While the terrain mesh uses a shader to translate the vertices.
(I've also added a simple odd/even gap over each height value to rule out a simple offset issue. So those ugly cliffs are expected, the submerged crosses are the issue)
I'm explicitly using GL_NEAREST to be able to display the "raw" height value:
As you can see the crosses are sometimes submerged under the terrain instead of representing its exact height.
The heightmap is just a simple array of floats on the CPU and on the GPU.
How the data is stored
A simple vector<float> which is uploaded into a GL_RGB32F GL_FLOAT buffer. The floats are not normalized and my terrain usually contains values between -100 and 500.
How is the data accessed in the shader
I've tried a few things to rule out errors, the inital:
vec2 terrain_heightmap_uv(vec2 position, Heightmap heightmap)
{
return (position + heightmap.world_offset) / heightmap.size;
}
float terrain_read_height(vec2 position, Heightmap heightmap)
{
return textureLod(heightmap.heightmap, terrain_heightmap_uv(position, heightmap), 0).r;
}
Basics of the vertex shader (the full shader code is very long, so I've extracted the part that actually reads the height):
void main()
{
vec4 world_position = a_model * vec4(a_position, 1.0);
vec4 final_position = world_position;
// snap vertex to grid
final_position.x = floor(world_position.x / a_quad_grid) * a_quad_grid;
final_position.z = floor(world_position.z / a_quad_grid) * a_quad_grid;
final_position.y = terrain_read_height(final_position.xz, heightmap);
gl_Position = projection * view * final_position;
}
To ensure the slightly different way the position is determined I tested it using hardcoded values that are identical to how C++ reads the height:
return texelFetch(heightmap.heightmap, ivec2((position / 8) + vec2(1024, 1024)), 0).r;
Which gives the exact same result...
How is the data accessed in the application
In C++ the height is read like this:
inline float get_local_height_safe(uint32_t x, uint32_t y)
{
// this macro simply clips x and y to the heightmap bounds
// it does not interfer with the result
BB_TERRAIN_HEIGHTMAP_BOUND_XY_TO_SAFE;
uint32_t i = (y * _size1d) + x;
return buffer->data[i];
}
inline float get_height_raw(glm::vec2 position)
{
position = position + world_offset;
uint32_t x = static_cast<int>(position.x);
uint32_t y = static_cast<int>(position.y);
return get_local_height_safe(x, y);
}
float BB::Terrain::get_height(const glm::vec3 position)
{
return heightmap->get_height_raw({position.x / heightmap_unit_scale, position.z / heightmap_unit_scale});
}
What have I tried:
Comparing the Buffers
I've dumped the first few hundred values from the vector. And compared it with the floating point buffer uploaded to the GPU using Nvidia Nsight, they are equal, rounding/precision errors there.
Sampling method
I've tried texture, textureLod and texelFetch to rule out some issue there, they all give me the same result.
Rounding
The super strange thing, when I round all the height values. They are perfectly aligned which just screams floating point precision issues.
Position snapping
I've tried rounding, flooring and ceiling the position, to ensure the position always maps to the same texel. I also tried adding an epsilon offset to rule out a positional precision error (probably stupid because the terrain is stable...)
Heightmap sizes
I've tried various heightmaps, also of different sizes.
Heightmap patterns
I've created a heightmap containing a pattern to ensure the position is not just offsetet.

GLSL: calculating normals after tesselation

I am having problems calculating normals after tesselation.
Currently I have code which samples height map and calculates normal from that:
float HEIGHT = 2048.0f;
float WIDTH =2048.0f;
float SCALE =displace_ratio;
vec2 uv = tex_coord_FS_in.xy;
vec2 du = vec2(1/WIDTH, 0);
vec2 dv= vec2(0, 1/HEIGHT);
float dhdu = SCALE/(2/WIDTH) * (texture(height_tex, uv+du).r - texture(height_tex, uv-du).r);
float dhdv = SCALE/(2/HEIGHT) * (texture(height_tex, uv+dv).r - texture(height_tex, uv-dv).r);
N = normalize(N+T*dhdu+B*dhdv);
But doesn't look ok with low level tesselations
How can I get rid of this ?
Only way to get rid of this is to use a normal map in combination with the computed normals. The normals you see on the right are correct. They're just in low resolution, because you tesselate them so. Use a normal map and per-pixel lighting to highlight the intricate details.
Also, one thing to consider is the topology of your initial mesh. More evenly spaced polygons result in more evenly spaced tesselation.
Additionally, you might want to do, instead of:
float dhdu = SCALE/(2/WIDTH) * (texture(height_tex, uv+du).r - texture(height_tex, uv-du).r);
float dhdv = SCALE/(2/HEIGHT) * (texture(height_tex, uv+dv).r - texture(height_tex, uv-dv).r);
sample a few more points from the heightmap, and average them to extract a more averaged version of the normal at each point.

Packing and unpacking a uint into float4 in DirectX

I have a texture atlas that I'm generating from an array of uints. Sampling from it in my pixel shader, colors are coming out correctly. Here's the relevant HLSL:
Texture2D textureAtlas : register(t8);
SamplerState smoothSampler : register(s9)
{
Filter = MIN_MAG_MIP_LINEAR;
AddressU = Clamp;
AddressV = Clamp;
}
struct PS_OUTPUT
{
float4 Color : SV_TARGET0;
float Depth : SV_DEPTH0;
}
PS_OUTPUT PixelShader
{
// among other things, u and v are calculated here
output.Color = textureAtlas.Sample(smoothSampler, float2(u,v));
}
This works great. With color working, I've extended the texture atlas to include depth information as well. There are only a few thousand depth values that I want, well under 24 bits worth (my depth buffer is 24 bits wide + an 8 bit stencil). The input depth values are uints, just like the colors, though of course in the depth case the values are going to be spread over four color channels and in the shader I want a single float between 0 and 1, so that will need to be computed from the sample. Here's the additional pixel shader code:
// u and v are recalculated for the depth portion of the texture atlas
float4 depthSample = textureAtlas.Sample(smoothSampler, float2(u,v));
float depthValue =
(depthSample.b * 65536.0 +
depthSample.g * 256.0 +
depthSample.r)
/ 65793.003921568627450980392156863;
output.Depth = depthValue;
The long constant here is 16777216/255, which should map the full uint range down to a unorm.
Now, when I'm generating the texture, if I constrain the depth values to the range of 0..2048, the output depth is correct. However, if I allow the upper limit of the range to increase (even if it's simply by taking the input values and performing a left shift by 16), then the output depths will be slightly off. Not by much, just +/- 0.002, but it's enough to make the output look terrible.
Can anybody spot my bug here? Or, more generally, is there a better way of packing and unpacking uints into textures?
I'm working in shader model 4 level 9_3 and C++ 11.
Your code is prone to precision loss: you're adding a relatively large number up to (65536+256) and a small number depthSample.r < 1.
Also, make sure your (u,v) are in the center of the texel to avoid filtering or replace Sample with Load.
Since you're using SM4 you can use the functions asuint and asfloat to reinterpret cast.
You can also use float format textures instead of R8G8B8A8.