Precisely map World Position to UV Texture-Coordinates (OpenGL Compute Shader)

I need help precisely sampling from a 3D texture in an OpenGL (4.5) compute shader, given a world position (within the domain of the texture dimensions). More precisely, I need help with my uv() function, which maps world coordinates to the exact corresponding texture coordinates.
I want linear interpolation of the data, so my current approach uses texture(). But this results in errors of around 0.001 compared to the expected values.
However, if I use texelFetch() and mix() to manually mimic the linear interpolation of texture() as stated in the specification (p. 248), I can reduce the error to 0.0000001 (which is what I want). You can see an example of how I implemented it below in the Code section.
This is the function which I currently use inside the Compute Shader to calculate my uv-coordinates:
vec3 uv(const vec3 position) {
return (position + 0.5) / textureSize(tex[0], 0);
}
Though this one is often suggested across the internet, my results are not perfectly aligned.
Example
To elaborate, I have floating point data stored in a Texture as GL_RGB32F. For simplicity my example here uses scalar GL_R32F. The data has dimensions of, e.g., 20x20x20 (but can be arbitrary). I operate in the data domain [0, 19]^3 and want to exactly map my current position to the texture domain [0, 1]^3 to index the data at this position.
I have a test texture which alternates between 0 and 1 along the x-axis, so sampling at vec3(2.2, 0, 0) should interpolate to 0.2.
As stated above, I tested texture() against texelFetch() + mix(). My manual interpolation evaluates to 0.200000003, which is fine. But calling texture() evaluates to 0.199218750, quite a large error in comparison. Strangely, manual and automatic interpolation evaluate to the same (correct) value at integer positions and at the midpoints between integer positions (e.g., for vec3(2.0, 0, 0), vec3(2.5, 0, 0) and vec3(3.0, 0, 0)).
A visual example with actual calculated values:
uv(x, y, z) = ((x, y, z) + 0.5) / (20, 20, 20)
(Figure: the data-domain point (2.2, 3.0) on the [0, 19]² grid maps via uv to (0.135, 0.175) on the [0, 1]² texture domain.)
Code
I use C++, OpenGL 4.5 and globjects as a wrapper for OpenGL. The texture buffers are created and configured as depicted below.
// Texture buffer creation
t = globjects::Texture::createDefault(gl::GLenum::GL_TEXTURE_3D);
t->setParameter(gl::GL_TEXTURE_WRAP_S, gl::GL_CLAMP_TO_EDGE);
t->setParameter(gl::GL_TEXTURE_WRAP_T, gl::GL_CLAMP_TO_EDGE);
t->setParameter(gl::GL_TEXTURE_WRAP_R, gl::GL_CLAMP_TO_EDGE);
t->setParameter(gl::GL_TEXTURE_MIN_FILTER, gl::GL_LINEAR);
t->setParameter(gl::GL_TEXTURE_MAG_FILTER, gl::GL_LINEAR);
The texture is uploaded and the Compute Shader is dispatched as follows.
// datatex holds image information
t->image3D(0, gl::GL_RGB32F, datatex->dimensions, 0, gl::GL_RGB, gl::GL_FLOAT, (const uint8_t*) datatex->data());
// ... (Make texture resident)
gl::glDispatchCompute(1, 1, 1);
// ... (Make texture not resident)
The Compute Shader, summarized to the important parts, is as follows:
#version 450
#extension GL_ARB_bindless_texture : enable
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout(binding=0) uniform samplers
{
sampler3D tex[1];
};
vec3 uv(const vec3 position) {
return (position + 0.5) / textureSize(tex[0], 0);
}
void main() {
// Automatic interpolation
vec4 correct1 = texture(tex[0], uv(vec3(2.0, 0, 0)));
vec4 correct2 = texture(tex[0], uv(vec3(2.5, 0, 0)));
vec4 correct3 = texture(tex[0], uv(vec3(3.0, 0, 0)));
vec4 wrong = texture(tex[0], uv(vec3(2.1, 0, 0)));
// Manual interpolation on x-axis
vec3 pos = vec3(2.1,0,0);
vec4 v0 = texelFetch(tex[0], ivec3(floor(pos.x), pos.yz), 0);
vec4 v1 = texelFetch(tex[0], ivec3(ceil(pos.x), pos.yz), 0);
vec4 correct4 = mix(v0, v1, fract(pos.x));
}
I'd love your input; I'm at my wit's end. Thanks!
System
I'm trying to achieve this on an NVIDIA GPU.

The texture units of GPUs are only required to sample with 8-bit precision in the fraction, as per the D3D11 specification. This explains the small error, and why it does not occur at (normalized) integer or mid-integer coordinates.
The fractional precision can also be queried in Vulkan via subTexelPrecisionBits, and the online Vulkan hardware database shows that no GPU as of today offers more than 8 bits of fractional precision during sampling.
Performing the linear interpolation in the shader itself offers full float32 precision.
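For reference, here is a sketch of how the question's manual 1D interpolation extends to full trilinear filtering with texelFetch() and mix(), entirely in float32. The function name sampleTrilinear and the explicit index clamping (to mimic GL_CLAMP_TO_EDGE) are my additions, not from the question:
vec4 sampleTrilinear(const vec3 pos) {
    // 'pos' is in the data domain [0, size-1]^3, as in the question's uv() convention.
    ivec3 size = textureSize(tex[0], 0);
    vec3 p0f = floor(pos);
    vec3 w = pos - p0f;                                   // fractional weight per axis
    ivec3 p0 = clamp(ivec3(p0f), ivec3(0), size - 1);     // mimic GL_CLAMP_TO_EDGE
    ivec3 p1 = clamp(ivec3(p0f) + 1, ivec3(0), size - 1);
    // Fetch the 8 surrounding texels.
    vec4 c000 = texelFetch(tex[0], ivec3(p0.x, p0.y, p0.z), 0);
    vec4 c100 = texelFetch(tex[0], ivec3(p1.x, p0.y, p0.z), 0);
    vec4 c010 = texelFetch(tex[0], ivec3(p0.x, p1.y, p0.z), 0);
    vec4 c110 = texelFetch(tex[0], ivec3(p1.x, p1.y, p0.z), 0);
    vec4 c001 = texelFetch(tex[0], ivec3(p0.x, p0.y, p1.z), 0);
    vec4 c101 = texelFetch(tex[0], ivec3(p1.x, p0.y, p1.z), 0);
    vec4 c011 = texelFetch(tex[0], ivec3(p0.x, p1.y, p1.z), 0);
    vec4 c111 = texelFetch(tex[0], ivec3(p1.x, p1.y, p1.z), 0);
    // Interpolate along x, then y, then z.
    vec4 c00 = mix(c000, c100, w.x);
    vec4 c10 = mix(c010, c110, w.x);
    vec4 c01 = mix(c001, c101, w.x);
    vec4 c11 = mix(c011, c111, w.x);
    vec4 c0 = mix(c00, c10, w.y);
    vec4 c1 = mix(c01, c11, w.y);
    return mix(c0, c1, w.z);
}
Because texelFetch() bypasses the filtering hardware entirely, the mix() arithmetic runs in the shader's 32-bit floats rather than the sampler's 8-bit fixed-point weights.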

Related

What algorithm does GL_LINEAR use exactly?

The refpages say "Returns the weighted average of the four texture elements that are closest to the specified texture coordinates." How exactly are they weighted? And what about 3D textures, does it still only use 4 texels for interpolation or more?
In 2D textures, 4 samples are used, which means bi-linear interpolation: 3 linear interpolations. The weights are the normalized distances of the target texel to its 4 neighbors.
So, for example, you want the texel at
(s,t) = (0.21, 0.32)
but the nearby texels have the coordinates:
(s0,t0)=(0.20,0.30)
(s0,t1)=(0.20,0.35)
(s1,t0)=(0.25,0.30)
(s1,t1)=(0.25,0.35)
the weights are:
ws = (s-s0)/(s1-s0) = 0.2
wt = (t-t0)/(t1-t0) = 0.4
So first linearly interpolate the texels in the s direction:
c0 = texture(s0,t0) + (texture(s1,t0)-texture(s0,t0))*ws
c1 = texture(s0,t1) + (texture(s1,t1)-texture(s0,t1))*ws
and finally in the t direction:
c = c0 + (c1-c0)*wt
where texture(s,t) returns the texel color at (s,t) when the coordinate corresponds to an exact texel, and c is the final interpolated texel color.
In reality, the s,t coordinates are multiplied by the texture resolution (xs, ys), which converts them to texel units. After that, s-s0 and t-t0 are already normalized, so there is no need to divide by s1-s0 and t1-t0, as they are both equal to one. So:
s=s*xs; s0=floor(s); s1=s0+1; ws=s-s0;
t=t*ys; t0=floor(t); t1=t0+1; wt=t-t0;
c0 = texture(s0,t0) + (texture(s1,t0)-texture(s0,t0))*ws;
c1 = texture(s0,t1) + (texture(s1,t1)-texture(s0,t1))*ws;
c = c0 + (c1-c0)*wt;
I never used 3D textures before, but in that case 8 texels are used and it is called tri-linear interpolation, which is 2x bi-linear interpolation: simply take the 2 nearest slices and compute each with bi-linear interpolation, then compute the final texel by linear interpolation based on the u coordinate in the exact same way... so
u=u*zs; u0=floor(u); u1=u0+1; wu=u-u0;
c = cu0 + (cu1-cu0)*wu;
where zs is the number of slices, cu0 is the result of the bi-linear interpolation in the slice at u0, and cu1 in the slice at u1. The same principle is also used for mipmaps...
All the coordinates may be offset by 0.5 texel, and the resolution multiplication can also be done with xs-1 instead of xs, depending on your clamp settings...
In addition to the bilinear interpolation outlined in Spektre's answer, you should be aware of the precision of GL_LINEAR interpolation. Many GPUs (e.g. Nvidia, AMD) do the interpolation using fixed-point arithmetic with only ~255 distinct values between the R, G, B, A values in the texture.
For example, here is pseudo code showing how GPUs might do the interpolation:
float interpolate_red(float red0, float red1, float f) {
    int g = (int)(f * 256);
    return (red0 * (256 - g) + red1 * g) / 256;
}
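As a worked example (using the numbers from the original question above): for f = 0.2 between texels holding 0.0 and 1.0, g = (int)(0.2 * 256) = 51, and (0.0 * (256 - 51) + 1.0 * 51) / 256 = 51 / 256 = 0.19921875, which is exactly the 0.199218750 the questioner measured.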
If your texture is for coloring and contains GL_UNSIGNED_BYTE values, then this is probably OK for you. But if your texture is a lookup table for some other calculation and it contains GL_UNSIGNED_SHORT or GL_FLOAT values, then this loss of precision could be a problem for you. In that case, you should make your lookup table bigger, with the in-between values calculated at float (or double) precision.

OpenGL: issues with converting floats from texture to integers in fragment shader

I render to a texture which is in the format GL_RGBA8.
When I render to this texture I have a fragment shader whose output is set to color = (1/255, 0, 0, 1). Triangles are overlapping each other and I set the blend mode to (GL_ONE, GL_ONE) so for example if 2 triangles overlap for a given fragment, the resulting pixel at that fragment position will have value (2/255.0).
I then use this texture in a second pass (applied to a quad filling up the screen). My goal at this point, when I read the values back from the texture, is to convert the values (which are in floating-point format in the range [0:1]) back to integers in the range [0:255]. If I look at a pixel that has value (2.0/255.0), I should get the result (2.0/255.0) * 255.0 = 2.0, but I don't.
If I do
float a = (texture(colorTexture, texCoord).x * 255);
float b = (a == 2) ? 1.0 : 0;
color = vec4(0, b, 0, 1);
I get a black image. If I do
float a = (texture(colorTexture, texCoord).x * 255);
float b = (a > 1.999 && a <= 2) ? 1.0 : 0;
color = vec4(0, b, 0, 1);
I get the expected result. So, in summary, it seems like the conversion back to [0:255] suffers from floating-point precision issues.
Adding precision highp float; doesn't make a difference. I also turned filtering off (and use no mipmaps).
This would work:
float a = ceil(texture(colorTexture, texCoord).x * 255);
Though in general that doesn't look very robust as a solution (why would ceil work and not floor, for example? Why is the value 1.999999 rather than 2.00001, and can I be sure it will always be that way?). People must have run into this before, so I am sure there's a much better way of guaranteeing an accurate result without too much fiddling with the numbers. Any hints would be greatly appreciated.
EDIT
As pointed out in 2 comments, it follows from the way floating-point numbers are encoded that you get no guarantee of an "integer" number back, even when the result should be a whole number (it's good to be reminded of this important point). So I reformulate my question: is there a preferred way in GLSL to clamp a number to its closest integer value?
And that would be round:
float a = round(texture(colorTexture, texCoord).x * 255);
Hope this can help other people in the future though.
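As a brief justification (my reasoning, not from the original answer): the blended texel stores an exact multiple of 1/255, and the nearest float32 to k/255 differs from it by a relative error of at most about 6e-8, so multiplying by 255 lands within a tiny fraction of the integer k, comfortably inside the ±0.5 window that round() forgives.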

DirectX Converting Pixel World Position to Shadow Map Position Gives Weird, Tiled Results

I've been trying for some time now to get a screen-space pixel (provided by a deferred HLSL shader) to convert to light space. The results have been surprising to me as my light rendering seems to be tiling the depth buffer.
Importantly, the scene camera (or eye) and the light being rendered from start in the same position.
First, I extract the world position of the pixel using the code below:
float3 eye = Eye;
float4 position = {
IN.texCoord.x * 2 - 1,
(1 - IN.texCoord.y) * 2 - 1,
zbuffer.r,
1
};
float4 hposition = mul(position, EyeViewProjectionInverse);
position = float4(hposition.xyz / hposition.w, hposition.w);
float3 eyeDirection = normalize(eye - position.xyz);
The result seems to be correct as rendering the XYZ position as RGB respectively yields this (apparently correct) result:
The red component seems to be correctly outputting X as it moves to the right, and blue shows Z moving forward. The Y factor also looks correct as the ground is slightly below the Y axis.
Next (and to be sure I'm not going crazy), I decided to output the original depth buffer. Normally I keep the depth buffer in a Texture2D called DepthMap passed to the shader as input. In this case, however, I try to undo the pixel transformation by offsetting it back into the proper position and multiplying it by the eye's view-projection matrix:
float4 cpos = mul(position, EyeViewProjection);
cpos.xyz = cpos.xyz / cpos.w;
cpos.x = cpos.x * 0.5f + 0.5f;
cpos.y = 1 - (cpos.y * 0.5f + 0.5f);
float camera_depth = pow(DepthMap.Sample(Sampler, cpos.xy).r, 100); // Power 100 just to visualize the map since scales are really tiny
return float4(camera_depth, camera_depth, camera_depth, 1);
This yields a correct looking result as well (though I'm not 100% sure about the Z value). Also note that I've made the results exponential to better visualize the depth information (this is not done when attempting live comparisons):
So theoretically, I can use the same code to convert that pixel world position to light space by multiplying by the light's view-projection matrix. Correct? Here's what I tried:
float4 lpos = mul(position, ShadowLightViewProjection[0]);
lpos.xyz = lpos.xyz / lpos.w;
lpos.x = lpos.x * 0.5f + 0.5f;
lpos.y = 1 - (lpos.y * 0.5f + 0.5f);
float shadow_map_depth = pow(ShadowLightMap[0].Sample(Sampler, lpos.xy).r, 100); // Power 100 just to visualize the map since scales are really tiny
return float4(shadow_map_depth, shadow_map_depth, shadow_map_depth, 1);
And here's the result:
And another to show better how it's mapping to the world:
I don't understand what is going on here. It seems it might have something to do with the projection matrix, but I'm not good enough at math to know for sure what is happening. It's definitely not the width/height of the light map, as I've tried multiple map sizes, and the projection matrix is calculated from the FOV and aspect ratio, never taking width/height as input.
Finally, here's some C++ code showing how my perspective matrix (used for both eye and light) is calculated:
const auto ys = std::tan((T)1.57079632679f - (fov / (T)2.0)); // = cot(fov / 2)
const auto xs = ys / aspect;
const auto& zf = view_far;
const auto& zn = view_near;
const auto zfn = zf - zn;
row1(xs, 0, 0, 0);
row2(0, ys, 0, 0);
row3(0, 0, zf / zfn, 1);
row4(0, 0, -zn * zf / zfn, 0);
return *this;
I'm completely at a loss here. Any guidance or recommendations would be greatly appreciated!
EDIT - I also forgot to mention that the tiled image is upside down as if the y flip broke it. That's strange to me as it's required to get it back to eye texture space correctly.
I did some tweaking and fixed things here and there. Ultimately, my biggest issue was an unexpectedly transposed matrix. It's a bit complicated as to how the matrix got transposed, but that's why things were flipped. I also changed to D32 depth buffers (though I'm not sure that helped) and made sure that any position divided by its w affected all components (including w).
So code like this: hposition.xyz = hposition.xyz / hposition.w
became this: hposition = hposition / hposition.w
After all this tweaking, it's starting to look more like a shadow map.
Oh and the transposed matrix was the ViewProjection of the light.

Calculate clipspace.w from clipspace.xyz and (inv) projection matrix

I'm using a logarithmic depth algorithm, which results in someFunc(clipspace.z) being written to the depth buffer and no implicit perspective divide.
I'm doing RTT / postprocessing so later on in a fragment shader I want to recompute eyespace.xyz, given ndc.xy (from the fragment coordinates) and clipspace.z (from someFuncInv() on the value stored in the depth buffer).
Note that I do not have clipspace.w, and my stored value is not clipspace.z / clipspace.w (as it would be when using fixed function depth) - so something along the lines of ...
float clip_z = ...; /* [-1 .. +1] */
vec2 ndc = vec2(FragCoord.xy / viewport * 2.0 - 1.0);
vec4 clipspace = InvProjMatrix * vec4(ndc, clip_z, 1.0);
clipspace /= clipspace.w;
... does not work here.
So is there a way to calculate clipspace.w out of clipspace.xyz, given the projection matrix or its inverse?
clipspace.xy = FragCoord.xy / viewport * 2.0 - 1.0;
This is wrong in terms of nomenclature. "Clip space" is the space that the vertex shader (or whatever the last Vertex Processing stage is) outputs. Between clip space and window space is normalized device coordinate (NDC) space. NDC space is clip space divided by the clip space W coordinate:
vec3 ndcspace = clipspace.xyz / clipspace.w;
So the first step is to take our window space coordinates and get NDC space coordinates. Which is easy:
vec3 ndcspace = vec3(FragCoord.xy / viewport * 2.0 - 1.0, depth);
Now, I'm going to assume that your depth value is the proper NDC-space depth. I'm assuming that you fetch the value from a depth texture, then used the depth range near/far values it was rendered with to map it into a [-1, 1] range. If you didn't, you should.
So, now that we have ndcspace, how do we compute clipspace? Well, that's obvious:
vec4 clipspace = vec4(ndcspace * clipspace.w, clipspace.w);
Obvious and... not helpful, since we don't have clipspace.w. So how do we get it?
To get this, we need to look at how clipspace was computed the first time:
vec4 clipspace = Proj * cameraspace;
This means that clipspace.w is computed by taking cameraspace and dot-producting it by the fourth row of Proj.
Well, that's not very helpful. It gets more helpful if we actually look at the fourth row of Proj. Granted, you could be using any projection matrix, and if you're not using the typical projection matrix, this computation becomes more difficult (potentially impossible).
The fourth row of Proj, using the typical projection matrix, is really just this:
[0, 0, -1, 0]
This means that the clipspace.w is really just -cameraspace.z. How does that help us?
It helps by remembering this:
ndcspace.z = clipspace.z / clipspace.w;
ndcspace.z = clipspace.z / -cameraspace.z;
Well, that's nice, but it just trades one unknown for another; we still have an equation with two unknowns (clipspace.z and cameraspace.z). However, we do know something else: clipspace.z comes from dot-producting cameraspace with the third row of our projection matrix. The traditional projection matrix's third row looks like this:
[0, 0, T1, T2]
Where T1 and T2 are non-zero numbers. We'll ignore what these numbers are for the time being. Therefore, clipspace.z is really just T1 * cameraspace.z + T2 * cameraspace.w. And if we know cameraspace.w is 1.0 (as it usually is), then we can remove it:
ndcspace.z = (T1 * cameraspace.z + T2) / -cameraspace.z;
So, we still have a problem. Actually, we don't. Why? Because there is only one unknown in this equation. Remember: we already know ndcspace.z. We can therefore use ndcspace.z to compute cameraspace.z:
ndcspace.z = -T1 + (-T2 / cameraspace.z);
ndcspace.z + T1 = -T2 / cameraspace.z;
cameraspace.z = -T2 / (ndcspace.z + T1);
T1 and T2 come right out of our projection matrix (the one the scene was originally rendered with). And we already have ndcspace.z. So we can compute cameraspace.z. And we know that:
clipspace.w = -cameraspace.z;
Therefore, we can do this:
vec4 clipspace = vec4(ndcspace * clipspace.w, clipspace.w);
Obviously you'll need a float for clipspace.w rather than the literal code, but you get my point. Once you have clipspace, to get camera space, you multiply by the inverse projection matrix:
vec4 cameraspace = InvProj * clipspace;
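Putting the derivation together, a minimal GLSL sketch might look like the following. The uniform names are mine, and I assume the depth value has already been remapped to NDC [-1, 1] as described above; T1 and T2 are the third-row entries of the original projection matrix:
uniform mat4 InvProjMatrix; // inverse of the original projection matrix
uniform vec2 viewport;      // viewport size in pixels
uniform float T1;           // original Proj, third row, third column
uniform float T2;           // original Proj, third row, fourth column

vec3 reconstructCameraSpace(vec2 fragCoord, float ndc_z) {
    vec3 ndcspace = vec3(fragCoord / viewport * 2.0 - 1.0, ndc_z);
    // From the derivation above: cameraspace.z = -T2 / (ndcspace.z + T1)
    float camera_z = -T2 / (ndcspace.z + T1);
    float clip_w = -camera_z;              // clipspace.w = -cameraspace.z
    // Undo the perspective divide, then the projection.
    vec4 clipspace = vec4(ndcspace * clip_w, clip_w);
    vec4 cameraspace = InvProjMatrix * clipspace;
    return cameraspace.xyz;
}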

OpenGL Clip Space Frustum Culling Wrong Results

I implemented a Clip-Space Frustum Culling on the CPU.
In a simple reduced case, I just create a rectangle based on 4 different points which I'm going to render in GL_LINES modes.
But sometimes, it seems to me, I get wrong results. Here is an example:
In this render pass, my frustum culling computation concludes that all points lie beyond the positive y bound in NDC coordinates.
Here is the input:
Points:
P1: -5000, 3, -5000
P2: -5000, 3, 5000
P3: 5000, 3, 5000
P4: 5000, 3, -5000
MVP (rounded):
1.0550 0.0000 -1.4521 1138.9092
-1.1700 1.9331 -0.8500 -6573.4885
-0.6481 -0.5993 -0.4708 -2129.3858
-0.6478 -0.5990 -0.4707 -2108.5455
And the calculations (MVP * Position)
P1 P2 P3 P4
3124 -11397 -847 13674
3532 -4968 -16668 -8168
3463 -1245 -7726 -3018
3482 -1225 -7703 -2996
And finally, after the perspective divide (by the w component):
P1 P2 P3 P4
0.897 9.304 0.110 -4.564
1.014 4.056 2.164 2.726
0.995 1.016 1.003 1.007
1.0 1.0 1.0 1.0
As you can see, all transformed points have their y component greater than 1 and should be outside the Viewing Frustum.
I already double-checked my matrices. I also set up transform feedback in the vertex shader to be sure I use the same matrices for the computation on the CPU and on the GPU. The result of MVP * Point in my vertex shader matches my CPU computation. My rendering pipeline is as simple as possible.
The vertex Shader is
vColor = aColor;
gl_Position = MVP * aVertexPosition;
//With Transform Feedback enabled
//transOut = MVP * aVertexPosition
And the fragment Shader
FragColor = vColor;
So the Vertex Shader has the same results as my CPU computations.
But still, lines are drawn on the screen!
Any ideas why there are lines?
Am I doing something wrong with the perspective divide?
What do I have to do to detect that this rectangle should not be culled because at least one line is visible (a strip of another line is visible in this example as well)?
If it helps: The visible red line is the one between P1 and P2...
[Edit] I have now implemented world-space culling by computing the camera frustum planes in Hesse normal form. This works and recognizes visibility correctly. Sadly, I do need correct computations in clip space, since I'm going to perform other computations with those points. Any ideas?
Here is my computation code:
int outOfBoundArray[6] = {0, 0, 0, 0, 0, 0};
std::vector<glm::dvec4> tileBounds = activeElem->getTileBounds(); //Same as I use for world space culling
const glm::dmat4& viewProj = cam->getCameraTransformations().viewProjectionMatrix; //Same camera which I use for world space culling.
for (int i=0; i<tileBounds.size(); i++) {
//Apply ModelViewProjection Matrix, to Clip Space
glm::dvec4 transformVec = viewProj * tileBounds[i];
//To NDC space [-1,1]
transformVec = transformVec / transformVec[3];
//Culling test
if ( transformVec.x > 1.0 ) outOfBoundArray[0]++;
if ( transformVec.x < -1.0 ) outOfBoundArray[1]++;
if ( transformVec.y > 1.0 ) outOfBoundArray[2]++;
if ( transformVec.y < -1.0 ) outOfBoundArray[3]++;
if ( transformVec.z > 1.0 ) outOfBoundArray[4]++;
if ( transformVec.z < -1.0 ) outOfBoundArray[5]++;
//Other computations...
}
for (int i=0; i<6; i++) {
if (outOfBoundArray[i] == tileBounds.size()) {
return false;
}
}
return true;
The problem would appear to be that the sign of w (the fourth component) differs between P1 and P2. This will cause all kinds of trouble due to the nature of projective geometry.
Even though both divided points land on the same side in NDC, the line that gets drawn actually goes through infinity: consider what happens when you linearly interpolate between P1 and P2 and do the division by w at each interpolated point separately; as w approaches zero, the y value is not exactly zero, so y/w zooms off to infinity, and then wraps around from the other side.
Projective geometry is a weird thing :)
But, for a solution: make sure that you clip the lines that cross the w=0 plane against the positive side of that plane, and you are set. Your code should then work.
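As a side note on the culling test itself (a common technique, not part of the answer above): the per-plane comparisons can be done in clip space before the perspective divide, where they are linear and therefore remain valid even when some points have negative w. Sketched here in GLSL syntax; since glm mirrors GLSL, the same logic drops straight into the C++ loop above:
// Conservative frustum culling in clip space, BEFORE the perspective divide.
// Each test is linear in homogeneous coordinates, so it stays correct even
// for points behind the camera (w < 0).
bool tileOutsideFrustum(vec4 clipPoints[4]) {
    int outCount[6] = int[6](0, 0, 0, 0, 0, 0);
    for (int i = 0; i < 4; ++i) {
        vec4 p = clipPoints[i];        // p = MVP * position, NOT divided by w
        if (p.x >  p.w) outCount[0]++; // outside right plane
        if (p.x < -p.w) outCount[1]++; // outside left plane
        if (p.y >  p.w) outCount[2]++; // outside top plane
        if (p.y < -p.w) outCount[3]++; // outside bottom plane
        if (p.z >  p.w) outCount[4]++; // outside far plane
        if (p.z < -p.w) outCount[5]++; // outside near plane
    }
    // Cull only if ALL points are outside the SAME plane.
    for (int i = 0; i < 6; ++i) {
        if (outCount[i] == 4) return true;
    }
    return false;
}
This way no point behind the camera is ever pushed through the divide, which is exactly where the divide-then-compare test above goes wrong.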