distortion correction with gpu shader bug - opengl

So I have a camera with a wide angle lens. I know the distortion coefficients, the focal length, the optical center. I want to undistort the image I get from this camera. I used OpenCV for the first try (cv::undistort), which worked well, but was way too slow.
Now I want to do this on the gpu. There is a shader doing exactly this documented in http://willsteptoe.com/post/67401705548/ar-rift-aligning-tracking-and-video-spaces-part-5
the formulas can be seen here:
http://en.wikipedia.org/wiki/Distortion_%28optics%29#Software_correction
So I went and implemented my own version as a glsl shader. I am sending a quad with texture coordinates on the corners between 0..1.
I assume the texture coordinates that arrive are the coordinates of the undistorted image. I calculate the coordinates for the distorted point corresponding to my texture coordinates. Then I sample the distorted image texture.
With this shader nothing in the final image changes. The problem I identified through a cpu implementation is, the coefficient term is very close to zero. The numbers get smaller and smaller through radius squaring etc.. So I have a scaling problem - I can't figure it out what to do differently! I tried everything... I guess it is something quite obvious, since this kind of process seems to work for a lot of people.
I left out the tangential distortion correction for simplicity.
#version 330 core
in vec2 UV;
out vec4 color;
uniform sampler2D textureSampler;
void main()
{
vec2 focalLength = vec2(438.568f, 437.699f);
vec2 opticalCenter = vec2(667.724f, 500.059f);
vec4 distortionCoefficients = vec4(-0.035109f, -0.002393f, 0.000335f, -0.000449f);
const vec2 imageSize = vec2(1280.f, 960.f);
vec2 opticalCenterUV = opticalCenter / imageSize;
vec2 shiftedUVCoordinates = (UV - opticalCenterUV);
vec2 lensCoordinates = shiftedUVCoordinates / focalLength;
float radiusSquared = sqrt(dot(lensCoordinates, lensCoordinates));
float radiusQuadrupled = radiusSquared * radiusSquared;
float coefficientTerm = distortionCoefficients.x * radiusSquared + distortionCoefficients.y * radiusQuadrupled;
vec2 distortedUV = ((lensCoordinates + lensCoordinates * (coefficientTerm))) * focalLength;
vec2 resultUV = (distortedUV + opticalCenterUV);
color = texture2D(textureSampler, resultUV);
}

I see two issues with your solution. The main issue is that you mix two different spaces. You seem to work in [0,1] texture space by converting the optical center to that space, but you did not adjust focalLenght. The key point is that for such a distortion model, the focal lenght is determined in pixels. However, now a pixel is not 1 base unit wide anymore, but 1/width and 1/height units, respectively.
You could add vec2 focalLengthUV = focalLength / imageSize, but you will see that both divisions will cancel out each other when you calculate lensCoordinates. It is much more convenient to convert the texture space UV coordinates to pixel coordinates and use that space directly:
vec2 lensCoordinates = (UV * imageSize - opticalCenter) / focalLenght;
(and also respectively changing the calculation for distortedUV and resultUV).
There is still one issue with the approach I have sketched so far: the conventions of that pixel space I mentioned earlier. In GL, the origin will be the lower left corner, while in most pixel spaces, the origin is at the top left. You might have to flip the y coordinate when doing the conversion. Another thing is where exactly pixel centers are located. So far, the code assumes that pixel centers are at integer + 0.5. The texture coordinate (0,0) is not the center of the lower left pixel, but the corner point. The parameters you use for the distortion might (I don't know OpenCV's conventions) assume the pixel centers at integers, so that instead of the conversion pixelSpace = uv * imageSize, you might need to offset this by half a pixel like pixelSpace = uv * imageSize - vec2(0.5).
The second issue I see is
float radiusSquared = sqrt(dot(lensCoordinates, lensCoordinates));
That sqrt is not correct here, as dot(a,a) will already give the squared lenght of vector a.

Related

texture a ball on a sphere has a dark band

I am using this code to generate sphere vertices and textures but as you can see in the image , when I rotate it I can see a dark band.
for (int i = 0; i <= stacks; ++i)
{
float s = (float)i / (float) stacks;
float theta = s * 2 * glm::pi<float>();
for (int j = 0; j <= slices; ++j)
{
float sl = (float)j / (float) slices;
float phi = sl * (glm::pi<float>());
const float x = cos(theta) * sin(phi);
const float y = sin(theta) * sin(phi);
const float z = cos(phi);
sphere_vertices.push_back(radius * glm::vec3(x, y, z));
sphere_texcoords.push_back((glm::vec2((x + 1.0) / 2.0, (y + 1.0) / 2.0)));
}
}
// get the indices
for (int i = 0; i < stacks * slices + slices; ++i)
{
sphere_indices.push_back(i);
sphere_indices.push_back(i + slices + 1);
sphere_indices.push_back(i + slices);
sphere_indices.push_back(i + slices + 1);
sphere_indices.push_back(i);
sphere_indices.push_back(i + 1);
}
I can't figure a way to make it right whatever texture coordinates I used.
Hmm.. If I use another image, then the mapping is different (and worst!)
vertex shader:
#version 330 core
layout (location = 0) in vec3 aPos;
layout (location = 1) in vec3 aTexCoord;
out vec4 vertexColor;
out vec2 TexCoord;
uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;
void main()
{
gl_Position = projection * view * model * vec4(aPos.x, aPos.y, aPos.z, 1.0);
vertexColor = vec4(0.5, 0.2, 0.5, 1.0);
TexCoord = vec2(aTexCoord.x, aTexCoord.y);
}
fragment shader:
#version 330 core
out vec4 FragColor;
in vec4 vertexColor;
in vec2 TexCoord;
uniform sampler2D sphere_texture;
void main()
{
FragColor = texture(sphere_texture, TexCoord);
}
I am not using any lighting conditions.
If I use FragColor = vec4(TexCoord.x, TexCoord.y, 0.0f, 1.0f); in fragment shader (for debugging purposes) , I am receiving a nice sphere.
I am using this as texture:
That image of the tennis ball that you linked reveals the problem. I'm glad you ultimately provided it.
Your image is a four-channel PNG with transparency (Alpha channel). There are transparent pixels all around the outside of the yellow part of the ball that have (R,G,B,A) = (0, 0, 0, 0), so if you're ignoring the A channel then (R, G, B), will be (0, 0, 0) = black.
Here are just the Red, Green, and Blue (RGB) channels:
And here is just the Alpha (A) channel.
The important thing to notice is that the circle of the ball does not fill the square. There is a significant margin of 53 pixels of black from the extent of the ball to the edge of the texture. We can calculate the radius of the ball from this. Half the width is 1000 pixels, of which 53 pixels are not used. The ball's radius is 1000-53, which is 947 pixels. Or about 94.7% of the distance from the center to the edge of the texture. The remaining 5.3% of the distance is black.
Side note: I also notice that your ball doesn't quite reach 100% opacity. The yellow part of the ball has an alpha channel value of 254 (of 255) Meaning 99.6% opaque. The white lines and the shiny hot spot do actually reach 100% opacity, giving it sort of a Death Star look. ;)
To fix your problem, there's the intuitive approach (which may not work) and then there are two things that you need to do that will work. Here are a few things you can do:
Intuitive Solution:
This won't quite get you 100% there.
1) Resize the ball to fill the texture. Use image editing software to enlarge the ball to fill the texture, or to trim off the black pixels. This will just make more efficient use of pixels, for one, but it will ensure that there are useful pixels being sampled at the boundary. You'll probably want to expand the image to be slightly larger than 100%. I'll explain why below.
2) Remap your texture coordinates to only extend to 94.7% of the radius of the ball. (Similar to approach 1, but doesn't require image editing). This just uses coordinates that actually correspond to the image you provided. Your x and y coordinates need to be scaled about the center of the image and reduced to about 94.7%.
x2 = 0.5 + (x - 0.5) * 0.947;
y2 = 0.5 + (y - 0.5) * 0.947;
Suggested Solution:
This will ensure no more black.
3) Fill the "black" portion of your ball texture with a less objectionable colour - probably the colour that is at the circumference of the tennis ball. This ensures that any texels that are sampled at exactly the edge of the ball won't be linearly combined with black to produce an unsightly dark-but-not-quite-black band, which is almost the problem you have right now anyway. You can do this in two ways. A) Image editing software. Remove the transparency from your image and matte it against a dark yellow colour. B) Use the shader to detect pixels that are outside the image and replace them with a border colour (this is clever, but probably more trouble than it's worth.)
Different Texture Coordinates
The last thing you can do is avoid this degenerate texture mapping coordinate problem altogether. At the equator, you're not really sure which pixels to sample. The black (transparent) pixels or the coloured pixels of the ball. The discrete nature of square pixels, is fighting against the polar nature of your texture map. You'll never find the exact colour you need near the edge to produce a continuous, seamless map. Instead, you can use a different coordinate system. I hope you're not attached to how that ball looks, because let me introduce you to the equirectangular projection. It's the same projection that you can naively use to map the globe of the Earth to a typical rectangular map of the world you're likely familiar with where the north and south poles get all the distortion but the equatorial regions look pretty good.
Here's your image mapped to equirectangular coordinates:
Notice that black bar at the bottom...we're onto something! That black bar is actually exactly what appears around the equator of your ball with your current texture mapping coordinate system. But with this coordinate system, you can see easily that if we just remapped the ball to fill the square we'd completely eliminate any transparent pixels at all.
It may be inconvenient to work in this coordinate system, but you can transform your image in Photoshop using Filter > Distort > Polar Coordinates... > Polar to Rectangular.
Sigismondo's answer already suggests how to adjust your texture mapping coordinates do this.
And finally, here's a texture that is both enlarged to fill the texture space, and remapped to equirectangular coordinates. No black bars, minimal distortion. But you'll have to use Sigismondo's texture mapping coordinates. Again, this may not be for you, especially if you're attached to the idea of the direct projection for your texture (i.e.: if you don't want to manipulate your tennis ball image and you want to use that projection.) But if you're willing to remap your data, you can rest easy that all the black pixels will be gone!
Good luck! Feel free to ask for clarifications.
I cannot test it, being the code incomplete, but from a rough look I have spotted this problem:
sphere_texcoords.push_back((glm::vec2((x + 1.0) / 2.0, (y + 1.0) / 2.0)));
The texture coordinates should not be evaluated from x and y, being:
const float x = cos(theta) * sin(phi);
const float y = sin(theta) * sin(phi);
but from the angles thta-phi, or stacks-slices. this could work better - untested:
sphere_texcoords.push_back(glm::vec2(s,sl));
being already defined:
float s = (float)i / (float) stacks;
float sl = (float)j / (float) slices;
Furthermore in your code you are using the first and the last "slices" of the sphere as the rest... Shouldn't they be treated differently? This seems quite odd to me - but I don't know whether your implementation is just a simpler one, working fine.
Compare with this explanation, for example: http://www.songho.ca/opengl/gl_sphere.html

OpenGL vertex shader for pinhole camera model

I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model (as defined for example here). Currently I use the vertex shader to map the 3D vertices to the clip space, where K in the shader contains [focal length x, focal length y, principal point x, principal point y] and zrange is the depth range of the vertices.
#version 330 core
layout (location = 0) in vec3 vin;
layout (location = 1) in vec3 cin;
layout (location = 2) in vec3 nin;
out vec3 shader_pos;
out vec3 shader_color;
out vec3 shader_normal;
uniform vec4 K;
uniform vec2 zrange;
uniform vec2 imsize;
void main() {
vec3 uvd;
uvd.x = (K[0] * vin.x + K[2] * vin.z) / vin.z;
uvd.y = (K[1] * vin.y + K[3] * vin.z) / vin.z;
uvd.x = 2 * uvd.x / (imsize[0]) - 1;
uvd.y = 2 * uvd.y / (imsize[1]) - 1;
uvd.z = 2 * (vin.z - zrange[0]) / (zrange[1] - zrange[0]) - 1;
shader_pos = uvd;
shader_color = cin;
shader_normal = nin;
gl_Position = vec4(uvd.xyz, 1.0);
}
I verify the renderings with a simple ray-tracer, however there seems to be an offset stemming from my OpenGL implementation. The depth values are different, but not by an affine offset as it would be caused by a wrong remapping (see the slanted surface on the tetrahedron, ignoring the errors on the edges).
I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model.
A standard perspective projection matrix already implements a pinhole camera model. What you're doing here is just having more calculations per vertex, which could all be pre-calculated on the CPU and put in a single matrix.
The only difference is the z range. But a "pinhole camera" does not have a z range, all points are projected to the image plane. So what you want here is a pinhole camera model for x and y, and a linear mapping for z.
However, your implementation is wrong. A GPU will interpolate the z linearly in window space. That means, it will calculate the barycentric coordinates of each fragment with respect to the 2D projection of the triangle of the window. However, when using a perspective projection, and when the triangle is not excatly parallel to the image plane, those barycentric coordinates will not be those the respective 3D point would have had with respect to the actual 3D primitive before the projection.
The trick here is that since in screen space, we typically have x/z and y/z as the vertex coordinates, and when we interpolate linaerily inbetween that, we also have to interpolate 1/z for the depth. However, in reality, we don't divide by z, but w (and let the projection matrix set w_clip = [+/-]z_eye for us). After the division by w_clip, we get a hyperbolic mapping of the z value, but with the nice property that it can be linearly interpolated in window space.
What this means is that by your use of a linear z mapping, your primitives now would have to be bend along the z dimension to get the correct result. Look at the following top-down view of the situation. The "lines" represent flat triangles, looked from straight above:
In eye space, the view rays would all go from the origin through each pixel (we could imagine the 2D pixel raster on the near plane, for example). In NDC, we have transformed this to an orthograhic projection. The pixels still can be imagined at the near plane, but all view rays now are parallel.
In the standard hyperbolical mapping, the point in the middle of the frustum is compressed much towards the end. However, the traingle still is flat.
If you use a linear mapping instead, your triangle would have not to be flat any more. Look for example at the intersection point between the two traingles. It must have the same x (and y) coordinate as in the hyperbolic case, for the correct result.
However, you only transform the vertices according to a linear z value, the GPU will still linearly interpolate the result, so in your case, you would get straight connections between your transformed points, your intersection point between the two triangles is moved, and your depth values are all wrong except for the actual vertex points itself.
If you want to use a linear depth buffer, you have to correct the depth of each fragment in the fragment shader, to implement the required non-linear interpolation on your own. Doing so would break a lot of the clever depth test optimizations GPUs do, notably early Z and hierachical Z, so while it is possible, you'l loose some performance.
The much better solution is: Just use a standard hyperbolic depth value. Just linearize the depth values after you read them back. Also, don't do the z Division in the vertex shader. You do not only break z this way, you also break the perspective-corrected interpolation of the varyings, so your shading will also be wrong. Let the GPU do the division, just shuffle the correct value into gl_Position.w. The GPU will internally not only do the divide, the perspective corrected interpolation also depends on w.

Why differs gl_FragCoord.z from ((pos.z / pos.w) + 1.0) * 0.5?

Does anyone know why 'depth' (vertShader) differs from 'gl_FragCoord.z' (rendered from opengl)? Especially with decreasing z the difference becomes higher. Is it possible that 'depth' is at higher z values more precise?
.vsh
out float depth;
void main (void) {
vec4 pos = mvpMatrix * vertex;
depth = ((pos.z / pos.w) + 1.0) * 0.5;
gl_Position = pos;
}
.fsh
in float depth;
void main(void) {
gl_FragDepth = depth;// or gl_FragCoord.z;
}
There are a couple of issues with your approach, with the main points are:
gl_FragCoord.z is hyperbolically distorted window space z value. However, the hyperoblical z/w value for each vertex is just linearily interpolated in screen space for each framgent. But when you use a varying out float depth = (pos.z / pos.w), the GL will do a perspective-corrected interpolation which is non-linear. You could fix this by using flat out float depth.
(pos.z/pos.w) doesn't even make sense. Think about it: if the point lies in a plane where the camera is, you'll get pos.w=0, and no valid result. gl_FragCoord.z does not have this issue because the clipping is done before the divide, and it will do the divide for a new vertex which lies on the near plane, and which you'll never going to see (there's no vertex shader invocation for that).
The issue is also present when points lie behind the camera, they will end up mirrored in front of the camera. If you have a primitive where vertices lie on both sides of the camera, you will get complete bullshit as your interpolated depth value, no matter which interpolation method you chose.

Pixel perfect text rendering in perspective projection

I'm trying to render text as textured quads in perspective projection (do not want to use ortho projection), and I'm struggling with pixel alignment.
The setup is simple, I have text with align anchor point in 3D, I change its model transformation into billboard transformation, and calculate scale (triangle similarity) to have the text always the same size. Since geometry of text quads is constructed with world units corresponding to pixels, resulting text indeed seems to be the same size no matter camera orientation or anchor point offset.
Vector3d dist = camera.getPosition();
dist.sub(translation);
double pxFov = camera.getFOV() / camera.getScreenWidth();
double scale = Math.sin(pxFov) / Math.sin((Math.PI / 2) - pxFov)
* dist.length() * camera.getAspectRatio();
V.getRotationScale(R);
R.transpose();
M.setIdentity();
M.setRotation(R);
M.setScale(scale);
M.setTranslation(translation);
Where V is 4x4 camera view matrix, R temporary 3x3 matrix, and M is 4x4 model transformation matrix used for MVP calculation.
I found supposed solution, but it just slightly changed behaviour of rendered text, instead of fixing the problem.
When using vertex shader
void main () {
vec2 view = vec2(1280, 720);
vec4 cpos = MVP * vec4 (position, 1.0f);
vec2 p = floor(cpos.xy * view*0.5/cpos.w);
p += 0.5; // does not influence result
cpos.xy = p * (2.0/view*cpos.w);
gl_Position = cpos;
}
text does render in some places sharp, and in some places blurred
In case of simple vertex shader
void main () {
gl_Position = MVP * vec4 (position, 1.0f);
}
is the text either completely blurred or completely sharp
It seems logial getting vertex positions to viewport space, rounding it there and moving it back, but something seems to be missing.
EDIT: Explanation of how I'm calculating the scale factor.
Here you can see I'm getting right angle triangle as red, green and blue lines. Knowing lenght of red (camera-text anchor distance), angle between red and blue ((fov/2)/(screen width/2)), and angle between red and green being right angle, I can use law of sines to calculate length of the green line, which is also scale of one texel to have same size in current projection.
Scale of the text seems to be correct no matter camera/text orientation/position (desired 8 pixels). It is possible the scale is just lightly wrong and that results in the blur effect, but I fail to see how.
It seems to me that due to numerical issues you're suffering z-buffer fighting.
Perhaps you can add a little value to the text z-coordinate, separate it just a bit from its background.
Another solution is to avoid the text scale-unscale calculation.
You may calculate pixel anchorage by doing perspective maths on CPU side for every text. Then display it with orthographic projection.

reconstructed world position from depth is wrong

I'm trying to implement deferred shading/lighting. In order to reduce the number/size of the buffers I use I wanted to use the depth texture to reconstruct world position later on.
I do this by multiplying the pixel's coordinates with the inverse of the projection matrix and the inverse of the camera matrix. This sort of works, but the position is a bit off. Here's the absolute difference with a sampled world position texture:
For reference, this is the code I use in the second pass fragment shader:
vec2 screenPosition_texture = vec2((gl_FragCoord.x)/WIDTH, (gl_FragCoord.y)/HEIGHT);
float pixelDepth = texture2D(depth, screenPosition_texture).x;
vec4 worldPosition = pMatInverse*vec4(VertexIn.position, pixelDepth, 1.0);
worldPosition = vec4(worldPosition.xyz/worldPosition.w, 1.0);
//worldPosition /= 1.85;
worldPosition = cMatInverse*worldPosition_byDepth;
If I uncomment worldPosition /= 1.85, the position is reconstructed a lot better (on my geometry/range of depth values). I just got this value by messing around after comparing my output with what it should be (stored in a third texture).
I'm using 0.1 near, 100.0 far and my geometries are up to about 15 away.
I know there may be precision errors, but this seems a bit too big of an error too close to the camera.
Did I miss anything here?
As mentioned in a comment:
I didn't convert the depth value from NDC space to clip space.
I should have added this line:
pixelDepth = pixelDepth * 2.0 - 1.0;