Pixel perfect text rendering in perspective projection - opengl

I'm trying to render text as textured quads in perspective projection (do not want to use ortho projection), and I'm struggling with pixel alignment.
The setup is simple: I have text with an anchor point in 3D, I replace its model transformation with a billboard transformation, and I calculate a scale (by triangle similarity) so that the text always has the same size on screen. Since the geometry of the text quads is constructed with world units corresponding to pixels, the resulting text does indeed appear the same size regardless of camera orientation or anchor point offset.
Vector3d dist = camera.getPosition();
double pxFov = camera.getFOV() / camera.getScreenWidth();
double scale = Math.sin(pxFov) / Math.sin((Math.PI / 2) - pxFov)
* dist.length() * camera.getAspectRatio();
Where V is 4x4 camera view matrix, R temporary 3x3 matrix, and M is 4x4 model transformation matrix used for MVP calculation.
I found a supposed solution, but it only slightly changed the behaviour of the rendered text instead of fixing the problem.
When using this vertex shader
void main () {
vec2 view = vec2(1280, 720);
vec4 cpos = MVP * vec4 (position, 1.0f);
vec2 p = floor(cpos.xy * view*0.5/cpos.w);
p += 0.5; // does not influence result
cpos.xy = p * (2.0/view*cpos.w);
gl_Position = cpos;
}
the text renders sharp in some places and blurred in others.
With a simple vertex shader
void main () {
gl_Position = MVP * vec4 (position, 1.0f);
}
the text is either completely blurred or completely sharp.
It seems logical to bring the vertex positions into viewport space, round them there and move them back, but something seems to be missing.
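The round trip described here (clip space to pixels, round, back to clip space) can be checked on the CPU. The following is a minimal Python sketch of that idea, not the shader itself. `snap_clip_to_pixel`, `clip_to_pixel` and the `offset` parameter are hypothetical names; the 1280x720 viewport is the one hard-coded in the shader.

```python
import math

def clip_to_pixel(x, y, w, view=(1280, 720)):
    """Pixel coordinates of a clip-space position, measured from the viewport centre."""
    return x / w * view[0] * 0.5, y / w * view[1] * 0.5

def snap_clip_to_pixel(x, y, w, view=(1280, 720), offset=0.0):
    """Snap a clip-space position to the pixel grid and return clip-space x, y.

    offset=0.0 snaps to pixel corners, offset=0.5 to pixel centres
    (for a viewport with even width and height).
    """
    px, py = clip_to_pixel(x, y, w, view)
    # Round down to the grid, then apply the chosen sub-pixel offset.
    px = math.floor(px) + offset
    py = math.floor(py) + offset
    # Centred pixel coordinates -> NDC -> clip space (undo the divide by w).
    return px * 2.0 / view[0] * w, py * 2.0 / view[1] * w

sx, sy = snap_clip_to_pixel(0.3, -0.2, 2.0)
print(clip_to_pixel(sx, sy, 2.0))
```

Whether corners or centres are the right target depends on how the quad's texels are meant to line up with pixels, so the sketch leaves that as a parameter.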
EDIT: Explanation of how I'm calculating the scale factor.
Here you can see a right-angled triangle formed by the red, green and blue lines. Knowing the length of the red line (the distance from the camera to the text anchor), the angle between red and blue ((fov/2)/(screen width/2)), and that the angle between red and green is a right angle, I can use the law of sines to calculate the length of the green line, which is also the scale one texel needs in order to keep the same size in the current projection.
The scale of the text seems to be correct regardless of camera/text orientation/position (desired 8 pixels). It is possible that the scale is just slightly wrong and that this results in the blur effect, but I fail to see how.
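As a sanity check of the scale formula: sin(a) / sin(pi/2 - a) is simply tan(a), so the expression reduces to dist * tan(fov / width), times the aspect factor. One hedged alternative, assuming the field of view spans `n_pixels` pixels and the distance is measured along the view axis, is to split the visible extent 2 * d * tan(fov/2) evenly among the pixels. The Python sketch below compares the two. The function names and the 60 degree / 720 pixel values are illustrative assumptions, not values from the question.

```python
import math

def scale_law_of_sines(dist, fov, n_pixels):
    """Per-pixel world size as computed in the question (without the aspect factor)."""
    px_fov = fov / n_pixels
    return math.sin(px_fov) / math.sin(math.pi / 2 - px_fov) * dist

def scale_even_split(depth, fov, n_pixels):
    """Per-pixel world size on a plane at `depth`, splitting its extent evenly."""
    return 2.0 * depth * math.tan(fov / 2.0) / n_pixels

fov = math.radians(60.0)
a = scale_law_of_sines(10.0, fov, 720)
b = scale_even_split(10.0, fov, 720)
print(a, b, b / a)
```

With these example numbers the two results differ by roughly 10%, because equal angular steps do not correspond to equal pixel steps on a flat image plane. Whether this matters for the setup in the question depends on how `camera.getFOV()` is defined, which the question does not state.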

It seems to me that, due to numerical issues, you are suffering from z-fighting.
Perhaps you can add a small offset to the text's z-coordinate to separate it a bit from its background.
Another solution is to avoid the text scale-unscale calculation.
You can compute the anchor's pixel position by doing the perspective maths on the CPU for each text, then display the text with an orthographic projection.
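A minimal Python sketch of that second suggestion, assuming column vectors and a 4x4 matrix stored as row-major nested lists. `perspective`, `project_to_pixels` and the viewport size are illustrative assumptions; a real application would reuse its own camera matrices.

```python
import math

def perspective(fov_y, aspect, near, far):
    """Standard OpenGL-style perspective matrix (row-major nested lists)."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def project_to_pixels(mvp, point, width, height):
    """Project a 3D point and return its window position rounded to whole pixels."""
    x, y, z = point
    clip = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in mvp]
    if clip[3] <= 0.0:
        return None  # behind the camera, nothing to draw
    ndc_x, ndc_y = clip[0] / clip[3], clip[1] / clip[3]
    # NDC [-1, 1] -> window coordinates with the origin at the lower left.
    return (round((ndc_x * 0.5 + 0.5) * width),
            round((ndc_y * 0.5 + 0.5) * height))

mvp = perspective(math.radians(60.0), 1280 / 720, 0.1, 100.0)
print(project_to_pixels(mvp, (0.0, 0.0, -5.0), 1280, 720))
```

The returned pixel position can then serve as the origin of the text quad in an orthographic pass where one unit equals one pixel.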


OpenGL vertex shader for pinhole camera model

I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model (as defined for example here). Currently I use the vertex shader to map the 3D vertices to the clip space, where K in the shader contains [focal length x, focal length y, principal point x, principal point y] and zrange is the depth range of the vertices.
#version 330 core
layout (location = 0) in vec3 vin;
layout (location = 1) in vec3 cin;
layout (location = 2) in vec3 nin;
out vec3 shader_pos;
out vec3 shader_color;
out vec3 shader_normal;
uniform vec4 K;
uniform vec2 zrange;
uniform vec2 imsize;
void main() {
vec3 uvd;
uvd.x = (K[0] * vin.x + K[2] * vin.z) / vin.z;
uvd.y = (K[1] * vin.y + K[3] * vin.z) / vin.z;
uvd.x = 2 * uvd.x / (imsize[0]) - 1;
uvd.y = 2 * uvd.y / (imsize[1]) - 1;
uvd.z = 2 * (vin.z - zrange[0]) / (zrange[1] - zrange[0]) - 1;
shader_pos = uvd;
shader_color = cin;
shader_normal = nin;
gl_Position = vec4(uvd.xyz, 1.0);
}
I verify the renderings with a simple ray-tracer, however there seems to be an offset stemming from my OpenGL implementation. The depth values are different, but not by an affine offset as it would be caused by a wrong remapping (see the slanted surface on the tetrahedron, ignoring the errors on the edges).
I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model.
A standard perspective projection matrix already implements a pinhole camera model. What you're doing here is just having more calculations per vertex, which could all be pre-calculated on the CPU and put in a single matrix.
The only difference is the z range. But a "pinhole camera" does not have a z range, all points are projected to the image plane. So what you want here is a pinhole camera model for x and y, and a linear mapping for z.
However, your implementation is wrong. A GPU will interpolate z linearly in window space. That means it calculates the barycentric coordinates of each fragment with respect to the 2D projection of the triangle in the window. However, when using a perspective projection, and when the triangle is not exactly parallel to the image plane, those barycentric coordinates will not be the ones the corresponding 3D point would have had with respect to the actual 3D primitive before the projection.
The trick here is that, since in screen space we typically have x/z and y/z as the vertex coordinates and we interpolate linearly between those, we also have to interpolate 1/z for the depth. In reality, however, we don't divide by z but by w (and let the projection matrix set w_clip = [+/-]z_eye for us). After the division by w_clip, we get a hyperbolic mapping of the z value, but with the nice property that it can be linearly interpolated in window space.
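A small numeric check of this property, written as a Python sketch rather than shader code. It assumes a camera looking down +z (as in the question's shader), projects the two endpoints of a segment in the x/z plane, and compares what interpolation at the screen-space midpoint gives for z itself versus 1/z. All names and numbers are illustrative.

```python
def project_x(p):
    """Pinhole projection of a point (x, z) onto the image plane at z = 1."""
    x, z = p
    return x / z

def true_point_at_screen_midpoint(p0, p1):
    """Point on the 3D segment p0-p1 that projects to the screen-space midpoint."""
    s_mid = 0.5 * (project_x(p0) + project_x(p1))
    (x0, z0), (x1, z1) = p0, p1
    # Solve (x0 + t*(x1-x0)) / (z0 + t*(z1-z0)) = s_mid for t.
    t = (s_mid * z0 - x0) / ((x1 - x0) - s_mid * (z1 - z0))
    return x0 + t * (x1 - x0), z0 + t * (z1 - z0)

p0, p1 = (-1.0, 1.0), (1.0, 4.0)
_, z_true = true_point_at_screen_midpoint(p0, p1)

z_linear_interp = 0.5 * (p0[1] + p1[1])                     # interpolating z directly
z_from_inverse = 1.0 / (0.5 * (1.0 / p0[1] + 1.0 / p1[1]))  # interpolating 1/z instead
print(z_true, z_linear_interp, z_from_inverse)
```

For this segment the true depth at the screen midpoint is 1.6, which interpolating 1/z reproduces, while interpolating z directly gives 2.5.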
What this means is that, because you use a linear z mapping, your primitives would now have to be bent along the z dimension to get the correct result. Look at the following top-down view of the situation. The "lines" represent flat triangles, viewed from straight above:
In eye space, the view rays all go from the origin through each pixel (we can imagine the 2D pixel raster on the near plane, for example). In NDC, we have transformed this into an orthographic projection. The pixels can still be imagined on the near plane, but all view rays are now parallel.
In the standard hyperbolic mapping, the point in the middle of the frustum is compressed strongly towards the far end. However, the triangle is still flat.
If you use a linear mapping instead, your triangle would no longer be flat. Look for example at the intersection point between the two triangles. It must have the same x (and y) coordinate as in the hyperbolic case for the result to be correct.
However, you only transform the vertices according to a linear z value, and the GPU still interpolates the result linearly. So in your case you get straight connections between your transformed points, the intersection point between the two triangles is moved, and your depth values are all wrong except at the actual vertices themselves.
If you want to use a linear depth buffer, you have to correct the depth of each fragment in the fragment shader to implement the required non-linear interpolation on your own. Doing so breaks a lot of the clever depth test optimizations GPUs do, notably early Z and hierarchical Z, so while it is possible, you'll lose some performance.
The much better solution is to use a standard hyperbolic depth value and linearize the depth values after you read them back. Also, don't do the z division in the vertex shader. You not only break z this way, you also break the perspective-corrected interpolation of the varyings, so your shading will be wrong as well. Let the GPU do the division and just put the correct value into gl_Position.w. The GPU will not only do the divide internally; the perspective-corrected interpolation also depends on w.
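A sketch of what putting the correct value into gl_Position.w could look like for the intrinsics in the question, written in Python so that it can be checked numerically. It assumes the camera looks down +z (as the question's shader does) and uses a hyperbolic depth between hypothetical `near` and `far` planes; the function names and the example intrinsics are made up for illustration.

```python
def pinhole_clip(v, K, imsize, near, far):
    """Clip-space position for a pinhole camera looking down +z.

    K = (fx, fy, cx, cy) in pixels, imsize = (width, height).
    No division by z happens here; w carries z so the GPU can divide.
    """
    x, y, z = v
    fx, fy, cx, cy = K
    width, height = imsize
    clip_x = (2.0 * fx / width) * x + (2.0 * cx / width - 1.0) * z
    clip_y = (2.0 * fy / height) * y + (2.0 * cy / height - 1.0) * z
    # Hyperbolic depth: z = near maps to -1 and z = far maps to +1 after the divide.
    clip_z = ((far + near) * z - 2.0 * far * near) / (far - near)
    return clip_x, clip_y, clip_z, z

def question_xy(v, K, imsize):
    """The x/y mapping from the question's shader, with its explicit divide."""
    x, y, z = v
    fx, fy, cx, cy = K
    u = (fx * x + cx * z) / z
    w = (fy * y + cy * z) / z
    return 2.0 * u / imsize[0] - 1.0, 2.0 * w / imsize[1] - 1.0

def linearize(ndc_z, near, far):
    """Recover eye-space z from the hyperbolic NDC depth produced above."""
    return 2.0 * far * near / ((far + near) - ndc_z * (far - near))
```

After the divide by w, x and y match the question's mapping, and `linearize` recovers the original z from the read-back depth once it has been mapped from [0, 1] back to [-1, 1].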

How does the coordinate system work for 3D textures in OpenGL?

I am attempting to write to and read from a 3D texture, but it seems my mapping is wrong. I have used RenderDoc to check the textures and they look OK.
A random layer of this volumetric texture looks like:
So just some blue to denote absence and some green values to denote presence.
The coordinates I calculate when I write to each layer are calculated in the vertex shader as:
pos.x = (2.f*pos.x-width+2)/(width-2);
pos.y = (2.f*pos.y-depth+2)/(depth-2);
pos.z -= level;
pos.z *= 1.f/voxel_size;
gl_Position = pos;
Since the texture itself looks ok it seems these coordinates are good to achieve my goal.
It's important to note that right now voxel_size is 1 and the scale of the texture is supposed to be 1 to 1 with the scene dimensions. In essence, each pixel in the texture represents a 1x1x1 voxel in the scene.
Next I attempt to fetch the texture values as follows:
vec3 pos = vertexPos;
pos.x = (2.f*pos.x-width+2)/(width-2);
pos.y = (2.f*pos.y-depth+2)/(depth-2);
pos.z *= 1.f/(4*16);
outColor = texture(voxel_map, pos);
Where vertexPos is the global vertex position in the scene. The z coordinate may be completely wrong, however (I am not sure whether I am supposed to normalize the depth component or not), but that is not the only issue. If you look at the final result:
There is a problem with the horizontal scale. Since each texel represents a voxel, a cube should always have a single fixed color. But as you can see, I am getting multiple colors for a single cube on the top faces. So my horizontal scale is wrong.
What am I doing wrong when fetching the texels from the texture?
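One thing worth checking, offered as an assumption since this question has no answer here: gl_Position expects normalized device coordinates in [-1, 1], whereas texture() on a sampler3D expects normalized texture coordinates in [0, 1] on all three axes, with texel i of an N-texel axis centred at (i + 0.5) / N. The fetch code reuses the [-1, 1] mapping from the write pass. A Python sketch of the conversions involved (all function names are hypothetical):

```python
def ndc_to_texcoord(ndc):
    """Map a normalized device coordinate in [-1, 1] to a texture coordinate in [0, 1]."""
    return ndc * 0.5 + 0.5

def texel_center(index, size):
    """Normalized coordinate of the centre of texel `index` on an axis of `size` texels."""
    return (index + 0.5) / size

def texel_index(coord, size):
    """Texel hit by a normalized coordinate, assuming nearest filtering and edge clamping."""
    return min(max(int(coord * size), 0), size - 1)

# Every texel centre maps back to its own index.
print([texel_index(texel_center(i, 16), 16) for i in range(16)])
```

If a [-1, 1] value is passed to texture() directly, the negative half of the range falls outside the texture and is resolved by the wrap mode, which could produce a scale that looks wrong.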

OpenGL shadow mapping with deferred rendering, position transformation

I am using deferred rendering, where I store the eye-space position in a texture as follows:
gl_Position = vec4(vertex_position, 1.0);
vertexOut.position = vec3(viewMatrix * modelMatrix * gl_in[i].gl_Position);
positionOut = vec3(vertexIn.position);
Now, in the second pass (lighting pass) I am trying to sample my shadow map, using UV coordinates calculated from this vec4
vec4 lightSpacePos = lightProjectionMatrix * lightViewMatrix * lightModelMatrix * vec4(position, 1.0);
The position used is the same position stored and sampled from the position texture.
Do I need to transform the position with the inverse camera view matrix before doing this calculation, to bring it back to world space? Or how should I proceed?
Typically shadow mapping is done by comparing the window-space Z coordinate (this is what a depth texture stores) of your current fragment vs. your light. This must be done using a common reference orientation, so that involves re-projecting your current fragment's position from the perspective of your light.
You have the view-space position right now, which is relative to your current camera and not particularly useful. To do this effectively you want world-space position. You can get that if you transform the view-space position by the inverse view matrix.
Given world-space position, transform into clip-space from light's perspective:
// This will be in clip-space
vec4 lightSpacePos = lightProjectionMatrix * lightViewMatrix * vec4 (worldPos);
// Transform it into NDC-space by dividing by w
lightSpacePos /= lightSpacePos.w;
// Range is now [-1.0, 1.0], but you need [0.0, 1.0]
lightSpacePos = lightSpacePos * vec4 (0.5) + vec4 (0.5);
Assuming default depth range, lightSpacePos is now ready for use. xy contains the texture coordinates to sample from your shadow map and z contains the depth to use for comparison.
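The same chain of transformations can be verified on the CPU. The following Python sketch assumes row-major nested lists, column vectors, an orthographic light projection and a world-space input position; `mat_vec`, `ortho` and `shadow_coords` are illustrative names, not part of any library.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (row-major nested lists) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def ortho(left, right, bottom, top, near, far):
    """OpenGL-style orthographic projection matrix."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def shadow_coords(light_proj, light_view, world_pos):
    """Return (u, v, depth) in [0, 1] for sampling and comparing against a shadow map."""
    p = mat_vec(light_view, [world_pos[0], world_pos[1], world_pos[2], 1.0])
    clip = mat_vec(light_proj, p)
    ndc = [c / clip[3] for c in clip[:3]]      # divide by w
    return tuple(c * 0.5 + 0.5 for c in ndc)   # [-1, 1] -> [0, 1]

identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
light_proj = ortho(-10.0, 10.0, -10.0, 10.0, 1.0, 21.0)
print(shadow_coords(light_proj, identity, (0.0, 0.0, -11.0)))
```

A point in the middle of the light's volume lands in the middle of the shadow map with a depth halfway between near and far, which is a quick way to confirm the matrices are wired up as intended.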
For a more thorough explanation, see the following answer.
Incidentally, you will want to eliminate your position texture from your G-Buffer to achieve reasonable performance. It is very easy to reconstruct world- or view-space position given only the depth and the projection and view matrices and the arithmetic involved is much quicker than an extra texture fetch. Storing an additional texture with adequate precision to represent position in 3D space will burn through tons of memory bandwidth each frame and is completely unnecessary.
This article from the OpenGL Wiki explains how to do this. You can take it one step farther and work back to world-space, which is more desirable than view-space. You may need to tweak your depth buffer a little bit to get adequate precision, but it will still be quicker than storing position separately.

distortion correction with gpu shader bug

So I have a camera with a wide angle lens. I know the distortion coefficients, the focal length, the optical center. I want to undistort the image I get from this camera. I used OpenCV for the first try (cv::undistort), which worked well, but was way too slow.
Now I want to do this on the gpu. There is a shader doing exactly this documented in http://willsteptoe.com/post/67401705548/ar-rift-aligning-tracking-and-video-spaces-part-5
the formulas can be seen here:
So I went and implemented my own version as a glsl shader. I am sending a quad with texture coordinates on the corners between 0..1.
I assume the texture coordinates that arrive are the coordinates of the undistorted image. I calculate the coordinates for the distorted point corresponding to my texture coordinates. Then I sample the distorted image texture.
With this shader, nothing in the final image changes. The problem I identified through a CPU implementation is that the coefficient term is very close to zero. The numbers get smaller and smaller through the radius squaring etc. So I have a scaling problem, and I can't figure out what to do differently. I tried everything... I guess it is something quite obvious, since this kind of process seems to work for a lot of people.
I left out the tangential distortion correction for simplicity.
#version 330 core
in vec2 UV;
out vec4 color;
uniform sampler2D textureSampler;
void main()
{
vec2 focalLength = vec2(438.568f, 437.699f);
vec2 opticalCenter = vec2(667.724f, 500.059f);
vec4 distortionCoefficients = vec4(-0.035109f, -0.002393f, 0.000335f, -0.000449f);
const vec2 imageSize = vec2(1280.f, 960.f);
vec2 opticalCenterUV = opticalCenter / imageSize;
vec2 shiftedUVCoordinates = (UV - opticalCenterUV);
vec2 lensCoordinates = shiftedUVCoordinates / focalLength;
float radiusSquared = sqrt(dot(lensCoordinates, lensCoordinates));
float radiusQuadrupled = radiusSquared * radiusSquared;
float coefficientTerm = distortionCoefficients.x * radiusSquared + distortionCoefficients.y * radiusQuadrupled;
vec2 distortedUV = ((lensCoordinates + lensCoordinates * (coefficientTerm))) * focalLength;
vec2 resultUV = (distortedUV + opticalCenterUV);
color = texture2D(textureSampler, resultUV);
}
I see two issues with your solution. The main issue is that you mix two different spaces. You seem to work in [0,1] texture space by converting the optical center to that space, but you did not adjust focalLength. The key point is that for such a distortion model, the focal length is specified in pixels. However, a pixel is now no longer 1 base unit wide, but 1/width and 1/height units, respectively.
You could add vec2 focalLengthUV = focalLength / imageSize, but you will see that both divisions cancel each other out when you calculate lensCoordinates. It is much more convenient to convert the texture-space UV coordinates to pixel coordinates and use that space directly:
vec2 lensCoordinates = (UV * imageSize - opticalCenter) / focalLength;
(and also respectively changing the calculation for distortedUV and resultUV).
There is still one issue with the approach I have sketched so far: the conventions of that pixel space I mentioned earlier. In GL, the origin will be the lower left corner, while in most pixel spaces, the origin is at the top left. You might have to flip the y coordinate when doing the conversion. Another thing is where exactly pixel centers are located. So far, the code assumes that pixel centers are at integer + 0.5. The texture coordinate (0,0) is not the center of the lower left pixel, but the corner point. The parameters you use for the distortion might (I don't know OpenCV's conventions) assume the pixel centers at integers, so that instead of the conversion pixelSpace = uv * imageSize, you might need to offset this by half a pixel like pixelSpace = uv * imageSize - vec2(0.5).
The second issue I see is
float radiusSquared = sqrt(dot(lensCoordinates, lensCoordinates));
That sqrt is not correct here, as dot(a,a) already gives the squared length of vector a.
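Putting both fixes together, a hedged Python sketch of the radial-only mapping, using the numbers from the question. `distort_uv` is a hypothetical name; the sketch assumes the first two coefficients are the radial terms and ignores the y flip and half-pixel conventions discussed above.

```python
def distort_uv(uv, focal_length, optical_center, k1, k2, image_size):
    """Map an undistorted texture coordinate to the distorted image's texture coordinate.

    Radial terms only; focal_length and optical_center are in pixels.
    """
    # Texture space [0, 1] -> pixel space -> normalized lens coordinates.
    lx = (uv[0] * image_size[0] - optical_center[0]) / focal_length[0]
    ly = (uv[1] * image_size[1] - optical_center[1]) / focal_length[1]
    r2 = lx * lx + ly * ly          # squared radius: no sqrt needed
    r4 = r2 * r2
    factor = 1.0 + k1 * r2 + k2 * r4
    # Lens coordinates -> pixel space -> texture space.
    px = lx * factor * focal_length[0] + optical_center[0]
    py = ly * factor * focal_length[1] + optical_center[1]
    return px / image_size[0], py / image_size[1]

focal = (438.568, 437.699)
center = (667.724, 500.059)
size = (1280.0, 960.0)
print(distort_uv((0.0, 0.0), focal, center, -0.035109, -0.002393, size))
```

With consistent units the coefficient term is no longer negligible: the corner (0, 0) moves inward by several percent of the image width, while the optical center stays fixed.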

reconstructed world position from depth is wrong

I'm trying to implement deferred shading/lighting. In order to reduce the number/size of the buffers I use I wanted to use the depth texture to reconstruct world position later on.
I do this by multiplying the pixel's coordinates with the inverse of the projection matrix and the inverse of the camera matrix. This sort of works, but the position is a bit off. Here's the absolute difference with a sampled world position texture:
For reference, this is the code I use in the second pass fragment shader:
vec2 screenPosition_texture = vec2((gl_FragCoord.x)/WIDTH, (gl_FragCoord.y)/HEIGHT);
float pixelDepth = texture2D(depth, screenPosition_texture).x;
vec4 worldPosition = pMatInverse*vec4(VertexIn.position, pixelDepth, 1.0);
worldPosition = vec4(worldPosition.xyz/worldPosition.w, 1.0);
//worldPosition /= 1.85;
worldPosition = cMatInverse*worldPosition_byDepth;
If I uncomment worldPosition /= 1.85, the position is reconstructed a lot better (on my geometry/range of depth values). I just got this value by messing around after comparing my output with what it should be (stored in a third texture).
I'm using 0.1 near, 100.0 far and my geometries are up to about 15 away.
I know there may be precision errors, but this seems a bit too big of an error too close to the camera.
Did I miss anything here?
As mentioned in a comment:
I didn't convert the depth value from the [0, 1] range stored in the depth texture to the [-1, 1] range of NDC space.
I should have added this line:
pixelDepth = pixelDepth * 2.0 - 1.0;
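To illustrate why this line matters, here is a Python round-trip sketch. It assumes a standard symmetric OpenGL perspective matrix and uses its closed-form inverse instead of pMatInverse; the function names and sample position are illustrative. It reconstructs the view-space position only; the multiplication by the inverse camera matrix would follow as in the question.

```python
import math

def perspective_terms(fov_y, aspect, near, far):
    """Return the non-trivial entries of a standard OpenGL perspective matrix."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return f / aspect, f, (far + near) / (near - far), 2.0 * far * near / (near - far)

def project(view_pos, terms):
    """View space -> (u, v, depth), all in [0, 1] as they appear in window space."""
    p00, p11, p22, p23 = terms
    x, y, z = view_pos
    w = -z
    ndc = (p00 * x / w, p11 * y / w, (p22 * z + p23) / w)
    return tuple(c * 0.5 + 0.5 for c in ndc)

def reconstruct(u, v, depth, terms):
    """(u, v, depth) in [0, 1] -> view-space position."""
    p00, p11, p22, p23 = terms
    # The step missing in the question: [0, 1] -> [-1, 1] for every component.
    nx, ny, nz = u * 2.0 - 1.0, v * 2.0 - 1.0, depth * 2.0 - 1.0
    z = -p23 / (nz + p22)
    return (-z * nx / p00, -z * ny / p11, z)

terms = perspective_terms(math.radians(60.0), 16.0 / 9.0, 0.1, 100.0)
u, v, d = project((1.5, -0.7, -15.0), terms)
print(reconstruct(u, v, d, terms))
```

Skipping the remap of `depth` feeds a value from the wrong range into the inverse, which yields a position that is off by a depth-dependent factor rather than a constant offset.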