OpenGL vertex shader for pinhole camera model

I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model (as defined, for example, here). Currently I use the vertex shader to map the 3D vertices to clip space, where K in the shader contains [focal length x, focal length y, principal point x, principal point y] and zrange is the depth range of the vertices.
#version 330 core
layout (location = 0) in vec3 vin;
layout (location = 1) in vec3 cin;
layout (location = 2) in vec3 nin;
out vec3 shader_pos;
out vec3 shader_color;
out vec3 shader_normal;
uniform vec4 K;
uniform vec2 zrange;
uniform vec2 imsize;
void main() {
    vec3 uvd;
    uvd.x = (K[0] * vin.x + K[2] * vin.z) / vin.z;
    uvd.y = (K[1] * vin.y + K[3] * vin.z) / vin.z;
    uvd.x = 2 * uvd.x / (imsize[0]) - 1;
    uvd.y = 2 * uvd.y / (imsize[1]) - 1;
    uvd.z = 2 * (vin.z - zrange[0]) / (zrange[1] - zrange[0]) - 1;
    shader_pos = uvd;
    shader_color = cin;
    shader_normal = nin;
    gl_Position = vec4(uvd.xyz, 1.0);
}
I verify the renderings against a simple ray tracer, but there seems to be an offset stemming from my OpenGL implementation. The depth values are different, and not by an affine offset as a wrong remapping would cause (see the slanted surface on the tetrahedron, ignoring the errors on the edges).

I am trying to implement a simple OpenGL renderer that simulates a pinhole camera model.
A standard perspective projection matrix already implements a pinhole camera model. What you're doing here is just having more calculations per vertex, which could all be pre-calculated on the CPU and put in a single matrix.
The only difference is the z range. But a "pinhole camera" does not have a z range; all points are projected to the image plane. So what you want here is a pinhole camera model for x and y, and a linear mapping for z.
However, your implementation is wrong. A GPU will interpolate z linearly in window space. That means it will calculate the barycentric coordinates of each fragment with respect to the 2D projection of the triangle in the window. However, when using a perspective projection, and when the triangle is not exactly parallel to the image plane, those barycentric coordinates will not be the ones the respective 3D point would have had with respect to the actual 3D primitive before the projection.
The trick here is that in screen space we typically have x/z and y/z as the vertex coordinates, and when we interpolate linearly in between them, we also have to interpolate 1/z for the depth. However, in reality we don't divide by z but by w (and let the projection matrix set w_clip = [+/-]z_eye for us). After the division by w_clip, we get a hyperbolic mapping of the z value, but with the nice property that it can be linearly interpolated in window space.
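For reference, the perspective-correct interpolation the GPU applies to a varying $a$ across a triangle, with window-space barycentric weights $\lambda_i$ and per-vertex clip-space $w_i$, is

$$a = \frac{\lambda_0 a_0 / w_0 + \lambda_1 a_1 / w_1 + \lambda_2 a_2 / w_2}{\lambda_0 / w_0 + \lambda_1 / w_1 + \lambda_2 / w_2},$$

while z_ndc = z_clip / w_clip is interpolated with the plain $\lambda_i$ weights; that linear interpolation only yields correct per-fragment depths because the standard z mapping is hyperbolic in eye-space z.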
What this means is that with your linear z mapping, your primitives would now have to be bent along the z dimension to get the correct result. Look at the following top-down view of the situation. The "lines" represent flat triangles, viewed from straight above:
In eye space, the view rays would all go from the origin through each pixel (we could imagine the 2D pixel raster on the near plane, for example). In NDC, we have transformed this to an orthographic projection. The pixels can still be imagined at the near plane, but all view rays are now parallel.
In the standard hyperbolic mapping, the point in the middle of the frustum is compressed strongly towards the far end. However, the triangle is still flat.
If you use a linear mapping instead, your triangle would no longer be flat. Look, for example, at the intersection point between the two triangles. It must have the same x (and y) coordinate as in the hyperbolic case for the result to be correct.
However, since you only transform the vertices according to the linear z value and the GPU still interpolates linearly between them, you get straight connections between your transformed points: the intersection point between the two triangles is moved, and your depth values are all wrong except at the actual vertex positions themselves.
If you want to use a linear depth buffer, you have to correct the depth of each fragment in the fragment shader, i.e. implement the required non-linear interpolation on your own. Doing so would break a lot of the clever depth-test optimizations GPUs do, notably early Z and hierarchical Z, so while it is possible, you'll lose some performance.
The much better solution is: just use a standard hyperbolic depth value, and linearize the depth values after you read them back. Also, don't do the z division in the vertex shader. Not only do you break z this way, you also break the perspective-corrected interpolation of the varyings, so your shading will be wrong, too. Let the GPU do the division; just shuffle the correct value into gl_Position.w. The GPU will internally not only do the divide, the perspective-corrected interpolation also depends on w.
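As a minimal sketch of what that could look like, staying with the uniforms from the question (K, zrange, imsize) and assuming the camera looks down +z with vin.z > 0, main() could build the clip-space position with w = vin.z and a standard hyperbolic z, leaving the division to the GPU:

void main() {
    float n = zrange[0];
    float f = zrange[1];
    vec4 clip;
    // pinhole x/y, but with the divide deferred: the GPU divides by w = vin.z
    clip.x = 2.0 * (K[0] * vin.x + K[2] * vin.z) / imsize[0] - vin.z;
    clip.y = 2.0 * (K[1] * vin.y + K[3] * vin.z) / imsize[1] - vin.z;
    // standard hyperbolic depth so that window-space interpolation is correct
    clip.z = (f + n) / (f - n) * vin.z - 2.0 * f * n / (f - n);
    clip.w = vin.z;
    shader_pos = vin;          // pass the eye-space position, not the divided one
    shader_color = cin;
    shader_normal = nin;
    gl_Position = clip;
}

With this mapping, a depth value d read back from the depth buffer (default [0,1] range) can be linearized as z_eye = 2*n*f / (f + n - (2*d - 1)*(f - n)).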

Related

OpenGL shadow mapping with deferred rendering, position transformation

I am using deferred rendering where I store the eye-space position in a texture, accordingly:
vertex:
gl_Position = vec4(vertex_position, 1.0);
geometry:
vertexOut.position = vec3(viewMatrix * modelMatrix * gl_in[i].gl_Position);
fragment:
positionOut = vec3(vertexIn.position);
Now, in the second pass (lighting pass) I am trying to sample my shadow map, using UV coordinates calculated from this vec4
vec4 lightSpacePos = lightProjectionMatrix * lightViewMatrix * lightModelMatrix * vec4(position, 1.0);
The position used is the same position stored and sampled from the position texture.
Do I need to transform the position with the inverse camera view matrix before doing this calculation, to bring it back to world space, or how should I proceed?
Typically shadow mapping is done by comparing the window-space Z coordinate (this is what a depth texture stores) of your current fragment vs. your light. This must be done using a common reference orientation, so that involves re-projecting your current fragment's position from the perspective of your light.
You have the view-space position right now, which is relative to your current camera and not particularly useful. To do this effectively you want world-space position. You can get that if you transform the view-space position by the inverse view matrix.
Given world-space position, transform into clip-space from light's perspective:
// This will be in clip-space
vec4 lightSpacePos = lightProjectionMatrix * lightViewMatrix * vec4 (worldPos, 1.0);
// Transform it into NDC-space by dividing by w
lightSpacePos /= lightSpacePos.w;
// Range is now [-1.0, 1.0], but you need [0.0, 1.0]
lightSpacePos = lightSpacePos * vec4 (0.5) + vec4 (0.5);
Assuming default depth range, lightSpacePos is now ready for use. xy contains the texture coordinates to sample from your shadow map and z contains the depth to use for comparison.
For a more thorough explanation, see the following answer.
Incidentally, you will want to eliminate your position texture from your G-Buffer to achieve reasonable performance. It is very easy to reconstruct world- or view-space position given only the depth and the projection and view matrices and the arithmetic involved is much quicker than an extra texture fetch. Storing an additional texture with adequate precision to represent position in 3D space will burn through tons of memory bandwidth each frame and is completely unnecessary.
This article from the OpenGL Wiki explains how to do this. You can take it one step further and work back to world space, which is more desirable than view space. You may need to tweak your depth buffer a little to get adequate precision, but it will still be quicker than storing position separately.
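As a rough sketch of that reconstruction (the names depthTexture, invViewProj and texCoord are assumptions, not from the question), the lighting-pass fragment shader could un-project the stored depth back to world space like this:

uniform sampler2D depthTexture;   // the G-Buffer depth texture
uniform mat4 invViewProj;         // inverse of (projection * view)
in vec2 texCoord;                 // [0,1] across the full-screen quad

vec3 reconstructWorldPos()
{
    float depth = texture(depthTexture, texCoord).r;            // window-space depth in [0,1]
    vec4 ndc = vec4(vec3(texCoord, depth) * 2.0 - 1.0, 1.0);    // back to [-1,1] NDC
    vec4 world = invViewProj * ndc;                              // un-project
    return world.xyz / world.w;                                  // perspective divide
}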

distortion correction with gpu shader bug

So I have a camera with a wide angle lens. I know the distortion coefficients, the focal length, the optical center. I want to undistort the image I get from this camera. I used OpenCV for the first try (cv::undistort), which worked well, but was way too slow.
Now I want to do this on the gpu. There is a shader doing exactly this documented in http://willsteptoe.com/post/67401705548/ar-rift-aligning-tracking-and-video-spaces-part-5
the formulas can be seen here:
http://en.wikipedia.org/wiki/Distortion_%28optics%29#Software_correction
So I went and implemented my own version as a glsl shader. I am sending a quad with texture coordinates on the corners between 0..1.
I assume the texture coordinates that arrive are the coordinates of the undistorted image. I calculate the coordinates for the distorted point corresponding to my texture coordinates. Then I sample the distorted image texture.
With this shader nothing in the final image changes. The problem I identified through a CPU implementation is that the coefficient term is very close to zero; the numbers get smaller and smaller through the radius squaring etc. So I have a scaling problem, and I can't figure out what to do differently. I tried everything... I guess it is something quite obvious, since this kind of process seems to work for a lot of people.
I left out the tangential distortion correction for simplicity.
#version 330 core
in vec2 UV;
out vec4 color;
uniform sampler2D textureSampler;
void main()
{
    vec2 focalLength = vec2(438.568f, 437.699f);
    vec2 opticalCenter = vec2(667.724f, 500.059f);
    vec4 distortionCoefficients = vec4(-0.035109f, -0.002393f, 0.000335f, -0.000449f);
    const vec2 imageSize = vec2(1280.f, 960.f);
    vec2 opticalCenterUV = opticalCenter / imageSize;
    vec2 shiftedUVCoordinates = (UV - opticalCenterUV);
    vec2 lensCoordinates = shiftedUVCoordinates / focalLength;
    float radiusSquared = sqrt(dot(lensCoordinates, lensCoordinates));
    float radiusQuadrupled = radiusSquared * radiusSquared;
    float coefficientTerm = distortionCoefficients.x * radiusSquared + distortionCoefficients.y * radiusQuadrupled;
    vec2 distortedUV = ((lensCoordinates + lensCoordinates * (coefficientTerm))) * focalLength;
    vec2 resultUV = (distortedUV + opticalCenterUV);
    color = texture2D(textureSampler, resultUV);
}
I see two issues with your solution. The main issue is that you mix two different spaces. You seem to work in [0,1] texture space by converting the optical center to that space, but you did not adjust focalLength. The key point is that for such a distortion model, the focal length is specified in pixels. In texture space, however, a pixel is no longer 1 base unit wide, but 1/width and 1/height units, respectively.
You could add vec2 focalLengthUV = focalLength / imageSize, but you will see that both divisions cancel each other out when you calculate lensCoordinates. It is much more convenient to convert the texture-space UV coordinates to pixel coordinates and use that space directly:
vec2 lensCoordinates = (UV * imageSize - opticalCenter) / focalLength;
(and change the calculations for distortedUV and resultUV accordingly).
There is still one issue with the approach I have sketched so far: the conventions of the pixel space I mentioned earlier. In GL, the origin is the lower-left corner, while in most pixel spaces the origin is at the top left, so you might have to flip the y coordinate when doing the conversion. Another thing is where exactly the pixel centers are located. So far, the code assumes that pixel centers are at integer + 0.5: texture coordinate (0,0) is not the center of the lower-left pixel, but its corner point. The parameters you use for the distortion might (I don't know OpenCV's conventions) assume pixel centers at integers, in which case instead of the conversion pixelSpace = uv * imageSize you might need to offset by half a pixel, like pixelSpace = uv * imageSize - vec2(0.5).
The second issue I see is
float radiusSquared = sqrt(dot(lensCoordinates, lensCoordinates));
That sqrt is not correct here, as dot(a, a) already gives the squared length of vector a.
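Putting both fixes together, a corrected version of the shader could look roughly like this (a sketch, keeping the question's parameters; whether the y flip or the half-pixel offset is needed depends on your conventions, as discussed above):

#version 330 core
in vec2 UV;
out vec4 color;
uniform sampler2D textureSampler;
void main()
{
    vec2 focalLength = vec2(438.568, 437.699);
    vec2 opticalCenter = vec2(667.724, 500.059);
    vec4 distortionCoefficients = vec4(-0.035109, -0.002393, 0.000335, -0.000449);
    const vec2 imageSize = vec2(1280.0, 960.0);

    // texture space -> pixel space -> normalized lens coordinates
    vec2 lensCoordinates = (UV * imageSize - opticalCenter) / focalLength;

    // r^2 and r^4, without the erroneous sqrt
    float radiusSquared = dot(lensCoordinates, lensCoordinates);
    float radiusQuadrupled = radiusSquared * radiusSquared;

    float coefficientTerm = distortionCoefficients.x * radiusSquared
                          + distortionCoefficients.y * radiusQuadrupled;

    // apply the radial distortion, go back to pixel space, then to texture space
    vec2 distortedPixel = lensCoordinates * (1.0 + coefficientTerm) * focalLength + opticalCenter;
    color = texture(textureSampler, distortedPixel / imageSize);
}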

Manually calculate gl_FragCoord

I'm trying to implement a nearest neighbor search for points using OpenGL and GLSL shaders. The NN calculation works correctly and the result is drawn into a texture of size 1024x1024 (using a viewport of the current screen size).
The result simply contains a vec4 holding the position of the neighbor.
Now the important part is:
The texel holding the vec4 is located exactly where the point is projected to (the point for which I am searching for neighbors). So in theory, to access the neighbor of an arbitrary point, I project its world location to screen coordinates and use these to access the texture (e.g. using texelFetch).
This works if I do the point projection in a vertex shader and by using gl_FragCoord to access the texture in my fragment program. But now I have a new situation where the points are only available in the fragment shader (accessed through a texture/buffer), and therefore I have to calculate the screen position manually.
I tried the following to calculate gl_FragCoord on my own, but it doesn't work (blank results only):
vec4 pointPos = ... //texture lookup
vec4 transformedPos = matProjectionOrtho * pointPos;
transformedPos.xy /= transformedPos.w;
transformedPos.xy = transformedPos.xy * 0.5f + 0.5f;
transformedPos.xy = vec2(transformedPos.x * textureWidth,
transformedPos.y * textureHeight);
The projection matrix matProjectionOrtho is the same for all rendering passes, simply an orthogonal projection. textureWidth and textureHeight are the size of the texture holding the neighbor data (usually 1024x1024).
Is this calculation of the screen/texture position correct?
Is this calculation of the screen/texture position correct?
What is your viewport? That looks proper assuming the viewport is the same size as your texture (which you have already stated) and, critically, has no offset (i.e. its origin is (0,0)).
The only really iffy thing here is that to use texelFetch (...) you need integer coordinates, and transformedPos is a floating-point vector. GLSL does not define an implicit conversion from vecN to ivecN, so you cannot use the coordinates you just calculated directly; you will have to construct an ivec yourself.
Something to this effect:
ivec2 texel_coords = ivec2 (transformedPos.x, transformedPos.y);
Fortunately, because texels are centered at i + 0.5 rather than i + 0.0, the truncation that happens when you convert the coordinates from floating-point to integer turns out not to matter in this case: pixel coordinate 511.9 still lies inside the texel centered at 511.5, so truncating it to 511 fetches the right texel. If texels were centered on integer boundaries, 511.9 would be closer to 512, and the truncation to 511 would really mess with things when you try to find the nearest neighbor.
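A minimal sketch of the full lookup under those assumptions (viewport origin at (0,0), viewport size equal to the texture size; neighborTexture is an assumed name for the texture holding the NN results):

vec4 transformedPos = matProjectionOrtho * pointPos;
transformedPos.xyz /= transformedPos.w;                         // to NDC
vec2 windowPos = (transformedPos.xy * 0.5 + 0.5)
               * vec2(textureWidth, textureHeight);             // to window space
ivec2 texelCoords = ivec2(windowPos);                           // truncate to integers
vec4 neighbor = texelFetch(neighborTexture, texelCoords, 0);    // fetch at LOD 0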

Using shaders to implement field of view on a 2D environment

I'm implementing dynamic field of view. I decided to use shaders in order to make the illumination better looking and how it affects the walls. Here is the scenario I'm working on:
http://i.imgur.com/QxZVyo7.jpg
I have a map, with a flat floor and walls. Everything here is 2D; there is no 3D geometry, only 2D polygons that compose the walls.
Using the vertices of the polygons, I cast shadows to define the viewable area. (The purple lines are part of the mask I use in the next step.)
Using the shader when drawing the shadows on top of the scenario, I keep the walls from being obscured as well.
This way the shadows are cast dynamically along the walls as the field of view changes.
I have used the following shader to achieve this, but I feel it is kind of overkill and really inefficient:
uniform sampler2D texture;
uniform sampler2D filterTexture;
uniform vec2 textureSize;
uniform float cellSize;
uniform sampler2D shadowTexture;
void main()
{
    vec2 position;
    vec4 filterPixel;
    vec4 shadowPixel;
    vec4 pixel = texture2D(texture, gl_TexCoord[0].xy);
    for (float i = 0; i <= cellSize * 2; i++)
    {
        position = gl_TexCoord[0].xy;
        position.y = position.y - (i / textureSize.y);
        filterPixel = texture2D(filterTexture, position);
        position.y = position.y - (1 / textureSize.y);
        shadowPixel = texture2D(texture, position);
        if (shadowPixel == vec4(0.0)) {
            if (filterPixel.r == 1.0)
            {
                if (filterPixel.b == 1.0) {
                    pixel.a = 0;
                    break;
                }
                else if (i <= cellSize)
                {
                    pixel.a = 0;
                    break;
                }
            }
        }
    }
    gl_FragColor = pixel;
}
Iterating like this for each fragment just to look for the red-colored pixel in the mask seems like a huge overhead, but I fail to see how to accomplish this task in any other way using shaders.
The solution here is really quite simple: use shadow maps.
Your situation may be 2D instead of 3D, but the basic concept is the same. You want to "shadow" areas based on whether there is an obstructive surface between some point in the world and a "light source" (in your case, the player character).
In 3D, shadow maps work by rendering the world from the perspective of the light source. This results in a 2D texture where the values represent the depth from the light (in a particular direction) to the nearest obstruction. When you render the scene for real, you check the current fragment's location by projecting it into the 2D depth texture (the shadow map). If the depth value you compute for the current fragment is closer than the nearest obstruction in the projected location in the shadow map, then the fragment is visible from the light. If not, then it isn't.
Your 2D version would have to do the same thing, only with one less dimension. You render your 2D world from the perspective of the "light source". Your 2D world in this case is really just the obstructing quads (you'll have to render them with line polygon filling). Any quads that obstruct sight should be rendered into the shadow map. Texture accesses are completely unnecessary; the only information you need is depth. Your shader doesn't even have to write a color. You render these objects by projecting the 2D space into a 1D texture.
This would look something like this:
X..X
XXXXXXXX..XXXXXXXXXXXXXXXXXXXX
X.............\.../..........X
X..............\./...........X
X...............C............X
X............../.\...........X
X............./...\..........X
X............/.....\.........X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
C is the character's position; the dots are just regular, non-obstructing floor. The Xs are the walls. The lines from C represent the four directions you need to render the 2D lines from.
In 3D, to do shadow mapping for point lights, you have to render the scene 6 times, in 6 different directions into the faces of a cube shadow map. In 2D, you have to render the scene 4 times, in 4 different directions into 4 different 1D shadow maps. You can use a 1D array texture for this.
Once you have your shadow maps, you just use them in your shader to detect when a fragment is visible. To do that, you'll need a set of transforms from window space into the 4 different projections that represent the 4 directions of view that you rendered into. Only one of these will be used for any particular fragment, based on where the fragment is relative to the target.
To implement this, I'd start with just getting a simple case of directional "shadowing" to work. That is, don't use a position; just a direction for a "light". That will test your ability to develop a 2D-to-1D projection matrix, as well as an appropriate camera-space matrix to transform your world-space quads into camera space. Once you have mastered that, then you can get to work doing it 4 times with different projections.
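As a rough sketch of the per-fragment test for that simple case of a single direction (one 1D shadow map), under assumed names (shadowMap1D, lightMatrix, worldPos) and assuming lightMatrix is the same 2D-to-1D projection used when rendering the map (strip coordinate in x, depth in z, as a regular projection matrix would give you), the comparison could look like:

uniform sampler1D shadowMap1D;   // nearest obstruction depth per strip coordinate
uniform mat4 lightMatrix;        // 2D world space -> "light" (character) clip space

float visibility(vec2 worldPos)
{
    vec4 lightClip = lightMatrix * vec4(worldPos, 0.0, 1.0);
    vec3 lightNDC = lightClip.xyz / lightClip.w;       // [-1, 1] after the divide
    float coord = lightNDC.x * 0.5 + 0.5;              // 1D texture coordinate
    float depth = lightNDC.z * 0.5 + 0.5;              // this fragment's depth
    float nearest = texture(shadowMap1D, coord).r;     // nearest obstruction
    return depth <= nearest ? 1.0 : 0.0;               // 1.0 = visible, 0.0 = in shadow
}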

C++/OpenGL convert world coords to screen(2D) coords

I am making a game in OpenGL where I have a few objects within the world space. I want to make a function where I can take in an object's location (3D) and transform it to the screen's location (2D) and return it.
I know the 3D location of the object, the projection matrix and the view matrix in the following variables:
Matrix projectionMatrix;
Matrix viewMatrix;
Vector3 point3D;
To do this transform, you must first take your model-space positions and transform them to clip-space. This is done with matrix multiplies. I will use GLSL-style code to make it obvious what I'm doing:
vec4 clipSpacePos = projectionMatrix * (viewMatrix * vec4(point3D, 1.0));
Notice how I convert your 3D vector into a 4D vector before the multiplication. This is necessary because the matrices are 4x4, and you cannot multiply a 4x4 matrix with a 3D vector. You need a fourth component.
The next step is to transform this position from clip-space to normalized device coordinate space (NDC space). NDC space is on the range [-1, 1] in all three axes. This is done by dividing the first three coordinates by the fourth:
vec3 ndcSpacePos = clipSpacePos.xyz / clipSpacePos.w;
Obviously, if clipSpacePos.w is zero, you have a problem, so you should check for that beforehand. If it is zero, then that means the object is in the plane of projection; its view-space depth is zero. Such vertices are automatically clipped by OpenGL.
The next step is to transform from this [-1, 1] space to window-relative coordinates. This requires the use of the values you passed to glViewport. The first two parameters are the offset from the bottom-left of the window (vec2 viewOffset), and the second two parameters are the width/height of the viewport area (vec2 viewSize). Given these, the window-space position is:
vec2 windowSpacePos = ((ndcSpacePos.xy + 1.0) / 2.0) * viewSize + viewOffset;
And that's as far as you go. Remember: OpenGL's window-space is relative to the bottom-left of the window, not the top-left.
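Putting the steps together as one GLSL-style function (just a consolidation of the above; viewOffset and viewSize are the values passed to glViewport):

vec2 worldToScreen(vec3 point3D, mat4 viewMatrix, mat4 projectionMatrix,
                   vec2 viewOffset, vec2 viewSize)
{
    vec4 clipSpacePos = projectionMatrix * (viewMatrix * vec4(point3D, 1.0));
    vec3 ndcSpacePos = clipSpacePos.xyz / clipSpacePos.w;    // assumes clipSpacePos.w != 0
    return ((ndcSpacePos.xy + 1.0) / 2.0) * viewSize + viewOffset;
}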