Understanding math: transformation from world coordinates to view coordinates - OpenGL

I am trying to understand the math behind the transformation from world coordinates to view coordinates.
This is the formula to calculate the matrix in view coordinates:
and here is an example that should normally be correct:
where b = the width of the viewport and h = the height of the viewport.
But I just don't know how to calculate the R matrix. How do you get Ux, Uy, Uz, Vx, Vy, etc.? u, v and n form the coordinate system fixed to the camera, and the camera is at position (X0, Y0, Z0).

The matrix T is applied first. It translates some world coordinate P by minus the camera coordinate (call it C), giving the relative coordinate of P (call this Q) with respect to the camera (Q = P - C), in the world axes orientation.
The matrix R is then applied to Q. It performs a rotation to obtain the coordinates of Q in the camera's axes.
u is the horizontal view axis
v is the vertical view axis
n is the view direction axis
(all three should be normalized)
Multiplying R with Q:
multiplying with the first row of R gives DOT(Q, u). This is the component of Q projected onto u, which is the horizontal view coordinate.
the second row gives DOT(Q, v), which similarly gives the vertical view coordinate.
the third row gives DOT(Q, n), which is the depth view coordinate.
A diagram:
BTW These are NOT screen/viewport coordinates! They are just the coordinates in the camera/view frame. To get the perspective-corrected coordinate another matrix (the projection matrix) needs to be applied.
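As a minimal sketch (using GLM and assuming column vectors, i.e. the matrix is applied on the left), the full view matrix V = R * T can be built directly from the camera position C and the normalized axes u, v and n:

#include <glm/glm.hpp>

// Build the view matrix V = R * T from the camera position C and the
// normalized camera axes u (horizontal), v (vertical), n (view direction).
// GLM matrices are column-major, so m[column][row].
glm::mat4 makeViewMatrix( const glm::vec3 &C, const glm::vec3 &u,
                          const glm::vec3 &v, const glm::vec3 &n )
{
    glm::mat4 R( 1.0f );
    R[0][0] = u.x; R[1][0] = u.y; R[2][0] = u.z;  // row 0 = u
    R[0][1] = v.x; R[1][1] = v.y; R[2][1] = v.z;  // row 1 = v
    R[0][2] = n.x; R[1][2] = n.y; R[2][2] = n.z;  // row 2 = n

    glm::mat4 T( 1.0f );
    T[3] = glm::vec4( -C, 1.0f );                 // translation column = -C

    return R * T;                                 // T is applied first, then R
}

Multiplying this matrix with a world point P then yields exactly DOT(P - C, u), DOT(P - C, v) and DOT(P - C, n), as described above.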

Related

How to calculate sphere rotation based off of UV coordinates

I have a texture that I am wrapping around a sphere similar to this image on Wikipedia.
https://upload.wikimedia.org/wikipedia/commons/0/04/UVMapping.png
What I am trying to achieve is to take a UV coordinate from the texture, let's say (0.8, 0.8), which would roughly be around Russia in the above example.
From this UV, somehow calculate the rotation I would need to apply to the sphere to have that UV centred on the sphere.
Could someone point me in the right direction regarding the math equation I would need to calculate this?
Edit - it was pointed out that I am actually looking for the rotation of the sphere so that the UV is centred towards the camera. So, starting with a rotation of (0, 0, 0), my camera is pointed at the UV (0, 0.5).
Thanks
This particular type of spherical mapping has the very convenient property that the UV coordinates are linearly proportional to the corresponding polar coordinates.
Let's assume for convenience that:
UV (0.5, 0.5) corresponds to the Greenwich Meridian line / Equator - i.e. (0° N, 0° E)
The mesh is initially axis-aligned
The texture is centered at spherical coordinates (θ, φ) = (π/2, 0) - i.e. the X-axis
A diagram to demonstrate:
Using the boundary conditions:
U = 0 -> φ = -π
U = 1 -> φ = +π
V = 1 -> θ = 0
V = 0 -> θ = π
We can deduce the required equations, and the corresponding direction vector r in the sphere's local space:
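One consistent set, assuming a Y-up local frame with the texture centre on the +X axis and θ measured from the +Y axis (the sign of the z component may need flipping depending on your handedness and the texture's wrap direction):
φ = π * (2U - 1)
θ = π * (1 - V)
r = (sin θ * cos φ, cos θ, sin θ * sin φ)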
Assuming the sphere has rotation matrix R and is centered at c, simply use lookAt (where d is the desired camera distance from the sphere's centre) with:
Position: c + d * (R * r)
Direction: -(R * r)
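For example, with GLM (the function and parameter names below are illustrative, not from the original answer):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// View matrix that centres the chosen UV point on screen.
// R : the sphere's rotation matrix, c : its centre,
// r : the local direction computed from the UV (see above),
// d : the desired camera distance from the centre.
glm::mat4 centreUvOnScreen( const glm::mat4 &R, const glm::vec3 &c,
                            const glm::vec3 &r, float d )
{
    glm::vec3 worldDir = glm::mat3( R ) * r;              // rotate r into world space
    return glm::lookAt( c + d * worldDir,                 // eye: in front of the UV point
                        c,                                // look back at the sphere's centre
                        glm::vec3( 0.0f, 1.0f, 0.0f ) );  // assumed Y-up; pick another up vector near the poles
}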

OpenGL Perspective Projection pixel perfect drawing

The target is to draw a shape, let's say a triangle, pixel-perfect (vertices shall be specified in pixels) and to be able to transform it in the third dimension.
I've tried it with an orthogonal projection matrix and everything works fine, but the shape doesn't have any depth - if I rotate it around the Y axis it looks like I would just scale it along the X axis (because an orthogonal projection obviously behaves like this). Now I want to try it with a perspective projection. But with this projection the coordinate system changes completely, and because of this I can't specify my triangle's vertices in pixels. Also, if the size of my window changes, the size of the shape changes too (because of the changed coordinate system).
Is there any way to change the coordinate system of the perspective projection so that I can specify my vertices as if I were using the orthogonal projection? Or does anyone have an idea how to achieve the target described in the first sentence?
The projection matrix describes the mapping from 3D points of a scene to 2D points of the viewport. It transforms from eye space to clip space, and the coordinates in clip space are transformed to normalized device coordinates (NDC) by dividing by the w component of the clip coordinates. The NDC are in the range (-1,-1,-1) to (1,1,1).
With perspective projection, the projection matrix describes the mapping from 3D points in the world, as they are seen from a pinhole camera, to 2D points of the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
Perspective Projection Matrix:
r = right, l = left, b = bottom, t = top, n = near, f = far
2*n/(r-l) 0 0 0
0 2*n/(t-b) 0 0
(r+l)/(r-l) (t+b)/(t-b) -(f+n)/(f-n) -1
0 0 -2*f*n/(f-n) 0
where:
aspect = w / h
tanFov = tan( fov_y * 0.5 );
prjMat[0][0] = 2*n/(r-l) = 1.0 / (tanFov * aspect)
prjMat[1][1] = 2*n/(t-b) = 1.0 / tanFov
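A minimal sketch of filling this matrix (indexed as prjMat[column][row], consistent with the two entries above; this assumes a symmetric frustum, so the (r+l)/(r-l) and (t+b)/(t-b) terms are zero):

#include <cmath>

// Symmetric perspective projection matrix, column-major as OpenGL expects.
// fov_y in degrees, aspect = w / h, n / f = near / far clip planes.
void perspectiveMatrix( float prjMat[4][4], float fov_y, float aspect, float n, float f )
{
    const float PI = 3.14159265f;
    float tanFov = tanf( fov_y * PI / 180.0f * 0.5f );

    for ( int c = 0; c < 4; ++c )
        for ( int r = 0; r < 4; ++r )
            prjMat[c][r] = 0.0f;

    prjMat[0][0] = 1.0f / (tanFov * aspect);   // 2*n/(r-l)
    prjMat[1][1] = 1.0f / tanFov;              // 2*n/(t-b)
    prjMat[2][2] = -(f + n) / (f - n);
    prjMat[2][3] = -1.0f;
    prjMat[3][2] = -2.0f * f * n / (f - n);
}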
I assume that the view matrix is the identity matrix, and thus the view space coordinates are equal to the world coordinates.
If you want to draw a polygon where the vertex coordinates are translated 1:1 into pixels, then you have to draw the polygon in a plane parallel to the viewport. This means all points have to be drawn at the same depth. The depth has to be chosen such that transforming a point in normalized device coordinates by the inverse projection matrix gives the vertex coordinates in pixels. Note that the homogeneous coordinates given by the transformation with the inverse projection matrix have to be divided by their w component to get Cartesian coordinates.
This means that the depth of the plane depends on the field of view angle of the projection:
Assuming you set up a perspective projection like this:
float vp_w = .... // width of the viewport in pixel
float vp_h = .... // height of the viewport in pixel
float fov_y = ..... // field of view angle (y axis) of the view port in degrees < 180°
gluPerspective( fov_y, vp_w / vp_h, 1.0, vp_h*2.0f );
Then the depthZ of the plane with a 1:1 relation of vertex coordinates and pixels, will be calculated like this:
float angRad = fov_y * PI / 180.0;
float depthZ = -vp_h / (2.0 * tan( angRad / 2.0 ));
Note that the center point of the projection onto the viewport is (0,0), so the bottom left corner point of the plane is (-vp_w/2, -vp_h/2, depthZ) and the top right corner point is (vp_w/2, vp_h/2, depthZ). Ensure that the near plane of the perspective projection is less than -depthZ and the far plane is greater than -depthZ.
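For example (legacy immediate-mode OpenGL, purely illustrative; vp_w, vp_h and depthZ as above, pixel coordinates measured from the bottom left of the viewport):

// A triangle whose vertices are specified in pixels, drawn at depthZ
// so that one unit in X/Y corresponds to one pixel on screen.
glBegin( GL_TRIANGLES );
glVertex3f( 100.0f - vp_w/2.0f, 100.0f - vp_h/2.0f, depthZ ); // pixel (100, 100)
glVertex3f( 300.0f - vp_w/2.0f, 100.0f - vp_h/2.0f, depthZ ); // pixel (300, 100)
glVertex3f( 200.0f - vp_w/2.0f, 250.0f - vp_h/2.0f, depthZ ); // pixel (200, 250)
glEnd();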
See further:
Both depth buffer and triangle face orientation are reversed in OpenGL
Transform the modelMatrix

Convert Screen Distance to Eye Space Ray Distance

I have a ray with an origin (x,y,z) and a direction (dx, dy, dz) given in homogeneous eye space coordinates:
p = (x,y,z,1) + t * (dx, dy, dz, 0)
What I need to calculate is a positive value of t such that, for a given pixel distance n, the resulting point is n pixels away from the screen projection of (x, y, z). How can I achieve this?
Regards
Projecting a point onto a plane means finding the intersection with that plane of the ray through the point in the direction of the view. Let's say that view direction is the vector v.
Say the origin of your ray (call it O = {x,y,z}) lies on the line from the camera perpendicular to the near plane of projection, and call its projection P. A second point (which you expressed as S = O + t·d) will project onto the near plane at a point T. You need the t that makes the distance PT = n.
If you take the cross product c = v×d you get a distance in the near plane. Remember that |v×d| = |v||d|·sin(a). If both v and d are normalized, that distance is the sine of the angle between v and d.
If d is not normalized (call the unnormalized direction dnn, with |dnn| = distance(O,S)), then the distance between the projections of O and S on the near plane is k = |cross(v, dnn)| = distance(O,S)·|cross(v,d)| = t·|cross(v,d)|. Setting the required value n equal to k (n = k) gives t = n / |cross(v,d)|, with both v and d normalized.
Using pixels instead of values in the near plane is just a matter of scaling properly with the window size.
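A minimal sketch of that formula (GLM; pixelsPerUnit is an assumed conversion factor from near-plane units to pixels, derived from your viewport and projection setup):

#include <glm/glm.hpp>

// v             : normalized view direction
// d             : normalized ray direction
// nPixels       : desired distance in pixels
// pixelsPerUnit : pixels per near-plane unit
float rayParamForPixelDistance( const glm::vec3 &v, const glm::vec3 &d,
                                float nPixels, float pixelsPerUnit )
{
    float n = nPixels / pixelsPerUnit;                   // n expressed in near-plane units
    float sinAngle = glm::length( glm::cross( v, d ) );  // |v x d| = sin(angle) for unit vectors
    return n / sinAngle;                                 // t = n / |cross(v, d)|
}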

3D rendering in OpenGL: model/view/projection vs translation/rotation/camera matrices

I want to add a mesh model (let's say a cube) to a captured frame from a camera.
I also know all the information about where to put the cube:
Translation matrix - relative to the camera
Rotation matrix - relative to the camera
camera calibration matrix - focal length, principal point, etc. (intrinsic parameters)
How can I convert this information to model/view/projection matrices?
What should be the values to set to these matrices?
For example, let's say that I want to display the point [x, y, z, 1] on the screen,
then that should be something like: [u, v, 1] = K * [R | T] * [x, y, z, 1], where:
u, v are the coordinates in the screen (or camera capture) and:
K, R and T are intrinsic camera parameters, rotation and translation, respectively.
How to convert K, R, T to model/view/projection matrices?
[R | T] would be your model-view matrix and K would be your projection matrix.
The model-view matrix is usually one matrix. The separation is only conceptual: Model transforms from model coordinates to world coordinates and View from world coordinates to camera (not-yet-projected) coordinates. It makes sense in applications where the camera and the objects move independently of each other. In your case, on the other hand, the camera can be considered fixed and everything else is described relative to the camera. So you only have to deal with two matrices: model-view and projection.
Assuming that your camera intrinsic matrix comes from OpenCV (http://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html), how to initialize your OpenGL projection matrix from it is described here:
https://blog.noctua-software.com/opencv-opengl-projection-matrix.html
cx, cy, width and height are in pixels
As for your OpenGL model-view matrix, it's really simple: it is just [R | T] padded to a 4x4 matrix (with 0 0 0 1 as the last row).
So in the end your model-view-projection matrix is the projection matrix multiplied by that model-view matrix.
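As a hedged sketch (GLM, column-major; any axis-convention flip between OpenCV and OpenGL is assumed to be handled in the projection matrix, as in the linked article):

#include <glm/glm.hpp>

// Model-view from the extrinsic rotation R (3x3) and translation T (3x1),
// i.e. [R | T] padded to 4x4. GLM is column-major: mv[column][row].
glm::mat4 modelViewFromRT( const glm::mat3 &R, const glm::vec3 &T )
{
    glm::mat4 mv( 1.0f );
    for ( int c = 0; c < 3; ++c )
        for ( int r = 0; r < 3; ++r )
            mv[c][r] = R[c][r];
    mv[3] = glm::vec4( T, 1.0f );   // last column = translation
    return mv;
}

// With a projection matrix built from K (fx, fy, cx, cy, width, height, near, far)
// as described in the linked article:
// glm::mat4 mvp = projection * modelViewFromRT( R, T );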

Converting Screen 2D to World 3D Coordinates

I want to convert 2D screen coordinates to 3D world coordinates. I have searched a lot but I did not get any satisfying result.
Note: I am not using OpenGL nor any other graphics library.
Data which I have:
Screen X
Screen Y
Screen Height
Screen Width
Aspect Ratio
If you have the camera's world matrix and projection matrix, this is pretty simple.
If you don't have the world matrix, you can compute it from the camera's position and rotation.
worldMatrix = Translate(x, y, z) * RotateZ(z_angle) * RotateY(y_angle) * RotateX(x_angle);
where Translate returns the 4x4 translation matrix and Rotate returns the 4x4 rotation matrix around the given axis.
The projection matrix can be calculated from the aspect ratio, field of view angle, and near and far planes.
This blog has a good explanation of how to calculate the projection matrix.
You can unproject the screen coordinates by doing:
mat = worldMatrix * inverse(ProjectionMatrix)   // equals inverse(ProjectionMatrix * viewMatrix)
p   = mat * <x_ndc, y_ndc, 0.5, 1>              // screen coordinates first mapped to NDC in [-1, 1]
p  /= p.w                                       // divide by the w component of the transformed point
dir = normalize(p.xyz - camera.position)
Your ray will point from the camera in the direction dir.
This should work, but it's not a super concrete example of how to do this.
Basically you just need to do the following steps:
calculate camera's worldMatrix
calculate camera's projection matrix
multiply worldMatrix with inverse projection matrix.
create a point <X_NDC, Y_NDC, SOME_POSITIVE_Z_VALUE, 1>, where X_NDC and Y_NDC are the screen coordinates mapped to the range [-1, 1]
apply this "inverse" projection to your point and divide by the resulting w component.
then subtract the camera's position from this point.
The resulting vector is the direction from the camera. Any point along that ray gives the 3D coordinates corresponding to your 2D screen coordinate.
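A minimal sketch of those steps (GLM, column vectors; the screen-to-NDC mapping and the Y flip are assumptions about the windowing convention):

#include <glm/glm.hpp>

// Unproject a screen pixel to a world-space ray direction.
// worldMatrix is the camera's world matrix, i.e. inverse(viewMatrix).
glm::vec3 screenToWorldRay( float screenX, float screenY,
                            float screenWidth, float screenHeight,
                            const glm::mat4 &worldMatrix,
                            const glm::mat4 &projectionMatrix,
                            const glm::vec3 &cameraPosition )
{
    // map pixels to normalized device coordinates in [-1, 1]
    float xNdc = 2.0f * screenX / screenWidth - 1.0f;
    float yNdc = 1.0f - 2.0f * screenY / screenHeight;  // assuming pixel Y grows downwards

    // worldMatrix * inverse(projection) == inverse(projection * view)
    glm::mat4 mat = worldMatrix * glm::inverse( projectionMatrix );

    // unproject a point in front of the camera (z = 0.5 in NDC) and divide by w
    glm::vec4 p = mat * glm::vec4( xNdc, yNdc, 0.5f, 1.0f );
    p /= p.w;

    return glm::normalize( glm::vec3( p ) - cameraPosition ); // ray direction from the camera
}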