How to manage the perspective transformation? - opengl

How to convert (x,y,z) coordinates from inside the perspective pyramid, to (x',y',z') coordinates inside the perspective cube? (in a right hand coordinate system)
I tried to multiply this perspective matrix with the (x,y,z) vector, but the result isn't what I expected.
I tried it with: fov=70°, aspect=4/3, near=100, far=100;
x=100, y=100, z=-300;
The result was (158.28, 211.05, -344.44)
All I want is this:
Thanks in advance,

Though a perspective matrix generally transforms space such that the desired view frustum maps to a canonical volume (could be a unit cube, but not all graphics pipelines are the same - for example, D3D is different to OpenGL), this volume is described in homogeneous (projective) coordinates. This is because the actual projection is a non-linear transform, but using a projective coordinate system allows the use of linear transformations for the bulk of the pipeline.
So you still need to perform the projection, if you want a point in 3D (or 2D) space.
This is simply a divide.
When you multiply a point (x, y, z, 1) by a perspective matrix, you get a vector-4 (x', y', z', w'). You then need to divide x', y' and z' by w' to do the projection.

Related

Why depth values must be interpolated directly by barycentric coordinates in openGL?

OpenGL spec:
It says: However, depth values for polygons must be interpolated by (14.10).
Why? Are the z coordinates depth values in camera space? If so, we should use perspective correctly barycentric coordinates to interpolate them, isn't it?(like equation 14.9)
Update:
So the z coordinates are NDC coordinates(which already divided by w). I have a small demo which implement a rasterizer. When I use linear interpolation of the NDC z coordinates, the result is a bit unusual(image below). While I use perspective correctly interpolation of camera z coordinates, the result is ok.
This is the perspective projection matrix I use:
Why? Are the z coordinates depth values in camera space? If so, we should use perspective correctly barycentric coordinates to interpolate them, isn't it?
No, they are not. They are in window space, meaning they already have been divided by w. It is correct that if you wanted to interpolate camrea space z, you would have to apply perspective correction. But for NDC and window space Z this would be wrong - after all, the perspective transformation (as achieved by perspective projection matrix and perspective divide) still maps straight lines to straight lines, and flat trinagles to flat triangles. That's why we use the hyperbolically distorted Z values as depth in the first place. This is also a property that is exploited for the hierarchical depth test optimization. Have a look at my answer here for some more details, including a few diagrams.

When perspective division is necessary?

I'm very confused when it is necessary for a homogeneous coordinate (x, y, z, w) get divided by w (ie. converting to (x/w, y/w, z/w, 1)).
According to this page, when a homogeneous vertex coordinate is passed through a orthographical or projective matrix, the coordinate will be in clip coordinates. Perspective division is necessary for clip coordinates to get NDC. It will produce x, y, and z values in range (-1, 1).
I was following WebGL tutorial pages on orthographical and perspective projection. These tutorials didn't mention a single word about perspective division. I'm not sure if the division is still necessary after multiplying with projective matrix. Perhaps, the division is performed automatically during matrix multiplication?
The perspective divide (the conversion from clip space to NDC space) is neither necessary nor unnecessary; it is a part of the graphics pipeline. It happens automatically to every vertex that passes through the system.
You can make it into a no-op by setting a vertex's W component to 1.0 (which is what the orthographic projection matrix does, assuming the input position had a W of 1.0). But the division always happens.

What is the role of gl_Position.w in Vulkan?

Variable gl_Position output from a GLSL vertex shader must have 4 coordinates. In OpenGL, it seems w coordinate is used to scale the vector, by dividing the other coordinates by it. What is the purpose of w in Vulkan?
Shaders and projections in Vulkan behave exactly the same as in OpenGL. There are small differences in depth ranges ([-1, 1] in OpenGL, [0, 1] in Vulkan) or in the origin of the coordinate system (lower-left in OpenGL, upper-left in Vulkan), but the principles are exactly the same. The hardware is still the same and it performs calculations in the same way both in OpenGL and in Vulkan.
4-component vectors serve multiple purposes:
Different transformations (translation, rotation, scaling) can be
represented in the same way, with 4x4 matrices.
Projection can also be represented with a 4x4 matrix.
Multiple transformations can be combined into one 4x4 matrix.
The .w component You mention is used during perspective projection.
All this we can do with 4x4 matrices and thus we need 4-component vectors (so they can be multiplied by 4x4 matrices). Again, I write about this because the above rules apply both to OpenGL and to Vulkan.
So for purpose of the .w component of the gl_Position variable - it is exactly the same in Vulkan. It is used to scale the position vector - during perspective calculations (projection matrix multiplication) original depth is modified by the original .w component and stored in the .z component of the gl_Position variable. And additionally, original depth is also stored in the .w component. After that (as a fixed-function step) hardware performs perspective division and divides position stored in the gl_Position variable by its .w component.
In orthographic projection steps performed by the hardware are exactly the same, but values used for calculations are different. So the perspective division step is still performed by the hardware but it does nothing (position is dived by 1.0).
gl_Position is a Homogeneous coordinates. The w component plays a role at perspective projection.
The projection matrix describes the mapping from 3D points of the view on a scene, to 2D points on the viewport. It transforms from eye space to the clip space, and the coordinates in the clip space are transformed to the normalized device coordinates (NDC) by dividing with the w component of the clip coordinates (Perspective divide).
At Perspective Projection the projection matrix describes the mapping from 3D points in the world as they are seen from of a pinhole camera, to 2D points of the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
Perspective Projection Matrix:
r = right, l = left, b = bottom, t = top, n = near, f = far
2*n/(r-l) 0 0 0
0 2*n/(t-b) 0 0
(r+l)/(r-l) (t+b)/(t-b) -(f+n)/(f-n) -1
0 0 -2*f*n/(f-n) 0
When a Cartesian coordinate in view space is transformed by the perspective projection matrix, then the the result is a Homogeneous coordinates. The w component grows with the distance to the point of view. This cause that the objects become smaller after the Perspective divide, if they are further away.
In computer graphics, transformations are represented with matrices. If you want something to rotate, you multiply all its vertices (a vector) by a rotation matrix. Want it to move? Multiply by translation matrix, etc.
tl;dr: You can't describe translation along the z-axis with 3D matrices and vectors. You need at least 1 more dimension, so they just added a dummy dimension w. But things break if it's not 1, so keep it at 1 :P.
Anyway, now we begin with a quick review on matrix multiplication:
You basically put x above a, y above b, z above c. Multiply the whole column by the variable you just moved, and sum up everything in the row.
So if you were to translate a vector, you'd want something like:
See how x and y is now translated by az and bz? That's pretty awkward though:
You'd have to account for how big z is whenever you move things (what if z was negative? You'd have to move in opposite directions. That's cumbersome as hell if you just want to move something an inch over...)
You can't move along the z axis. You'll never be able to fly or go underground
But, if you can make sure z = 1 at all times:
Now it's much clearer that this matrix allows you to move in the x-y plane by a, and b amounts. Only problem is that you're conceptually levitating all the time, and you still can't go up or down. You can only move in 2D.
But you see a pattern here? With 3D matrices and 3D vectors, you can describe all the fundamental movements in 2D. So what if we added a 4th dimension?
Looks familiar. If we keep w = 1 at all times:
There we go, now you get translation along all 3 axis. This is what's called homogeneous coordinates.
But what if you were doing some big & complicated transformation, resulting in w != 1, and there's no way around it? OpenGL (and basically any other CG system I think) will do what's called normalization: divide the resultant vector by the w component. I don't know enough to say exactly why ('cause scaling is a linear transformation?), but it has favorable implications (can be used in perspective transforms). Anyway, the translation matrix would actually look like:
And there you go, see how each component is shrunken by w, then it's translated? That's why w controls scaling.

Coordinate System [-1, 1]

I am confused on how the OpenGL coordinate system works. I know you start with object coordinates -- everything defined in its own system. Then by applying a matrix, the coordinates change to world coordinates. By applying another matrix, you have view coordinates. Then if you're working in 3D, you can apply a perspective matrix. In the end, you are left with a set of coordinates which likely are not from [-1, 1]. How does OpenGL know how to normalize them from [-1, 1]? How does it know what to clip them out? In the shader, glPosition is just given your coordinates, it doesn't know that there have been through several transformations. I know that a view to normalized coordinate matrix involves a translation and a scale, but we never explicitly make a matrix for that in OpenGL. Does OpenGL use its own hidden matrix to translate from coordinates passed to glPostion to normalized coordinates?
Deprecated fixed function vertex transformations are explained in https://www.opengl.org/wiki/Vertex_Transformation
Shader based rendering is likely to use same or very similar math for each transformation step. The missing step between glPosition and device coordinates is perspective divide (like LJ commented quickly) where xyzw coordinates are converted to xyz coordinates. xyzw coordinates are homogeneous coordinates for 3-dimensional coordinates that use 4 components to represent a location.
https://en.wikipedia.org/wiki/Homogeneous_coordinates

Perspective Projection - OpenGL

I am confused about the position of objects in opengl .The eye position is 0,0,0 , the projection plane is at z = -1 . At this point , will the objects be in between the eye position and and the plane (Z =(0 to -1)) ? or its behind the projection plane ? and also if there is any particular reason for being so?
First of all, there is no eye in modern OpenGL. There is also no camera. There is no projection plane. You define these concepts by yourself; the graphics library does not give them to you. It is your job to transform your object from your coordinate system into clip space in your vertex shader.
I think you are thinking about projection wrong. Projection doesn't move the objects in the same sense that a translation or rotation matrix might. If you take a look at the link above, you can see that in order to render a perspective projection, you calculate the x and y components of the projected coordinate with R = V(ez/pz), where ez is the depth of the projection plane, pz is the depth of the object, V is the coordinate vector, and R is the projection. Almost always you will use ez=1, which makes that equation into R = V/pz, allowing you to place pz in the w coordinate allowing OpenGL to do the "perspective divide" for you. Assuming you have your eye and plane in the correct places, projecting a coordinate is almost as simple as dividing by its z coordinate. Your objects can be anywhere in 3D space (even behind the eye), and you can project them onto your plane so long as you don't divide by zero or invalidate your z coordinate that you use for depth testing.
There is no "projection plane" at z=-1. I don't know where you got this from. The classic GL perspective matrix assumes an eye space where the camera is located at origin and looking into -z direction.
However, there is the near plane at z<0 and eveything in front of the near plane is going to be clipped. You cannot put the near plane at z=0, because then, you would end up with a division by zero when trying to project points on that plane. So there is one reasin that the viewing volume isn't a pyramid with they eye point at the top but a pyramid frustum.
This is btw. also true for real-world eyes or cameras. The projection center lies behind the lense, so no object can get infinitely close to the optical center in either case.
The other reason why you want a big near clipping distance is the precision of the depth buffer. The whole depth range between the front and the near plane has to be mapped to some depth value with a limited amount of bits, typically 24. So you want to keep the far plane as close as possible, and shift away the near plane as far as possible. The non-linear mapping of the screen-space z coordinate makes this even more important, as that the precision is non-uniformely distributed over that range.