A virtual 3D world maps points in space to a three-dimensional grid along the x-, y-, and z-axes. Suppose a perspective camera looks into a virtual world from the point (0, 0, 0), looking down the positive z-axis in a left-handed system. Imagine that the far plane is 2 units wide and 2 units high, and that its z coordinate is 1. This means that the top-right corner of the scene is at the position (1, 1, 1).
The horizontal distance to the point directly in front of the camera, at (0, 0, 1), is 1 unit, but the horizontal distance to the far right corner, at (1, 0, 1), is a little over 1.414 units. To maintain the same sense of perspective as when viewing directly ahead, a point at that distance and at a height of 1 unit should appear about 0.707 units (1 / 1.414) above the horizon. Instead, it appears exactly 1 unit above the horizon.
If you turn the camera to point directly at the point at (1, 0, 1), the vertical dimensions of objects in that direction are reduced: the scene appears to distort. When the field of view is fairly wide, this can lead to a feeling of nausea when the camera needs to turn a lot.
Is my understanding correct? If so, is there a way to correct this aberration at the corners of an OpenGL/WebGL scene? Or is it something that users have learned to live with?
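For what it's worth, the behaviour described above can be reproduced with a minimal sketch of a plain pinhole projection onto the plane z = 1 (my own illustration, not tied to any particular API): projecting just divides x and y by z, so the on-axis point and the corner point at the same height land at the same screen height.

#include <cstdio>

// Pinhole projection onto the plane z = 1: screen coordinates are (x/z, y/z).
static void project(double x, double y, double z) {
    std::printf("(%g, %g, %g) -> screen (%g, %g)\n", x, y, z, x / z, y / z);
}

int main() {
    project(0.0, 1.0, 1.0); // directly ahead: projected height 1
    project(1.0, 1.0, 1.0); // corner point:   projected height 1 as well,
                            // even though it is sqrt(2) ~ 1.414 units away
}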
Often, we see the following picture when talking about ray tracing.
Here, I read the Z axis as the direction the camera points when looking straight ahead, and the XY grid as the grid the camera is seeing. From the camera's point of view, we see the usual Cartesian grid that my classmates and I are used to.
Recently I was examining code that simulates this. One thing that is not obvious to me from this picture is the need for the "right" and "down" vectors. Obviously we have look_at, which shows where the camera is looking, and campos, which is where the camera is located. But why do we need camright and camdown? What are we trying to construct?
Vect X (1, 0, 0);
Vect Y (0, 1, 0);
Vect Z (0, 0, 1);
Vect campos (3, 1.5, -4);
Vect look_at (0, 0, 0);
Vect diff_btw (
    campos.getVectX() - look_at.getVectX(),
    campos.getVectY() - look_at.getVectY(),
    campos.getVectZ() - look_at.getVectZ()
);
Vect camdir = diff_btw.negative().normalize();
Vect camright = Y.crossProduct(camdir);
Vect camdown = camright.crossProduct(camdir);
Camera scene_cam (campos, camdir, camright, camdown);
While searching about this question recently, I also found this post: Setting the up vector in the camera setup in a ray tracer
Here, the answerer says this: "My answer assumes that the XY plane is the "ground" in world space and that Z is the altitude. Imagine standing on the floor holding a camera. Its position has a positive Z component, and its view vector is nearly parallel to the ground. (In my diagram, it's pointing slightly upwards.) The plane of the camera's film (the uv grid) is perpendicular to the view grid. A common approach is to transform everything until the film plane coincides with the XY plane and the camera points along the Z axis. That's equivalent, but not what I'm describing here."
I'm not entirely sure why "transformations" are necessary. How is this point of view different from the picture at the top? They also say that they need an "up" vector and a "right" vector to "construct an image plane". I'm not sure what an image plane is.
Could someone explain the relationship between the physical representation and the code representation more clearly?
How do you know that you always want the camera's "up" to be aligned with the vertical lines in the grid in your image?
Trying to explain it another way: the grid is not really there. That imaginary grid is the result of the camera's directional vectors and the resolution you are rendering at. The grid is not what decides the camera angle.
When you are holding a physical camera in your hand, like the camera in your cell phone, don't you ever rotate it a little for effect? Or when filming, you may want to slowly rotate the camera. Haven't you seen movies where the camera is rotated?
In the same way, you may want to rotate the "camera" in your ray-traced world. And rotating only the camera is much easier than rotating every object in the scene (there may be millions!).
Check out the example of rotating the camera from the movie Ice Age here:
https://youtu.be/22qniGtZhZ8?t=61
The up (or down) and right vectors construct the plane you project the scene onto. Since the scene is 3D, you need to project it onto a 2D plane in order to render a picture you can display on your screen.
If you have only the camera position and direction, you still don't know whether you're holding the camera right-side up, upside down, or tilted to the left or right.
Using the camera position, look-at, up (or down), and right vectors, we can uniquely define how the 3D scene is projected into a 2D picture.
Concretely, if you look at the code and the picture: the 3D scene is the set of objects displayed. The image/projection plane is the grid in front of the camera. Its orientation is defined by the camright and camdir vectors (since the image plane is assumed to be perpendicular to camdir, camdown is uniquely determined by the other two).
The placement of the grid is based on the camera's position and its intrinsic properties (not shown here, but the camera will have a specific field of view).
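To make the role of these vectors concrete, here is a minimal sketch (not the code from the question; the small Vec3 type and helpers are my own, and the exact mapping from pixel to plane is an assumption) of how a ray tracer typically uses campos, camdir, camright and camdown to pick a point on the image plane and shoot a primary ray through it:

#include <cmath>

struct Vec3 { double x, y, z; };
static Vec3 add(Vec3 a, Vec3 b)     { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
static Vec3 normalize(Vec3 a) {
    double len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return scale(a, 1.0 / len);
}

// u and v run from 0 to 1 across the image (e.g. u = (px + 0.5) / width).
// The image plane is spanned by camright and camdown; camdir points at its centre.
static Vec3 primary_ray_direction(Vec3 camdir, Vec3 camright, Vec3 camdown,
                                  double u, double v) {
    Vec3 offset = add(scale(camright, u - 0.5), scale(camdown, v - 0.5));
    return normalize(add(camdir, offset));   // the ray starts at campos and travels this way
}

Without camright and camdown there would be no way to say which world-space point corresponds to the pixel one step to the right or one step down, which is exactly what the grid in the picture represents.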
According to a number of sources, NDC differs from clip space in that NDC is just clip space AFTER division by the W component. Primitives are clipped in clip space, which in OpenGL is -1 to 1 along the X, Y, and Z axes (Edit: this is wrong, see answer). In other words, clip space is a cube; clipping is done within this cube. If something falls inside, it's visible; if it falls outside, it's not visible.
So let's take this simple example: we're looking at a viewing frustum from the top, down the negative Y axis. The HALFFOV is 45 degrees, which means NEAR and RIGHT are the same (in this case, length 2). The example point is (6, 0, -7).
Now, here is the perspective projection matrix:
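(Presumably the standard OpenGL glFrustum-style matrix; writing n = NEAR, f = FAR, l = LEFT, r = RIGHT, b = BOTTOM, t = TOP, it is:)

[2n/(r-l)   0          (r+l)/(r-l)     0
 0          2n/(t-b)   (t+b)/(t-b)     0
 0          0          -(f+n)/(f-n)   -2fn/(f-n)
 0          0          -1              0]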
For simplicity we'll use an aspect ratio of 1:1. So:
RIGHT = 2
LEFT = -2
TOP = 2
BOTTOM = -2
NEAR = 2
FAR = 8
So filling in our values we get a projection matrix of:
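(To two decimal places, assuming the glFrustum-style matrix above:)

[1   0    0       0
 0   1    0       0
 0   0   -1.66   -5.33
 0   0   -1       0]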
Now we add the homogeneous W to our point, which was (6, 0, -7), and get (6, 0, -7, 1).
Now we multiply our matrix with our point, which results in (6, 0, 6.29, 7).
This point (the point after being multiplied by the projection matrix) is now supposed to lie in "clip space". Supposedly the clipping is done at this stage, figuring out whether a point lies inside or outside the clipping cube, and supposedly BEFORE division by W. Here is how it looks in "clip space":
From the sources I've seen, the clipping is done at this stage, as shown above, BEFORE dividing by W. If you divide by W NOW, the point ends up in the right area of the clip-space cube. This is why I don't understand why everyone says that perspective division is done AFTER clipping. In this space, prior to the perspective division, the point lies completely outside the cube and would be judged to be outside the clipping volume, and not visible. However, after the perspective division (division by W), here is how it looks:
Now the point lies within the clip-space cube and can be judged to be inside, and visible. This is why I think perspective division is done BEFORE clipping: if clip space runs from -1 to +1 on each axis, and the clipping stage checks against those dimensions, then for a point to be inside that cube it must have already undergone division by W; otherwise almost ANY point lies outside the clip-space cube and is never visible.
So why does everyone say that first comes clip space, which is the result of the projection matrix, and ONLY then perspective division (division by W), which results in NDC?
In clip space, clipping is not done against the unit cube. It is done against the cube [-w, w]³: points are inside the visible area if each of their x, y, z coordinates lies between -w and +w.
In your example, the point [6, 0, 6.29, 7] is visible because all three coordinates (x, y, z) lie between -7 and 7.
Note that for points with positive w, this test is exactly equivalent to testing x/w < 1 (and likewise for the other coordinates). The problems start with points behind the camera, since they might get projected into the visible area by the homogeneous divide because their w value is negative. As we all know, dividing an inequality by a negative number flips the comparison operator, which is impractical to handle in hardware.
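As a rough sketch of what that test looks like when performed in homogeneous clip space (before the divide), assuming a point with positive w as in the question's example:

// True if the clip-space point (x, y, z, w) lies inside the view volume.
// No division by w is needed; the comparison is done against w itself.
static bool inside_clip_volume(double x, double y, double z, double w) {
    return -w <= x && x <= w &&
           -w <= y && y <= w &&
           -w <= z && z <= w;
}
// Example from the question: inside_clip_volume(6, 0, 6.29, 7) is true,
// since 6, 0 and 6.29 all lie between -7 and 7.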
Further readings:
OpenGL sutherland-hodgman polygon clipping algorithm in homogeneous coordinates
Why clipping should be done in CCS, not NDCS
Why does GL divide gl_Position by W for you rather than letting you do it yourself?
I am confused about the position of objects in OpenGL. The eye position is (0, 0, 0) and the projection plane is at z = -1. Should the objects be between the eye position and the plane (z in the range 0 to -1), or behind the projection plane? And is there any particular reason for it being so?
First of all, there is no eye in modern OpenGL. There is also no camera. There is no projection plane. You define these concepts by yourself; the graphics library does not give them to you. It is your job to transform your object from your coordinate system into clip space in your vertex shader.
I think you are thinking about projection wrong. Projection doesn't move the objects in the same sense that a translation or rotation matrix might. If you take a look at the link above, you can see that in order to render a perspective projection, you calculate the x and y components of the projected coordinate with R = V(ez/pz), where ez is the depth of the projection plane, pz is the depth of the object, V is the coordinate vector, and R is the projection. Almost always you will use ez = 1, which turns that equation into R = V/pz, allowing you to place pz in the w coordinate so that OpenGL does the "perspective divide" for you. Assuming you have your eye and plane in the correct places, projecting a coordinate is almost as simple as dividing by its z coordinate. Your objects can be anywhere in 3D space (even behind the eye), and you can project them onto your plane so long as you don't divide by zero or invalidate the z coordinate that you use for depth testing.
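A minimal sketch of that idea (my own notation, with the projection plane at depth ez = 1 and pz assumed positive):

// With the projection plane at depth ez = 1, the projection is simply R = V / pz.
static double project_x(double px, double pz) { return px / pz; }
static double project_y(double py, double pz) { return py / pz; }
// Placing pz in the w component of the output position makes the hardware's
// perspective divide (x/w, y/w, z/w) perform this same division for you.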
There is no "projection plane" at z=-1. I don't know where you got this from. The classic GL perspective matrix assumes an eye space where the camera is located at origin and looking into -z direction.
However, there is the near plane at z<0 and eveything in front of the near plane is going to be clipped. You cannot put the near plane at z=0, because then, you would end up with a division by zero when trying to project points on that plane. So there is one reasin that the viewing volume isn't a pyramid with they eye point at the top but a pyramid frustum.
This is btw. also true for real-world eyes or cameras. The projection center lies behind the lense, so no object can get infinitely close to the optical center in either case.
The other reason why you want a big near clipping distance is the precision of the depth buffer. The whole depth range between the front and the near plane has to be mapped to some depth value with a limited amount of bits, typically 24. So you want to keep the far plane as close as possible, and shift away the near plane as far as possible. The non-linear mapping of the screen-space z coordinate makes this even more important, as that the precision is non-uniformely distributed over that range.
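To see how non-uniform that mapping is, here is a small sketch (the near/far values are chosen arbitrarily for illustration) that evaluates the NDC depth produced by the standard perspective matrix for a few eye-space depths:

#include <cstdio>

// NDC depth produced by the standard perspective matrix, for eye-space depth z_eye < 0.
static double ndc_depth(double n, double f, double z_eye) {
    return (f + n) / (f - n) + (2.0 * f * n) / ((f - n) * z_eye);
}

int main() {
    // With near = 0.1 and far = 100, roughly 90% of the [-1, 1] depth range is
    // already used up between z_eye = -0.1 and z_eye = -1:
    const double depths[] = {-0.1, -1.0, -10.0, -100.0};
    for (double z : depths)
        std::printf("z_eye = %6.1f -> NDC z = %f\n", z, ndc_depth(0.1, 100.0, z));
}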
OpenGL SuperBible, 4th Edition, page 164:
To apply a camera transformation, we take the camera's actor transform and flip it so that moving the camera backward is equivalent to moving the whole world forward. Similarly, turning to the left is equivalent to rotating the whole world to the right.
I can't understand why.
Imagine yourself placed within a universe that also contains all other things. In order for your viewpoint to appear to move forward, you have two options:
1. You move yourself forward.
2. You move everything else in the universe in the opposite direction to option 1.
Because you are defining everything in OpenGL in terms of the viewer (you're ultimately rendering a 2D image of a particular viewpoint of the 3D world), it can often make more sense, both mathematically and programmatically, to take the second approach.
Mathematically, there is only one correct answer. By definition, after transforming to eye space by multiplying a world-space position by the view matrix, the resulting vector is interpreted relative to the origin, which is where the camera is conceptually located.
What the SuperBible states is mathematically just a negation of the translation in some direction, which is what you will automatically get when using functions that compute a view matrix, like gluLookAt() or glm::lookAt() (although GLU is a library layered on legacy GL, mathematically the two are identical).
Have a look at the API reference for gluLookAt(). You'll see that the first step is setting up an orthonormal basis of eye space, which results in a 4x4 matrix that basically only encodes the upper 3x3 rotation. The second step is multiplying the former matrix by a translation matrix. In terms of legacy functions, this can be expressed as
glMultMatrixf(M); // where M encodes the eye-space basis
glTranslated(-eyex, -eyey, -eyez);
You can see that the vector (eyex, eyey, eyez), which specifies where the camera is located in world space, is simply multiplied by -1. Now assume we don't rotate the camera at all, but place it at the world-space position (5, 5, 5). The appropriate view matrix View would be
[1 0 0 -5
0 1 0 -5
0 0 1 -5
0 0 0 1]
Now take a world-space vertex position P = (0, 0, 0, 1) transformed by that matrix: P' = View * P. P' will then simply be P'=(-5, -5, -5, 1).
When thinking in world-space, the camera is at (5, 5, 5) and the vertex is at (0, 0, 0). When thinking in eye-space, the camera is at (0, 0, 0) and the vertex is at (-5, -5, -5).
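A quick sanity check of that example (plain C++, no GL involved), multiplying the translation-only view matrix by the world-space origin:

#include <cstdio>

int main() {
    // View matrix for an unrotated camera at world-space position (5, 5, 5).
    const double View[4][4] = {
        {1, 0, 0, -5},
        {0, 1, 0, -5},
        {0, 0, 1, -5},
        {0, 0, 0,  1},
    };
    const double P[4] = {0, 0, 0, 1};   // world-space vertex at the origin
    double Pp[4] = {0, 0, 0, 0};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            Pp[r] += View[r][c] * P[c];
    std::printf("P' = (%g, %g, %g, %g)\n", Pp[0], Pp[1], Pp[2], Pp[3]); // (-5, -5, -5, 1)
}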
So in conclusion: conceptually, it's a matter of how you look at things. You can either think of it as transforming the camera relative to the world, or as transforming the world relative to the camera.
Mathematically, and in terms of the OpenGL transformation pipeline, there is only one answer, and that is: the camera in eye-space (or view-space or camera-space) is always at the origin and world-space positions transformed to eye-space will always be relative to the coordinate system of the camera.
EDIT: Just to clarify, although the transformation pipeline and the involved vector spaces are well defined, you can still use the world-space positions of everything, even the camera, for instance in a fragment shader for lighting computations. The important thing here is to know never to mix entities from different spaces, e.g. don't compute things based on a world-space position and an eye-space position, and so on.
EDIT 2: Nowadays, in a time when we all use shaders *cough and roll-eyes*, you're pretty flexible, and theoretically you can pass any position you like to gl_Position in a vertex shader (or the geometry shader or tessellation stages). However, since the subsequent computations (clipping, perspective division, and the viewport transformation) are fixed, the resulting position will simply be clipped if it's not inside [-gl_Position.w, gl_Position.w] in x, y, and z.
There is a lot to this before you really have it down. I suggest you read the entire article on the rendering pipeline in the official GL wiki.
I have a sphere in my program, and I intend to draw some rectangles at a distance x from the centre of this sphere. The figure looks something like the one below:
The rectangles are drawn at (x, y, z) points that I already have in a vector of 3D points.
Let's say the distance x from the centre is 10. Notice the orientation of these rectangles: they are tangential to an imaginary sphere of radius 10 (perpendicular to an imaginary line from the centre of the sphere to the centre of each rectangle).
Currently, I do something like the following:
For the n points in vector<vec3f> pointsInSpace where the rectangles have to be plotted:
for (int i = 0; i < pointsInSpace.size(); ++i) {
    // draw rectangle at (x, y, z)
}
which does not give the kind of tangential orientation that I am looking for.
It looked to me like I should apply roll, pitch, and yaw rotations to each of these rectangles, somehow using quaternions, to make them tangential in the way I am looking for.
However, that looked a bit complex to me, and I wanted to ask about a better method to do this.
Also, the rectangle might change to some other shape in the future, so a generic solution would be appreciated.
I think you essentially want the same transformation as would be accomplished with a LookAt() function (you want the rectangle to 'look at' the sphere, along a vector from the rectangle's center to the sphere's origin).
If your rectangle is formed of the points:
(-1, -1, 0)
(-1, 1, 0)
( 1, -1, 0)
( 1, 1, 0)
Then the rectangle's normal will be pointing along Z. This axis needs to be oriented towards the sphere.
So the normalised vector from your point to the center of the sphere is the Z-axis.
Then you need to define a distinct 'up' vector - (0, 1, 0) is typical, but you will need to choose a different one in cases where the Z axis points in the same (or opposite) direction.
The cross of the 'up' and 'z' axes gives the x axis, and then the cross of the 'x' and 'z' axes gives the 'y' axis.
These three axes (x,y,z) directly form a rotation matrix.
This resulting transformation matrix will orient the rectangle appropriately. Either use GL's fixed function pipeline (yuk), in which case you can just use gluLookAt(), or build and use the matrix above in whatever fashion is appropriate in your own code.
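Here is a sketch of that construction in plain C++ (the Vec3 type and helpers are mine, not from the question's code). It builds a column-major model matrix, assuming the rectangle is modelled in the XY plane with its normal along +Z, that places it at point p facing the sphere centre c:

#include <cmath>

struct Vec3 { double x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(Vec3 a) {
    double len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return {a.x / len, a.y / len, a.z / len};
}

// Fills a column-major 4x4 model matrix (OpenGL convention) that places the
// rectangle at p and points its +Z normal towards the sphere centre c.
static void rectangle_matrix(Vec3 p, Vec3 c, double m[16]) {
    Vec3 zaxis = normalize(sub(c, p));        // normal: from rectangle towards sphere
    Vec3 up = {0, 1, 0};                      // pick another up if zaxis is (anti)parallel to it
    Vec3 xaxis = normalize(cross(up, zaxis));
    Vec3 yaxis = cross(zaxis, xaxis);         // already unit length
    const double cols[16] = {
        xaxis.x, xaxis.y, xaxis.z, 0,
        yaxis.x, yaxis.y, yaxis.z, 0,
        zaxis.x, zaxis.y, zaxis.z, 0,
        p.x,     p.y,     p.z,     1,
    };
    for (int i = 0; i < 16; ++i) m[i] = cols[i];
}

The same matrix would work for any planar shape modelled in the XY plane, which covers the "might change to some other shape" requirement.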
Personally, I think JasonD's answer is enough, but here is some info on the calculation involved.
Mathematically speaking, this is a rather simple problem. What you have is two known vectors: the position vector and the sphere's normal vector at that point. Since the square can still be rotated arbitrarily around the vector from the center of your sphere, you need to define one more vector, the up vector. Without defining an up vector, there is no unique solution.
Once you define an up vector, the problem becomes simple. Assuming your square is in the XY plane, as JasonD suggests above, your matrix becomes:
((up × n) × n).x   ((up × n) × n).y   ((up × n) × n).z   0
n.x                n.y                n.z                0
(up × n).x         (up × n).y         (up × n).z         0
p.x                p.y                p.z                1
Here n is the unit normal vector, i.e. p minus the center of the sphere, normalized (which is trivial if the sphere sits at the origin of the coordinate system), up is an arbitrary unit vector, × denotes the cross product, and p is the position, which follows from the definition.
The solution has a bit of a singularity at the sphere's up direction. An alternative is to rotate first (up to 360°) around up, then (up to 180°) around the rotated axis crossed with up. It produces the same thing with a different approach and no singularity problem.