Creating my own projection matrix in OpenGL

Here is my reasoning:
OpenGL draws everything within a 2x2x2 cube.
the x,y values inside this cube determine where the point is drawn on the screen. The z value is used for other stuff...
if you want the z value to have some effect on perspective you need to mutate the scene (usually with a matrix) so that it gives an illusion of distant objects being smaller.
the z values of the cube go from -1 to 1.
Now I want it so that objects that are at z = 1 are infinitely zoomed, and objects that are at z = 0 are normal size, and objects that are at z = -1 are 1/2 size.
When I say an object is zoomed, I mean that the (x, y) coordinates of its points are multiplied by a scalar zoom factor, which is based on its z coordinate.
If a point lies outside the 2x2x2 cube, I still want the calculation done on it as long as it is between z = -1 and z = 1. Since the z value doesn't change, I don't care what happens to any points that are not within this range, as long as their z value is not changed.
Generalized point transformation:
If I have a point P = (x, y, z), and -1 <= z <= 1 then:
the Zoom Factor, S = 1 / (1 - z)
so the transformation is as follows:
(x, y, z) ==> (x * S, y * S, z)
Creating the matrix?
This is where I am having issues. I don't know how to create a matrix so that it will transform a generalized point to have the desired effect.
I am considering not using a matrix and applying this transformation via a function in glsl...
If someone has insight on how to create such a matrix I would like to know.
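A minimal sketch, assuming OpenGL's column-major matrix layout, of the homogeneous-coordinate route that the perspective-division answers further down describe: put (1 - z) into the w component so the perspective divide produces the zoom factor S = 1 / (1 - z). The array name proj is just for illustration.

// Column-major 4x4 matrix as OpenGL expects, applied to (x, y, z, 1):
//
//   | 1 0  0 0 |   | x |   |   x   |
//   | 0 1  0 0 |   | y |   |   y   |
//   | 0 0  1 0 | * | z | = |   z   |
//   | 0 0 -1 1 |   | 1 |   | 1 - z |
//
// After the divide by w = 1 - z, x and y come out as x*S and y*S.
// Caveat: the fixed-function divide also divides z by w, so z becomes
// z / (1 - z) rather than staying unchanged; a single matrix plus the
// standard divide cannot keep z exactly as-is, which is why doing the
// scaling in a GLSL function (as considered above) is a valid alternative.
float proj[16] = {
    1, 0, 0,  0,   // column 0
    0, 1, 0,  0,   // column 1
    0, 0, 1, -1,   // column 2 (the -1 lands in the w row)
    0, 0, 0,  1    // column 3
};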

Related

How can I iterate a coordinate sphere using an expanding spherical sector (cone)?

Given an integer 3D coordinate system, a center point P, a vector in some direction V, and a max sphere radius R:
I want to iterate over only integer points in a fashion that starts at P and goes along direction V until reaching the max radius R.
Then, for some small angle T iterate all points within the cone (or spherical sector) around V.
Incrementally expand T until T is pi/2 radians and every point within the sphere has been iterated.
I need to do this with O(1) space complexity. So the order of the points can't be precomputed/sorted but must result naturally from some math.
Example:
// Vector3 represents coordinates x, y, z
// where (typically) x is left/right, y is up/down, z is depth
Vector3 center = Vector3(0, 0, 0); // could be anything
Vector3 direction = Vector3(0, 100, 0); // could be anything
int radius = 4;
double piHalf = acos(0.0); // half of pi
std::queue<Vector3> list;
for (double angle = 0; angle < piHalf; angle+= .1)
{
int x = // confusion begins here
int y = // ..
int z = // ..
list.push(Vector3(x, y, z));
}
See picture for this example
The first coordinates that should be caught are:
A(0,0,0), C(0,1,0), D(0,2,0), E(0,3,0), B(0,4,0)
Then, expanding the angle somewhat (orange cone):
K(-1,0,3), X(1,0,3), (0,1,3), (0,-1,3)
Expanding the angle a bit more (green cone):
F(1,1,3), (-1,-1,3), (1,-1,3) (-1,1,3)
My guess for what would be next is:
L(1,0,2), (-1,0,2), (0,1,2), (0,-1,2)
M(2,0,3) would be hit somewhat after
Extra notes and observations:
A cone will hit a max of four points at its base, if the vector is perpendicular to an axis and originates at an integer point. It may also hit points along the cone wall depending on the angle.
I am trying to do this in C++.
I am aware of how to check whether a point X is within any given cone or spherical sector by comparing the angle between V and PX with T, and am currently using this knowledge for a lesser solution.
This is not a homework question, I am working on a 3D video game~
iterate all integer positions Q in your sphere
three simple nested for loops through x, y, z in the range <P-R,P+R> will do. Just check that the point is inside the sphere:
u=(x,y,z)-P;
dot(u,u) <= R*R
test if point Q is exactly on V
simply by checking angle between PQ and V by dot product:
u = Q-P
u = u/|u|
v = V/|V|
if (dot(u,v)==1) point Q is on V
test if point Q is exactly on the surface of the "cone"
simply by checking angle between PQ and V by dot product:
u = Q-P
u = u/|u|
v = V/|V|
if (dot(u,v)==cos(T/2)) point Q is on "cone"
where I assume T is the full "cone" angle, not the half one.
Beware: you need to use floats/doubles for this and make the comparison with some margin for error, like:
if (fabs(dot(u,v)-1.0 )<1e-6) point Q is on V
if (fabs(dot(u,v)-cos(T/2))<1e-6) point Q is on "cone"
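A minimal C++ sketch of the three checks above; the Vec3 struct and helper names are assumptions, and T is taken to be the full cone angle as stated:

#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };

double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
Vec3   sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3   norm(Vec3 a)        { double l = std::sqrt(dot(a, a)); return { a.x/l, a.y/l, a.z/l }; }

void scan(Vec3 P, Vec3 V, int R, double T)
{
    const double eps = 1e-6;
    Vec3 v = norm(V);
    for (int x = (int)P.x - R; x <= (int)P.x + R; ++x)
    for (int y = (int)P.y - R; y <= (int)P.y + R; ++y)
    for (int z = (int)P.z - R; z <= (int)P.z + R; ++z)
    {
        Vec3 Q = { (double)x, (double)y, (double)z };
        Vec3 u = sub(Q, P);
        if (dot(u, u) > (double)R * R) continue;   // outside the sphere
        if (dot(u, u) == 0.0) continue;            // Q == P, direction undefined
        u = norm(u);
        double c = dot(u, v);
        if (std::fabs(c - 1.0) < eps)
            std::printf("(%d,%d,%d) is on V\n", x, y, z);
        else if (std::fabs(c - std::cos(T / 2)) < eps)
            std::printf("(%d,%d,%d) is on the cone\n", x, y, z);
    }
}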

glRotate(angle,x,y,z), what is x,y,z in this case?

I've read the documentation here: http://www.opengl.org/sdk/docs/man2/xhtml/glRotate.xml
It specifies that angle is in degrees. It says that X,Y,Z are vectors. If I say glRotate(2,1,0,0) that says I will rotate 2 degrees about the X axis.
What happens if I say glRotate(2,0.5,0,0) and glRotate(2,0.0174524,0,0)?
I don't understand what's really happening in that situation, can someone help explain to me?
Does it rotate as a percentage of the angle?
It will still rotate 2 degrees about the X axis. That page you linked also says the following:
x y z = 1 (if not, the GL will normalize this vector).
Meaning the vector (x,y,z) is a unit vector (of length 1), and if it's not, GL will normalize the vector (dividing it by its length, making it of length 1).
Conclusion: the x, y and z parameters define a vector, of which the direction is the only relevant part; the length will be dealt with by the function. Thus you can safely put in any vector and it will simply rotate about that vector.
It doesn't say that x, y and z are vectors. If you open the page with a MathML-capable browser, you'll see something like
glRotate produces a rotation of angle degrees around the vector (x,y,z).
I.e. x, y, z are components of a single vector. Similarly, it doesn't say "x y z = 1 (if not, the GL will normalize this vector)": instead, it says:
||(x,y,z)||=1 (if not, the GL will normalize this vector).
So, (x,y,z) is the vector around which the function will produce the rotation. If the vector you supply is not normalized, the GL will normalize it, so glRotate(2,0.5,0,0) and glRotate(2,0.0174524,0,0) are equivalent.
glRotate rotates the current matrix about the given vector (x, y, z) by angle degrees, so the parameters x, y, z are just the three components of a vector (a line) in 3D space. A vector has a direction and a length, but in this function the length is irrelevant: whether your vector's length is 1, 100 or 0.2 makes no difference. It only marks a direction.
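A minimal sketch with the legacy fixed-function API (it must run inside a valid GL context; the function name compareRotations is just for illustration), showing that both calls produce the same matrix, up to floating-point rounding, because the GL normalizes the axis first:

#include <GL/gl.h>
#include <cstdio>

void compareRotations()
{
    GLfloat a[16], b[16];
    glMatrixMode(GL_MODELVIEW);

    glLoadIdentity();
    glRotatef(2.0f, 1.0f, 0.0f, 0.0f);        // unit axis
    glGetFloatv(GL_MODELVIEW_MATRIX, a);

    glLoadIdentity();
    glRotatef(2.0f, 0.0174524f, 0.0f, 0.0f);  // same direction, non-unit length
    glGetFloatv(GL_MODELVIEW_MATRIX, b);

    for (int i = 0; i < 16; ++i)
        std::printf("%f  %f\n", a[i], b[i]); // the two columns should match
}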

Why do we need perspective division?

I know perspective division is done by dividing x,y, and z by w, to get normalized device coordinates. But I am not able to understand the purpose of doing that. Also, does it have anything to do with clipping?
Some details that complement the general answers:
The idea is to project a point (x,y,z) on screen to have (xs,ys,d).
The next figure shows this for the y coordinate.
We know from school that
tan(alpha) = ys / d = y / z
This means that the projection is computed as
ys = d*y/z = y/w
w = z/d
This is enough to apply a projection.
However in OpenGL, you want (xs,ys,zs) to be normalized device coordinates in [-1,1] and yes this has something to do with clipping.
The extrema values for (xs,ys,zs) represent the unit cube and everything outside it will be clipped.
So a projection matrix usually takes the clipping limits (the frustum) into consideration, producing a single transformation that, together with the perspective division, simultaneously applies the projection and maps the projected coordinates, along with z, to normalized device coordinates.
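A minimal plain-C++ sketch of that idea, with hypothetical names and the frustum mapping of z left out: w is set to z/d so the divide yields ys = d*y/z.

struct Vec4 { double x, y, z, w; };

// clip-space output of the simple projection above: (x, y, z, z/d)
Vec4 project(double x, double y, double z, double d)
{
    Vec4 clip;
    clip.x = x;
    clip.y = y;
    clip.z = z;        // a real projection matrix would also remap z here
    clip.w = z / d;    // the fourth row of the matrix would be (0, 0, 1/d, 0)
    return clip;
}

// perspective division: clip coordinates -> normalized device coordinates
void perspectiveDivide(Vec4& p)
{
    p.x /= p.w;        // xs = d*x/z
    p.y /= p.w;        // ys = d*y/z
    p.z /= p.w;
}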
I mean why do we need that?
In layman terms: To make perspective distortion work. In a perspective projection matrix, the Z coordinate gets "mixed" into the W output component. So the smaller the value of the Z coordinate, i.e. the closer to the origin, the more things get scaled up, i.e. bigger on screen.
To really distill it to the basic concept, and to see why the operation is division (instead of e.g. a square root or some such), consider that an object twice as far should appear with dimensions exactly one half as large. You obtain 1/2 from 2 by... division.
There are many geometric ways to arrive at the same conclusion. A diagram serves as visual proof for this, really.
Dividing x, y, z by w is a "trick" you do with homogeneous coordinates: you convert an R⁴ vector back to R³ by dividing by the 4th component (or w component, as you said), a process called dehomogenizing.
Why use homogeneous coordinates? That topic is a little more involved; I'll try to explain it and hope I do it justice.
However, I will use x1, x2, x3, x4 as the components of a vector instead of x, y, z, w:
Consider a 3x3 matrix M and column vectors x, a, b, c of R³, with x = (x1, x2, x3) and x1, x2, x3 being the scalar components of x.
With the 3x3 matrix you can do all the linear transformations on a vector x that you could do with the linear combination:
x' = x1*a + x2*b + x3*c (1).
(x' is the transformed vector that holds the result of transforming x).
Khan Academy's Linear Algebra course has a section explaining that every linear transformation can be written as a matrix product with a vector.
You can try this out for example by putting the column vectors a, b, c in the columns of the Matrix M = [ a b c ].
So with the matrix product you essentially get the linear combination above:
x' = M * x = [a b c] * x = a*x1 + b*x2 + c*x3 (2).
However this operation only accounts for rotation, scaling and shearing transformations. The origin (0, 0, 0) will always stay at (0, 0, 0).
For this you need another kind of transformation named "translation" (moving a vector or adding a vector to the vector).
Consider the translation column vector t = (t1, t2, t3) and the linear combination
x' = x1*a + x2*b + x3*c + t (3).
With this linear combination you can translate, rotate, scale and shear a vector. As you can see this Linear Combination does actually move the origin vector (0, 0, 0) to (0+t1, 0+t2, 0+t3).
However you can't put this translation into a 3x3 Matrix.
So what Graphics Programmers or Mathematicians came up with is adding another dimension to the Matrix and Vectors like this:
M is a 4x4 matrix and x~ a vector in R⁴ with x~ = (x1, x2, x3, x4); a, b, c, t are also column vectors of R⁴ (the last component of a, b, c being 0 and the last component of t being 1 - I keep the names the same to show later the similarity between the homogeneous linear combination and (3)). x~ is the homogeneous coordinate of x.
Now watch what happens if we take a vector x of R³ and put it into x~ of R⁴.
This vector will be in homogeneous coordinates in R⁴ x~=(x1, x2, x3, 1). The last component simply being 1 if it is a point and 0 if it's simply a direction (which couldn't be translated anyway).
So you have the linear combination:
x~' = M * x~ = [a b c t] * x~ = x1*a + x2*b + x3*c + x4*t (4).
(x~' is the result vector when transforming the homogeneous vector x~)
Since we took a vector from R³ and put it into R⁴, our x4 component is 1, so we have:
x~' = x1*a + x2*b + x3*c + 1*t
<=> x~' = x1*a + x2*b + x3*c + t (5).
which is exactly the linear combination (3) above with the translation t. This is called an affine transformation (linear transformation + translation).
So with a 3x3 matrix and a vector of R³ you couldn't do translations. However, by adding another dimension, with a vector in R⁴ and a 4x4 matrix, you actually can.
However, when you want to return to R³ you have to divide the first components by the last one. This is called "dehomogenizing": you divide by the x4 component, or in your variable naming, the w component. Let x be the original coordinate in R³, x~ its homogenized vector in R⁴, and x' the dehomogenized result in R³:
x' = (x1/x4, x2/x4, x3/x4) (6).
Then x' is the dehomogenized vector of the vector x~.
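A minimal C++ sketch of the homogeneous linear combination (4) and the dehomogenizing step (6); the row-major Mat4/Vec4 types and function names are assumptions:

#include <array>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;   // four rows of four entries

// x~' = M * x~ = x1*a + x2*b + x3*c + x4*t, with a, b, c, t the columns of M
Vec4 transform(const Mat4& M, const Vec4& x)
{
    Vec4 r{0.0, 0.0, 0.0, 0.0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += M[i][j] * x[j];
    return r;
}

// (6): back to R³ by dividing the first three components by the last one
std::array<double, 3> dehomogenize(const Vec4& x)
{
    return { x[0] / x[3], x[1] / x[3], x[2] / x[3] };
}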
Coming back to perspective division:
(I will leave this part out, because many here have explained the divide by z already. It comes from similar right triangles: with a given focal length f, a point at depth z with y coordinate y lands at y' = f*y/z. Also, since you stated (I hope I didn't misread) that you already know why this is done, I'll simply leave a link to a YT video here; I find it very well explained in the course lecture CMU 15-462/662.)
When dehomogenizing, the division by the w component is a pretty handy property when returning to R³. When you apply a 4x4 homogeneous perspective matrix to a vector, you simply put the z component into the w component and let the dehomogenizing process (as in (6)) perform the perspective divide. You can also set up the w component so that the division by w both divides by z and maps the values to a 0-to-1 range (basically you map the z-near to z-far range into a range where floating-point values are precise).
This is also described by Ravi Ramamoorthi in his Course CSE167 when he explains how to set up the perspective projection matrix.
I hope this helped you understand the rationale for putting z into the w component. Sorry for my horrible formatting and lengthy text; I hope it helped more than it confused.
Best of luck!
Actually, in the standard notational convention for a 4x4 perspective matrix with the sightline along the 'z' direction, 'w' differs by 1 from the distance ratio. Also, that ratio, though interpreted correctly, is normally expressed as -z/d, where 'z' is negative (therefore producing the correct ratio), because, again, in the common convention the camera looks down the negative 'z' direction.
The reason for the offset by 1 needs to be explained. Many references put the origin at the image plane rather than the center of projection. With that convention (again with the camera looking along the negative 'z' direction), the distance labeled 'z' in the similar-triangles diagram is replaced by (d-z). Substituting that for 'z', the expression for 'w' becomes (d-z)/d = 1 - z/d instead of z/d. To some these conventions may seem unorthodox, but they are quite popular among analysts.

Angles of 3D vector - getting both

I have an object A with a speed. The speed is specified as a 3D vector a = (x, y, z). Its position is the 3D point A [X, Y, Z]. I need to find out whether the current speed leads this object to another object B at position B [X, Y, Z].
I've successfully implemented this in 2 dimensions, ignoring the third one:
/*A is projectile, B is static object*/
//entity is object A
// - .v[3] is the speed vector
//position[3] is array of coordinates of object B
double vector[3]; //This is the vector c = A-B
this->entityVector(-1, entity.id, vector); //Fills the correct data
double distance = vector_size(vector); //This is distance |AB|
double speed = vector_size(entity.v); //This is size of speed vector a
float dist_angle = (float)atan2(vector[2],vector[0])*(180.0/M_PI); //Get angle of vector c as seen from Y axis - using X, Z
float speed_angle = (float)atan2((double)entity.v[2],entity.v[0])*(180.0/M_PI); //Get angle of vector a seen from Y axis - using X, Z
dist_angle = deg180to360(dist_angle); //Converts value to 0-360
speed_angle = deg180to360(speed_angle); //Converts value to 0-360
int diff = abs((int)compare_degrees(dist_angle, speed_angle)); //Returns the difference of vectors direction
I need to create the very same comparison to make it work in 3D - right now, the Y positions and Y vector coordinates are ignored.
What calculation should I do to get the second angle?
Edit based on answer:
I am using spherical coordinates and comparing their angles to check whether two vectors point in the same direction. With one vector being A-B and the other being A's speed, I'm checking if A is heading towards B.
I'm assuming the "second angle" you're looking for is φ. That is to say, you're using spherical coordinates:
(x,y,z) => (r,θ,φ)
r = sqrt(x^2 + y^2 + z^2)
θ = cos^-1(z/r)
φ = tan^-1(y/x)
However, if all you want to do is find if A is moving with velocity a towards B, you can use a dot product for a basic answer.
1st vector: B - A (vector pointing from A to B)
2nd vector: a (velocity)
dot product: a * (B-A)
If the dot product is 0, it means that you're not getting any closer - you're moving around a sphere of constant radius ||B-A|| with B at the center. If the dot product > 0, you're moving towards the point, and if the dot product < 0, you're moving away from it.
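A minimal C++ sketch of that dot-product test; the Vec3 type and the function name approach are assumptions, not the asker's entityVector code:

struct Vec3 { double x, y, z; };

double dot(const Vec3& u, const Vec3& v)
{
    return u.x * v.x + u.y * v.y + u.z * v.z;
}

// > 0: moving towards B, == 0: keeping a constant distance, < 0: moving away
double approach(const Vec3& A, const Vec3& B, const Vec3& velocity)
{
    Vec3 AB{ B.x - A.x, B.y - A.y, B.z - A.z };   // 1st vector: B - A
    return dot(velocity, AB);                     // 2nd vector: a (velocity)
}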

Perspective projection - how to convert coordinates

I'm studying perspective projections and I stumbled upon this concept:
Basically it says that if I have a point (x,y,z) I can project it into my perspective screen (camera space) by doing
x' = x/z
y' = y/z
z' = f(z-n) / z(f-n)
I can't understand why x' = x/z or y' = y/z
Geometrically, it is a matter of similar triangles.
In your diagram, because (x,y,z) is on the same dotted line as (x',y',z'):
triangle [(0,0,0), (0,0,z), (x,y,z)]
is similar to
triangle [(0,0,0), (0,0,z'), (x',y',z')]
This means that the corresponding sides have a fixed ratio. And, further, the original vector is proportional to the projected vector. Finally, note that the notional projection plane is at z' = 1:
(x,y,z) / z = (x',y',z') / z'
-> so, since z' = 1:
x'/z' = x' = x/z
y'/z' = y' = y/z
[Warning: note that the z' in my answer is different from its occurrence in the question. The question's z' = f(z-n) / z(f-n) doesn't correspond directly to a physical point: it is a "depth value", which is used to do things like hidden surface removal.]
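A minimal plain-C++ sketch of the three formulas from the question, with n and f the near and far plane distances (the struct and function names are assumptions):

struct Projected { double x, y, z; };

Projected project(double x, double y, double z, double n, double f)
{
    Projected p;
    p.x = x / z;                          // x' = x/z
    p.y = y / z;                          // y' = y/z
    p.z = (f * (z - n)) / (z * (f - n));  // z' = f(z-n) / z(f-n), a depth value
    return p;
}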
One way to look at this is that what you are trying to do is intersect a line, which passes through both the viewer position (assumed to be at the origin: 0,0,0) and the point in space you wish to project (P), with the projection plane.
So you take the equation of the line, which is P' = P * a, where a is simply a scalar value, and solve for P'.Z = 1 (which is where your projection plane is). This is trivially true when the scalar multiple is 1 / P.Z, so the projected point is (P.X, P.Y, P.Z) * (1 / P.Z).
Homogeneous coordinates give us the power to represent a point/line at infinity.
We append a 1 to the vector representation. The farther away a point is in 3D space, the more its projection tends toward the optical centre.
Cartesian to homogeneous:
p = (x, y) -> (x, y, 1)
Homogeneous to Cartesian:
(X, Y, Z) -> (X/Z, Y/Z)
For instance:
1. When you are travelling in an aeroplane and look down, points on the ground don't seem to move much from one instant to the next. That is because the distance is very large: Distance = 1/Disparity (the drift of the same point between two frames).
2. Try substituting infinity for the disparity: it means the distance is 0.
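To make the two conversions above concrete, a minimal C++ sketch with hypothetical function names:

#include <array>

// Cartesian (x, y) -> homogeneous (x, y, 1)
std::array<double, 3> toHomogeneous(double x, double y)
{
    return { x, y, 1.0 };
}

// Homogeneous (X, Y, Z) -> Cartesian (X/Z, Y/Z)
std::array<double, 2> toCartesian(const std::array<double, 3>& p)
{
    return { p[0] / p[2], p[1] / p[2] };
}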