I'm working on a computer vision problem which requires rendering a 3D model using a calibrated camera. I'm writing a function that breaks the calibrated camera matrix into a modelview matrix and a projection matrix, but I've run into an interesting phenomenon in OpenGL that defies explanation (at least by me).
The short description is that negating the projection matrix results in nothing being rendered (at least in my experience). I would expect that multiplying the projection matrix by any nonzero scalar would have no effect, because it transforms homogeneous coordinates, which are unaffected by scaling.
Below is my reasoning why I find this to be unexpected; maybe someone can point out where my reasoning is flawed.
Imagine the following perspective projection matrix, which gives correct results:
    [ a  b  c  0 ]
P = [ 0  d  e  0 ]
    [ 0  0  f  g ]
    [ 0  0  h  0 ]
Multiplying this by camera coordinates gives homogeneous clip coordinates:
[x_c]   [ a  b  c  0 ]   [X_e]
[y_c] = [ 0  d  e  0 ] * [Y_e]
[z_c]   [ 0  0  f  g ]   [Z_e]
[w_c]   [ 0  0  h  0 ]   [W_e]
Finally, to get normalized device coordinates, we divide x_c, y_c, and z_c by w_c:
[x_n]   [x_c/w_c]
[y_n] = [y_c/w_c]
[z_n]   [z_c/w_c]
Now, if we negate P, the resulting clip coordinates should be negated, but since they are homogeneous coordinates, multiplying by any scalar (e.g. -1) shouldn't have any effect on the resulting normalized device coordinates. However, in OpenGL, negating P results in nothing being rendered. I can multiply P by any positive scalar and get exactly the same rendered results, but as soon as I multiply by a negative scalar, nothing renders. What is going on here?
Thanks!
Well, the gist of it is that the clip test is done through:
-w_c <= x_c <= w_c
-w_c <= y_c <= w_c
-w_c <= z_c <= w_c
Multiplying by a negative value flips the sign of w_c along with everything else, so these comparisons fail even though the ratios (the NDC values) are unchanged.
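To make that concrete, here is a tiny standalone check in plain C++ (the numbers are arbitrary, chosen only so the vertex is clearly inside the frustum):

#include <cstdio>

// Returns true when a clip-space vertex passes -w <= x, y, z <= w.
bool insideClipVolume(float x, float y, float z, float w) {
    return -w <= x && x <= w &&
           -w <= y && y <= w &&
           -w <= z && z <= w;
}

int main() {
    // A vertex comfortably inside the frustum; its NDC would be (0.1, 0.1, 0.25).
    float x = 0.2f, y = 0.2f, z = 0.5f, w = 2.0f;
    std::printf("original: %s\n", insideClipVolume(x, y, z, w) ? "kept" : "clipped");

    // Negating P negates every clip coordinate, including w. The NDC values are
    // unchanged, but the test now requires 2 <= -0.2, which is false.
    std::printf("negated:  %s\n", insideClipVolume(-x, -y, -z, -w) ? "kept" : "clipped");
    return 0;
}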
I just found this tidbit, which makes progress toward an answer:
From Red book, appendix G:
Avoid using negative w vertex coordinates and negative q texture coordinates. OpenGL might not clip such coordinates correctly and might make interpolation errors when shading primitives defined by such coordinates.
Negating the projection matrix results in a negative w clip coordinate, and apparently OpenGL doesn't like this. But can anyone explain WHY OpenGL doesn't handle this case?
reference: http://glprogramming.com/red/appendixg.html
Reasons I can think of:
By negating the projection matrix, the coordinates will no longer fall between the zNear and zFar planes of the view frustum (both of which must be greater than 0).
To create window coordinates, the normalized device coordinates are scaled and translated by the viewport. So, if you've applied a negative scalar to the clip coordinates, the (now negated) normalized device coordinates map through the viewport to window coordinates that are... off of your window (to the left and below, if you will).
Also, since you mentioned using a camera matrix and that you have negated the projection matrix, I have to ask... to which matrices are you applying what from the camera matrix? Changing anything in the projection matrix other than near/far/fovy/aspect causes all sorts of problems with the depth buffer and anything that uses z (depth testing, face culling, etc.).
The OpenGL FAQ section on transformations has some more details.
Related
I am a graphics programming beginner working on my own engine and tried to implement frustum-aligned volume rendering.
The idea was to render multiple planes as vertical slices across the view frustum and then use the world coordinates of those planes for procedural volumes.
Rendering the slices as a 3D model and using the vertex positions as world-space coordinates works perfectly fine:
//Vertex Shader
gl_Position = P*V*vec4(vertexPosition_worldspace,1);
coordinates_worldspace = vertexPosition_worldspace;
Result:
However, rendering the slices in frustum space and trying to reverse-engineer the world-space coordinates doesn't give the expected results. The closest I got was this:
//Vertex Shader
gl_Position = vec4(vertexPosition_worldspace,1);
coordinates_worldspace = (inverse(V) * inverse(P) * vec4(vertexPosition_worldspace,1)).xyz;
Result:
My guess is that the standard projection matrix somehow gets rid of some crucial depth information, but other than that I have no clue what I am doing wrong or how to fix it.
Well, it's not 100% clear what you mean by "frustum space". I'm going to assume it refers to normalized device coordinates in OpenGL, where the view frustum is (by default) the axis-aligned cube -1 <= x,y,z <= 1. I'm also going to assume a perspective projection, so that the NDC z coordinate is actually a hyperbolic function of eye-space z.
My guess is that the standard projection matrix somehow gets rid of some crucial depth information, but other than that I have no clue what I am doing wrong or how to fix it.
No, a standard perspective matrix in OpenGL looks like
( sx   0  tx   0 )
(  0  sy  ty   0 )
(  0   0   A   B )
(  0   0  -1   0 )
When you multiply this by an (x, y, z, 1) eye-space vector, you get the homogeneous clip coordinates. Consider only the last two rows of the matrix as separate equations:
z_clip = A * z_eye + B
w_clip = -z_eye
Since we do the perspective divide by w_clip to get from clip space to NDC, we end up with
z_ndc = - A - B/z_eye
which is actually the hyperbolically remapped depth information, so that information is completely preserved. (Also note that the division is applied to x and y as well.)
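If you want to convince yourself numerically, here is a small standalone check (this assumes GLM for convenience; any matrix with the layout shown above behaves the same way):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <cstdio>

int main() {
    float n = 0.1f, f = 100.0f;
    glm::mat4 P = glm::perspective(glm::radians(60.0f), 16.0f / 9.0f, n, f);

    float A = P[2][2];   // column-major: row 2 of column 2
    float B = P[3][2];   // row 2 of column 3

    // Compare the full transform + divide against the closed-form remapping.
    for (float zEye : {-n, -1.0f, -10.0f, -f}) {
        glm::vec4 clip = P * glm::vec4(0.0f, 0.0f, zEye, 1.0f);
        float zNdcDivide  = clip.z / clip.w;   // perspective divide
        float zNdcFormula = -A - B / zEye;     // hyperbolic remapping from above
        std::printf("z_eye = %g  ->  z_ndc = %g (formula: %g)\n",
                    zEye, zNdcDivide, zNdcFormula);
    }
    return 0;
}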
When you calculate inverse(P), you only invert the 4D -> 4D homogeneous mapping. But the resulting w will in general not be 1 again, so here:
coordinates_worldspace = (inverse(V) * inverse(P) * vec4(vertexPosition_worldspace,1)).xyz;
^^^
lies your information loss. You just drop the resulting w and use the xyz components as if they were Cartesian 3D coordinates, but they are 4D homogeneous coordinates representing some 3D point.
The correct approach would be to divide by w:
vec4 coordinates_worldspace = inverse(V) * inverse(P) * vec4(vertexPosition_worldspace, 1);
coordinates_worldspace /= coordinates_worldspace.w;
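If you'd rather precompute the world-space slice corners on the CPU instead, the same inverse-and-divide works there too (a sketch assuming GLM):

#include <glm/glm.hpp>

// ndc is a slice vertex in normalized device coordinates ("frustum space");
// V and P are the same view and projection matrices used in the shader.
glm::vec3 ndcToWorld(const glm::vec3& ndc, const glm::mat4& V, const glm::mat4& P) {
    glm::vec4 world = glm::inverse(V) * glm::inverse(P) * glm::vec4(ndc, 1.0f);
    return glm::vec3(world) / world.w;   // homogeneous divide restores Cartesian coordinates
}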
I render a 3D mesh model using OpenGL with a perspective camera – gluPerspective(fov, aspect, near, far).
Then I use rendered image in a computer vision algorithm.
At some point that algorithm requires camera matrix K (along with several vertices on the model and their corresponding projections) in order to estimate camera position: rotation matrix R and translation vector t. I can estimate R and t by using any algorithm which solves Perspective-n-Point problem.
I construct K from the OpenGL projection matrix (see how here)
K = [fX, 0, pX | 0, fY, pY | 0, 0, 1]
If I want to project a model point 'by hand' I can compute:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] / X_proj[3]
y_pixel = X_proj[2] / X_proj[3]
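In code, that by-hand projection looks roughly like this (a sketch using GLM; the intrinsics and pose below are made-up placeholder values, the real ones come from the OpenGL projection matrix and the pose estimate):

#include <glm/glm.hpp>
#include <cstdio>

int main() {
    // Intrinsics K; GLM's mat3 constructor takes the columns in order.
    float fX = 800.0f, fY = 800.0f, pX = 320.0f, pY = 240.0f;   // placeholder values
    glm::mat3 K(fX,   0.0f, 0.0f,    // column 0
                0.0f, fY,   0.0f,    // column 1
                pX,   pY,   1.0f);   // column 2

    glm::mat3 R(1.0f);                      // identity rotation, just for the example
    glm::vec3 t(0.0f, 0.0f, 5.0f);          // model point 5 units in front of the camera
    glm::vec3 X_model(0.1f, -0.2f, 0.0f);

    glm::vec3 X_proj = K * (R * X_model + t);
    float x_pixel = X_proj.x / X_proj.z;    // same as X_proj[1] / X_proj[3] above
    float y_pixel = X_proj.y / X_proj.z;
    std::printf("pixel: (%g, %g)\n", x_pixel, y_pixel);
    return 0;
}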
Anyway, I pass this camera matrix to a PnP algorithm and it works just fine.
But then I had to change the perspective projection to an orthographic one.
As far as I understand, when using an orthographic projection the camera matrix becomes:
K = [1, 0, 0 | 0, 1, 0 | 0, 0, 0]
So I changed gluPerspective to glOrtho. Following the same approach, I constructed K from the OpenGL projection matrix, and it turned out that fX and fY are not 1 but 0.0037371. Is this a scaled orthographic projection or what?
Moreover, in order to project model vertices 'by hand' I managed to do the following:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] + width / 2
y_pixel = X_proj[2] + height / 2
Which is not what I expected (adding width and height divided by 2 seems strange...). I tried passing this camera matrix to the POSIT algorithm to estimate R and t, but it doesn't converge. :(
So here are my questions:
How do I get an orthographic camera matrix from OpenGL?
If the way I did it is correct, is it truly orthographic? Why doesn't POSIT work?
An orthographic projection does not use depth to scale down farther points. It will, however, scale the points to fit inside NDC, which means it scales the values to fit inside the range [-1, 1].
The orthographic projection matrix (the one shown on Wikipedia, which is what glOrtho builds) makes this explicit: its diagonal holds the scale factors 2/(right-left), 2/(top-bottom) and -2/(far-near).
So, it is correct to have numbers other than 1 there.
As for your way of computing by hand, I believe the problem is that it doesn't scale back to screen coordinates, and that makes it wrong. As I said, the output of the projection matrix is in the range [-1, 1], and if you want pixel coordinates, I believe you should do something similar to this:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1]*width/2 + width / 2
y_pixel = X_proj[2]*height/2 + height / 2
Anyway, I think you'd be better off using modern OpenGL with libraries like GLM. In that case, you have the exact projection matrices used at hand.
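For example, a minimal sketch with GLM (the ortho bounds and viewport size here are made up) shows both where the small scale factors come from and the NDC-to-pixel step:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <cstdio>

int main() {
    float left = 0.0f, right = 640.0f, bottom = 0.0f, top = 480.0f;   // example bounds
    int width = 640, height = 480;                                    // viewport in pixels

    glm::mat4 P = glm::ortho(left, right, bottom, top, 0.1f, 100.0f);
    std::printf("fX-like scale: %g  (= 2 / (right - left))\n", P[0][0]);
    std::printf("fY-like scale: %g  (= 2 / (top - bottom))\n", P[1][1]);

    // Project an eye-space point; for an orthographic matrix w stays 1,
    // so the result is already in NDC. Then map [-1, 1] to pixels.
    glm::vec4 ndc = P * glm::vec4(100.0f, 50.0f, -1.0f, 1.0f);
    float xPixel = ndc.x * width  / 2.0f + width  / 2.0f;
    float yPixel = ndc.y * height / 2.0f + height / 2.0f;
    std::printf("pixel: (%g, %g)\n", xPixel, yPixel);
    return 0;
}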
After looking up the calculations for a projection matrix (at least in OpenGL), I have to ask:
Why bother using a matrix when it has so many empty values? I count 9 entries marked as 0, and only 7 containing useful data. Why not just use a similar 1D array and store the data in a list-like shape? Wouldn't this save memory and the time spent writing functions that manipulate matrices? I'm sure this argument could be made in other areas as well, which makes me wonder:
What is the specific reason for using matrices when projecting 3D environments?
The projection of a 3D point (x,y,z) to the 2D image coordinates (X,Y) can be calculated as a vector-matrix multiplication in homogeneous coordinates:
[ a_00 a_01 a_02 a_03 ]   [ x ]   [ X W ]
[ a_10 a_11 a_12 a_13 ] * [ y ] = [ Y W ]
[ a_20 a_21 a_22 a_23 ]   [ z ]   [ Z W ]
[ a_30 a_31 a_32 a_33 ]   [ 1 ]   [ W ]
with
[ X W ]   [ x a_00 + y a_01 + z a_02 + a_03 ]
[ Y W ]   [ x a_10 + y a_11 + z a_12 + a_13 ]
[ Z W ] = [ x a_20 + y a_21 + z a_22 + a_23 ]
[ W ]     [ x a_30 + y a_31 + z a_32 + a_33 ]
And the pixel coordinates (X, Y) are obtained by dividing the first and second components by the fourth. This step is the conversion from homogeneous to Cartesian coordinates.
The third row of the OpenGL projection matrix is set up so that Z becomes the projected depth: z values between n and f (the near and far planes) are mapped to -1...1. It is then used for the depth test and clipping. Because the fourth row is [0 0 -1 0], the conversion from homogeneous to Cartesian coordinates corresponds to a division by -z, which results in the perspective transformation (with inverted depth).
Any other way of expressing the projection would involve the same steps, namely the linear transformation followed by the division by Z for the perspective foreshortening. Matrices are the usual linear-algebra representation for these operations.
This is not specific to perspective projections: many 3D transformations can be expressed as a 4x4 matrix, including rotations, translations, scalings, shearings, reflections, perspective projection, orthographic projection, and others.
Multiple transformations that should be applied one after another can also be combined into a single 4x4 matrix by matrix multiplication. For example, rotations around the X, Y and Z axes, or the MVP matrix. This is the model-view-projection matrix, which takes a 3D point in the local coordinate system of one object in the 3D scene to its final pixel coordinate on the screen. In these combined matrices all components can be non-zero.
So the advantage is that a single operation, the matrix-vector multiplication, is usable for all these cases instead of several different operations, and it is performed efficiently on GPU hardware.
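As a concrete illustration (a sketch assuming GLM, with arbitrary camera values), the combination happens once per object and each vertex then costs one matrix-vector multiply plus the divide by w:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <cstdio>

int main() {
    glm::mat4 model = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.0f, -5.0f));
    glm::mat4 view  = glm::lookAt(glm::vec3(0.0f, 0.0f, 3.0f),    // eye
                                  glm::vec3(0.0f, 0.0f, 0.0f),    // target
                                  glm::vec3(0.0f, 1.0f, 0.0f));   // up
    glm::mat4 proj  = glm::perspective(glm::radians(60.0f), 4.0f / 3.0f, 0.1f, 100.0f);

    glm::mat4 mvp = proj * view * model;                        // combined once per object

    glm::vec4 clip = mvp * glm::vec4(1.0f, 1.0f, 0.0f, 1.0f);   // one multiply per vertex
    glm::vec3 ndc  = glm::vec3(clip) / clip.w;                  // homogeneous divide
    std::printf("ndc: (%g, %g, %g)\n", ndc.x, ndc.y, ndc.z);
    return 0;
}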
It's not just about the individual values; it's also about the mathematical properties of a matrix. And the zeros are just as important as the nonzero values! The very layout of the values has meaning!
Specifically, the first three columns of a homogeneous transformation matrix (like a 3D projection matrix) form the basis vectors of the local coordinate space, and the 4th column defines a translation (which, in the case of a perspective projection, moves the basis away from the singularity at the origin).
So in 3D space you have 3 values per position: you have to translate these three values into 3 values on your screen (the third value becomes the value used for depth comparison), and the 4th value (of both the position and the destination) is used for perspective distortion. So for each of the 4 values in the original position you must know how much it contributes to each of the 4 values in the output. If it doesn't contribute (and that's just as important), the entry is 0. So you need 4 · 4 = 16 values in total. Hence a 4×4 matrix.
It's probably quite rare that the projection matrix would get used as-is. Typically, you're more likely to concatenate the projection matrix with the world and view matrices and multiply by the world-view-proj matrix all in one go.
Also, GPUs are powerful and flexible, but if there's one thing they're best at doing, it's a series of multiply-adds on vectors (although newer hardware is just as efficient with scalar multiply-adds as vector multiply-adds). Matrix-vector multiplies are just a series of vector multiply-adds, and a more compact structure might be less efficient.
That said, your point is not without merit: I am aware of one successful fixed-function games console whose limited hardware registers for the projection matrix took advantage of exactly this point, that most of the entries in the projection matrix are typically unused.
I'm trying to implement textures for spheres in my ray tracer. I managed to get something working, but I am unsure about its correctness. Below is the code for getting the texture coordinates. For now, the texture is random and is generated at runtime.
virtual void GetTextureCoord(Vect hitPoint, int hres, int vres, int& x, int& y) {
    float theta = acos(hitPoint.getVectY());
    float phi   = atan2(hitPoint.getVectX(), hitPoint.getVectZ());
    if (phi < 0.0) {
        phi += TWO_PI;
    }

    float u = phi * INV_TWO_PI;
    float v = 1 - theta * INV_PI;

    y = (int) ((hres - 1) * u);
    x = (int) ((vres - 1) * v);
}
This is how the spheres look now:
I had to normalize the coordinates of the hit point to get the spheres to look like that. Otherwise they would look like:
Was normalising the hit point coordinates the right approach, or is something else broken in my code? Thank you!
Instead of normalising the hit point, I tried translating it to the world origin (as if the sphere center was there) and obtained the following result:
I'm using a 256x256 resolution texture by the way.
It's unclear what you mean by "normalizing" the hit point since there's nothing that normalizes it in the code you posted, but you mentioned that your hit point is in world space.
Also, you didn't say what texture mapping you're trying to implement, but I assume you want your U and V texture coordinates to represent latitude and longitude on the sphere's surface.
Your first problem is that converting Cartesian to spherical coordinates requires that the sphere is centered at the origin in the Cartesian space, which isn't true in world space. If the hit point is in world space, you have to subtract the sphere's world-space center point to get the effective hit point in local coordinates. (You figured this part out already and updated the question with a new image.)
Your second problem is that the way you're calculating theta requires that the sphere have a radius of 1, which isn't true even after you move the sphere's center to the origin. Remember your trigonometry: the argument to acos is the ratio of a triangle's side to its hypotenuse, and is always in the range [-1, +1]. In this case your Y coordinate is the side and the sphere's radius is the hypotenuse, so you have to divide by the sphere's radius when calling acos. It's also a good idea to clamp the value to the [-1, +1] range in case floating-point rounding error puts it slightly outside.
(In principle you'd also have to divide the X and Z coordinates by the radius, but you're only using those for an inverse tangent, and dividing them both by the radius won't change their quotient and thus won't change phi.)
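Putting both fixes together, a sketch of the adjusted function could look like this (center and radius are assumed to be members of your sphere class; those names are hypothetical, everything else matches your code):

virtual void GetTextureCoord(Vect hitPoint, int hres, int vres, int& x, int& y) {
    // Move the hit point into the sphere's local frame (center at the origin).
    float lx = hitPoint.getVectX() - center.getVectX();
    float ly = hitPoint.getVectY() - center.getVectY();
    float lz = hitPoint.getVectZ() - center.getVectZ();

    // Divide by the radius so the acos argument is a ratio in [-1, +1],
    // and clamp to guard against floating-point rounding error.
    float cosTheta = ly / radius;
    if (cosTheta >  1.0f) cosTheta =  1.0f;
    if (cosTheta < -1.0f) cosTheta = -1.0f;
    float theta = acos(cosTheta);

    // atan2 only depends on the ratio of x and z, so no division is needed here.
    float phi = atan2(lx, lz);
    if (phi < 0.0f) {
        phi += TWO_PI;
    }

    float u = phi * INV_TWO_PI;
    float v = 1.0f - theta * INV_PI;

    y = (int) ((hres - 1) * u);
    x = (int) ((vres - 1) * v);
}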
Right now your sphere intersection and texture-coordinate functions are operating in world space, but you'll probably find it useful later to implement transformation matrices, which let you transform things from one coordinate space to another. Then you can change your sphere functions to operate in a local coordinate space where the center is the origin and the radius is 1, and give each object an associated transformation matrix that maps the local coordinate space to the world coordinate space. This will simplify your ray/sphere intersection code, and let you remove the origin subtraction and radius division from GetTextureCoord (since they're always (0, 0, 0) and 1 respectively).
To intersect a ray with an object, you'd use the object's transformation matrix to transform the ray into the object's local coordinate space, do the intersection (and compute texture coordinates) there, and then transform the result (e.g. hit point and surface normal) back to world space.
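A sketch of that ray transformation (assuming GLM for the matrix type; the struct and function names are made up):

#include <glm/glm.hpp>

struct Ray {
    glm::vec3 origin;
    glm::vec3 direction;
};

// worldFromLocal is the object's transformation matrix; its inverse maps
// world space into the object's local space.
Ray toLocalSpace(const Ray& worldRay, const glm::mat4& worldFromLocal) {
    glm::mat4 localFromWorld = glm::inverse(worldFromLocal);
    Ray local;
    local.origin    = glm::vec3(localFromWorld * glm::vec4(worldRay.origin, 1.0f));    // point: w = 1
    local.direction = glm::vec3(localFromWorld * glm::vec4(worldRay.direction, 0.0f)); // vector: w = 0
    return local;
}

Hit points transform back to world space the same way with the forward matrix; if your matrices ever include non-uniform scaling, remember that surface normals need the inverse transpose of the matrix rather than the matrix itself.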
I'm following some tutorials to learn OpenGL (from www.opengl-tutorial.org if it makes any difference) and there is an exercise that asks me to draw a cube and a triangle on the screen, and it says as a hint that I'm supposed to calculate two MVP matrices, one for each object. The MVP matrix is given by Projection*View*Model and, as far as I understand, the projection and view matrices are the same for all the objects on the screen (they are only affected by my choice of "camera" location and settings). However, the model matrix should change, since it's supposed to give me the position and rotation of the object in global coordinates. Following the tutorials, for my cube the model matrix is just the identity matrix, since it is located at the origin and there's no rotation or scaling. Then I draw my triangle so that its vertices are at (2,2,0), (2,3,0) and (3,2,0). Now my question is: what is the model matrix for my triangle?
My own reasoning says that if I don't want to rotate or scale it, the model matrix should be just a translation matrix. But what determines the translation coordinates here? Should it be the location of one of the vertices, or the center of the triangle, or what? Or have I completely misunderstood what the model matrix is?
The model matrix is, like the other matrices (projection, view), a 4x4 matrix with the same layout. Depending on whether you're using column or row vectors, the matrix consists of the x, y, z axes of your local frame and a (t1, t2, t3) vector specifying the translation part.
So for a column vector p, the transformation matrix (M) looks like:
x1, x2, x3, t1,
y1, y2, y3, t2,
z1, z2, z3, t3,
0, 0, 0, 1
p' = M * p
For row vectors you can work out how the matrix layout must look (it is the transpose of the above); also note that with row vectors, p' = p * M.
If you have no rotational component, your local frame has the usual x, y, z axes in the 3x3 submatrix of the model matrix:
1 0 0 t1 -> x axis
0 1 0 t2 -> y axis
0 0 1 t3 -> z axis
0 0 0 1
The fourth column specifies the translation vector (t1, t2, t3). If you have a point p =
1,
0,
0,
1
in a local coordinate system and you want to translate it +1 in the z direction to place it in the world coordinate system, the model matrix is simply:
1 0 0 0
0 1 0 0
0 0 1 1
0 0 0 1
p' = M * p, where p' is the transformed point in world coordinates.
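If you're using GLM (the tutorials at www.opengl-tutorial.org do), the same translation-only model matrix can be written as a short sketch:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <cstdio>

int main() {
    // Identity rotation/scale, translation (0, 0, 1): the matrix shown above.
    glm::mat4 model = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.0f, 1.0f));

    glm::vec4 pLocal(1.0f, 0.0f, 0.0f, 1.0f);
    glm::vec4 pWorld = model * pLocal;   // -> (1, 0, 1, 1)
    std::printf("(%g, %g, %g)\n", pWorld.x, pWorld.y, pWorld.z);
    return 0;
}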
For your example above, you could simply specify the triangle vertices (2,2,0), (2,3,0) and (3,2,0) directly in your local coordinate system; then the model matrix is trivial (the identity). Otherwise you have to work out how to compute the rotation, etc. I recommend reading the first few chapters of Mathematics for 3D Game Programming and Computer Graphics. It's a very approachable 3D math book, and it gives you the minimal information you need to handle most of the 3D graphics math.