I render a 3D mesh model using OpenGL with a perspective camera, set up via gluPerspective(fov, aspect, near, far).
Then I use the rendered image in a computer vision algorithm.
At some point that algorithm requires the camera matrix K (along with several vertices on the model and their corresponding projections) in order to estimate the camera pose: rotation matrix R and translation vector t. I can estimate R and t by using any algorithm which solves the Perspective-n-Point problem.
I construct K from the OpenGL projection matrix (see how here)
K = [fX, 0, pX | 0, fY, pY | 0, 0, 1]
If I want to project a model point 'by hand' I can compute:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] / X_proj[3]
y_pixel = X_proj[2] / X_proj[3]
Anyway, I pass this camera matrix in a PnP algorithm and it works just fine.
But then I had to change the perspective projection to an orthographic one.
As far as I understand, when using an orthographic projection the camera matrix becomes:
K = [1, 0, 0 | 0, 1, 0 | 0, 0, 0]
So I changed gluPerspective to glOrtho. I constructed K from the OpenGL projection matrix in the same way as before, and it turned out that fX and fY are not ones but 0.0037371. Is this a scaled orthographic projection or what?
Moreover, in order to project model vertices 'by hand' I managed to do the following:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] + width / 2
y_pixel = X_proj[2] + height / 2
Which is not what I expected (adding width and height divided by 2 seems strange...). I tried to pass this camera matrix to the POSIT algorithm to estimate R and t, and it doesn't converge. :(
So here are my questions:
How do I get an orthographic camera matrix from OpenGL?
If the way I did it is correct, then is it a true orthographic projection? Why doesn't POSIT work?
Orthographic projection will not use the depth to scale down farther points. Though, it will scale the points to fit inside the NDC which means it will scale the values to fit inside the range [-1,1].
This matrix from Wikipedia shows what this means:
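For reference, assuming the matrix meant is the standard orthographic projection matrix built by glOrtho(l, r, b, t, n, f), it reads:

```latex
\begin{pmatrix}
\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\
0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\
0 & 0 & \frac{-2}{f-n} & -\frac{f+n}{f-n} \\
0 & 0 & 0 & 1
\end{pmatrix}
```

The diagonal entries 2/(r-l) and 2/(t-b) are the scale factors that squeeze the view volume into [-1,1].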
So, it is correct to have numbers other than 1.
For your way of computing by hand, I believe it's not scaling back to screen coordinates and that makes it wrong. As I said, the output of projection matrices will be in the range [-1,1], and if you want to get the pixel coordinates, I believe you should do something similar to this:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1]*width/2 + width / 2
y_pixel = X_proj[2]*height/2 + height / 2
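As a rough illustration of the two steps (scale into [-1,1], then scale to pixels), here is a minimal C++ sketch. It assumes the usual glOrtho scale and offset terms and a viewport that starts at pixel (0, 0); the function name orthoEyeToPixel and the Pixel type are made up for this example.

```cpp
#include <cassert>

struct Pixel { double x, y; };

// Maps an eye-space point (after R*X_model + t) to pixel coordinates
// under an orthographic projection with glOrtho-style bounds l, r, b, t.
Pixel orthoEyeToPixel(double xEye, double yEye,
                      double l, double r, double b, double t,
                      int width, int height) {
    assert(r != l && t != b);  // degenerate view volume
    // Step 1: orthographic projection into normalized device coordinates.
    const double xNdc = 2.0 / (r - l) * xEye - (r + l) / (r - l);
    const double yNdc = 2.0 / (t - b) * yEye - (t + b) / (t - b);
    // Step 2: viewport transform from [-1,1] to pixels.
    return { xNdc * width / 2.0 + width / 2.0,
             yNdc * height / 2.0 + height / 2.0 };
}
```

Under these assumptions a value such as 0.0037 for fX would simply be 2/(r-l) for a view volume that is roughly 535 units wide.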
Anyway, I think you'd be better off using modern OpenGL with a library like GLM. In that case, you have the exact projection matrices at hand.
I am a graphics programming beginner working on my own engine and tried to implement frustum-aligned volume rendering.
The idea was to render multiple planes as vertical slices across the view frustum and then use the world coordinates of those planes for procedural volumes.
Rendering the slices as a 3d model and using the vertex positions as worldspace coordinates works perfectly fine:
//Vertex Shader
gl_Position = P*V*vec4(vertexPosition_worldspace,1);
coordinates_worldspace = vertexPosition_worldspace;
Result:
However, rendering the slices in frustum space and trying to reverse-engineer the world-space coordinates doesn't give the expected results. The closest I got was this:
//Vertex Shader
gl_Position = vec4(vertexPosition_worldspace,1);
coordinates_worldspace = (inverse(V) * inverse(P) * vec4(vertexPosition_worldspace,1)).xyz;
Result:
My guess is that the standard projection matrix somehow gets rid of some crucial depth information, but other than that I have no clue what I am doing wrong and how to fix it.
Well, it is not 100% clear what you mean by "frustum space". I'm going to assume that it does refer to normalized device coordinates in OpenGL, where the view frustum is (by default) the axis-aligned cube -1 <= x,y,z <= 1. I'm also going to assume a perspective projection, so that NDC z coordinate is actually a hyperbolic function of eye space z.
My guess is that the standard projection matrix somehow gets rid of some crucial depth information, but other than that I have no clue what I am doing wrong and how to fix it.
No, a standard perspective matrix in OpenGL looks like
( sx 0 tx 0 )
( 0 sy ty 0 )
( 0 0 A B )
( 0 0 -1 0 )
When you multiply this by a (x,y,z,1) eye space vector, you get the homogeneous clip coordinates. Consider only the last two lines of the matrix as separate equations:
z_clip = A * z_eye + B
w_clip = -z_eye
Since we do the perspective divide by w_clip to get from clip space to NDC, we end up with
z_ndc = - A - B/z_eye
which is actually the hyperbolically remapped depth information - so that information is completely preserved. (Also note that we do the division also for x and y).
When you calculate inverse(P), you only invert the 4D -> 4D homogeneous mapping. But you will get a resulting w that is not 1 again, so here:
coordinates_worldspace = (inverse(V) * inverse(P) * vec4(vertexPosition_worldspace,1)).xyz;
                                                                                       ^^^
lies your information loss. You just skip the resulting w and use the xyz components as if they were Cartesian 3D coordinates, but they are 4D homogeneous coordinates representing some 3D point.
The correct approach would be to divide by w:
vec4 coordinates_worldspace = (inverse(V) * inverse(P) * vec4(vertexPosition_worldspace,1));
coordinates_worldspace /= coordinates_worldspace.w;
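To see that nothing is lost once the divide is taken into account, here is a small C++ sketch that round-trips a point through the perspective matrix shown above, using only its six coefficients. The type and function names (Perspective, eyeToNdc, ndcToEye) are invented for this example; ndcToEye is what inverse(P) followed by the division by w amounts to for this particular matrix shape.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Coefficients of the matrix
// ( sx 0 tx 0 ) ( 0 sy ty 0 ) ( 0 0 A B ) ( 0 0 -1 0 )
struct Perspective { double sx, sy, tx, ty, A, B; };

// Eye space -> NDC: apply the matrix, then divide by w_clip = -z_eye.
Vec3 eyeToNdc(const Perspective& p, const Vec3& eye) {
    assert(eye.z != 0.0);
    const double w = -eye.z;
    return { (p.sx * eye.x + p.tx * eye.z) / w,
             (p.sy * eye.y + p.ty * eye.z) / w,
             (p.A * eye.z + p.B) / w };
}

// NDC -> eye space: undo the hyperbolic depth mapping first,
// which recovers w_clip, then undo the x/y scaling.
Vec3 ndcToEye(const Perspective& p, const Vec3& ndc) {
    // z_ndc = -A - B / z_eye  =>  z_eye = -B / (z_ndc + A)
    const double zEye = -p.B / (ndc.z + p.A);
    const double w = -zEye;
    return { (ndc.x * w - p.tx * zEye) / p.sx,
             (ndc.y * w - p.ty * zEye) / p.sy,
             zEye };
}
```

Multiplying the result by inverse(V) (an affine matrix, so w stays 1) would then give world-space coordinates.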
I have a completely implemented, working engine in OpenGL that supports a perspective camera with raycasting. Recently, I implemented an orthographic camera type, and visually, it's working just fine. For reference, here's how I compute the orthographic matrix:
double l = -viewportSize.x / 2 * zoom_;
double r = -l;
double t = -viewportSize.y / 2 * zoom_;
double b = -t;
double n = getNear();
double f = getFar();
m = Matrix4x4(
    2 / (r - l), 0,           0,            -(r + l) / (r - l),
    0,           2 / (t - b), 0,            -(t + b) / (t - b),
    0,           0,           -2 / (f - n), -(f + n) / (f - n),
    0,           0,           0,            1);
However, my issue now is that raycasting does not work with the orthographic camera. The issue seems to be that the raycasting engine was coded with perspective-type cameras in mind, so it stops functioning when the orthographic matrix is used instead. For reference, here's a high-level description of how the raycasting is implemented:
1.) Get the world-space origin vector:
    - Get the normalized screen coordinates from the input screen coordinates.
    - Build mouseVector = (normScreenCoords.x, normScreenCoords.y, 0 if "near" or 1 if "far").
    - Build the view-projection matrix (get the view and projection matrices from the Camera and multiply them).
    - Multiply the mouseVector by the inverse of the view-projection matrix.
2.) Get the world-space forward vector:
    - Get mouse world coordinates (far) and subtract them from mouse world coordinates (near).
3.) Send the world-space origin and world-space forward vectors to the raycasting engine, which handles the logic of comparing these vectors to all the visible objects in the scene efficiently by using bounding boxes.
How do I modify this algorithm to work with orthographic cameras?
Your steps are fine and should work as expected with an orthographic camera. There may be a problem with the way you are calculating the origin and direction.
1.) Get the origin vector. First calculate the mouse position in world-space units, i.e. float rayX = (mouseX - halfResolution) / viewport.width * (r - l) or similar. It should be offset so that the center of the screen is (0, 0) and the extreme values the mouse can reach translate to the edges of the viewport l, r, t, b. Then start with the camera position in world space and add the two vectors rayX * camera.local.right and rayY * camera.local.up, where right and up are unit vectors in the camera's local coordinate system.
2.) The world space forward vector is always the camera forward vector for any mouse position.
3.) This should work without modification as long as you have the correct vectors for 1 and 2.
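A minimal C++ sketch of steps 1 and 2, under a few assumptions that are not in the question: mouse coordinates are in pixels with the origin at the top-left corner, l, r, b, t are the orthographic bounds in world units, and right, up and forward are the camera's unit axes in world space. The names Vec3, Ray and orthoPickRay are hypothetical.

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin; Vec3 direction; };

// Builds a picking ray for an orthographic camera.
Ray orthoPickRay(double mouseX, double mouseY,
                 double viewportW, double viewportH,
                 double l, double r, double b, double t,
                 const Vec3& camPos, const Vec3& right,
                 const Vec3& up, const Vec3& forward) {
    assert(viewportW > 0.0 && viewportH > 0.0);
    // Map the mouse position onto the view rectangle:
    // x = 0 -> l, x = viewportW -> r; y = 0 -> t, y = viewportH -> b.
    const double rayX = l + (mouseX / viewportW) * (r - l);
    const double rayY = t + (mouseY / viewportH) * (b - t);
    Ray ray;
    // The origin slides across the view plane with the mouse...
    ray.origin = { camPos.x + rayX * right.x + rayY * up.x,
                   camPos.y + rayX * right.y + rayY * up.y,
                   camPos.z + rayX * right.z + rayY * up.z };
    // ...while the direction is the same for every pixel.
    ray.direction = forward;
    return ray;
}
```

This is the key difference from a perspective camera, where all rays share one origin and the direction varies per pixel.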
I have a vertex (x, y, z) and I want to calculate the screen location where this point would be rendered on my viewport. Something like Ray Picking, just more or less the other way around. I don't think I can use gluProject because at the time I need the projected point my matrices are restored to identities.
I would like to stay independent from OpenGL, so no extra render pass. This way I'm sure it would only be some math like the ray picking thing. I've implemented that one and it works well, so I want to project a vertex the same way.
Of course I have camera pos, up and lookAt vectors and fovy. Is there any source of information about this? Or does anyone know how to work this out?
If you know your matrices (or at least know how to construct them), you can compute the screen location of a vertex by multiplying its position by the matrices and then performing the viewport transformation:
vProjected = modelViewProjectionMatrix * v;
if (
    // check that the vertex shouldn't be clipped
    -vProjected.w <= vProjected.x && vProjected.x <= vProjected.w &&
    -vProjected.w <= vProjected.y && vProjected.y <= vProjected.w &&
    -vProjected.w <= vProjected.z && vProjected.z <= vProjected.w
) {
    vProjected /= vProjected.w;
    vScreen.x = VIEWPORT_W * vProjected.x / 2 + VIEWPORT_CENTER_X;
    vScreen.y = VIEWPORT_H * vProjected.y / 2 + VIEWPORT_CENTER_Y;
}
Note that, as per OpenGL convention, (0, 0) is lower left corner, not upper left one.
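The same check and viewport transform as a self-contained C++ function, for illustration. The Vec2/Vec4 types and the name clipToScreen are made up here; the viewport is assumed to be given as lower-left corner plus size, so its center is vpX + vpW/2, vpY + vpH/2.

```cpp
#include <cassert>

struct Vec2 { double x, y; };
struct Vec4 { double x, y, z, w; };

// Converts clip-space coordinates to window coordinates.
// Returns false (leaving 'out' untouched) if the vertex would be clipped.
bool clipToScreen(const Vec4& clip,
                  double vpX, double vpY, double vpW, double vpH,
                  Vec2& out) {
    assert(vpW > 0.0 && vpH > 0.0);
    if (clip.w <= 0.0) return false;  // behind the camera or degenerate
    if (clip.x < -clip.w || clip.x > clip.w) return false;
    if (clip.y < -clip.w || clip.y > clip.w) return false;
    if (clip.z < -clip.w || clip.z > clip.w) return false;
    // Perspective divide, then scale [-1,1] to the viewport rectangle.
    const double ndcX = clip.x / clip.w;
    const double ndcY = clip.y / clip.w;
    out.x = vpW * ndcX / 2.0 + (vpX + vpW / 2.0);
    out.y = vpH * ndcY / 2.0 + (vpY + vpH / 2.0);
    return true;
}
```

If your window system puts (0, 0) at the upper left, flip the result with y = vpH - out.y.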
Any math library with vector and matrix operations can help you with that. For example, mathfu or glm.
UPD. How can you construct modelViewProjectionMatrix given the camera position and orientation and the projection parameters? We need two matrices (let's assume that the model matrix is just an identity, i.e. vertex positions are already given in the world coordinate system). The first one is the view matrix, which takes into account the camera position and orientation. Here I'll be using mathfu since I'm more familiar with it, but almost every math library designed with 3D graphics in mind has the same functions:
viewMatrix = mathfu::mat4::LookAt(
cameraLookAtPosition,
cameraPosition,
cameraUpVector
);
The second one would be projection matrix:
projectionMatrix = mathfu::mat4::Perspective(fovy, aspect, zNear, zFar);
Now modelViewProjectionMatrix is just a product of those two:
modelViewProjectionMatrix = projectionMatrix * viewMatrix;
Note that matrix multiplication is not commutative, in other words A * B != B * A. So order in which matrices are multiplied is important.
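If no math library is available, the two matrices can also be applied by hand. The sketch below assumes the gluLookAt and gluPerspective conventions (right-handed, camera looking down its local -z axis, fovy in radians); worldToClip and the V3/V4 types are hypothetical names, not part of mathfu.

```cpp
#include <cassert>
#include <cmath>

struct V3 { double x, y, z; };
struct V4 { double x, y, z, w; };

static V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static V3 cross(V3 a, V3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static V3 normalize(V3 a) {
    const double n = std::sqrt(dot(a, a));
    assert(n > 0.0);
    return {a.x / n, a.y / n, a.z / n};
}

// World space -> clip space, i.e. projection * view * (p, 1).
V4 worldToClip(V3 p, V3 eye, V3 target, V3 up,
               double fovyRad, double aspect, double zNear, double zFar) {
    // View transform: express p in the camera's basis (side, up, -forward).
    const V3 f = normalize(sub(target, eye));
    const V3 s = normalize(cross(f, up));
    const V3 u = cross(s, f);
    const V3 d = sub(p, eye);
    const V3 e = {dot(s, d), dot(u, d), -dot(f, d)};
    // Perspective projection with cot(fovy/2) as the focal scale.
    const double c = 1.0 / std::tan(fovyRad / 2.0);
    return {c / aspect * e.x,
            c * e.y,
            (zFar + zNear) / (zNear - zFar) * e.z + 2.0 * zFar * zNear / (zNear - zFar),
            -e.z};
}
```

The resulting clip coordinates can then go through the clipping test, the division by w and the viewport transform described earlier.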
I am writing software to determine the viewable locations of a camera in 3D. I have currently implemented parts to find the minimum and maximum length of view based on the camera's and lens's intrinsic characteristics.
I now need to work out, if the camera is placed at X,Y,Z and is pointing in a given direction (two angles, one around the horizontal and one around the vertical axis), what the boundaries of the region the camera can see are (knowing the viewing angle). The output I would like is four 3D locations forming a rectangle at the minimum distance: top left, top right, bottom left and bottom right. The same is also required for the maximum distance.
Can anyone help with the geometry to find these points?
Some code I have:
QVector3D CameraPerspective::GetUnitVectorOfCameraAngle()
{
    QVector3D initial(0, 1, 0);
    QMatrix4x4 rotation_matrix;
    // rotate around z axis
    rotation_matrix.rotate(_angle_around_z, 0, 0, 1);
    // rotate around x axis
    rotation_matrix.rotate(_angle_around_x, 1, 0, 0);
    initial = initial * rotation_matrix;
    return initial;
}
Coordinate CameraPerspective::GetFurthestPointInFront()
{
    QVector3D camera_angle_vector = GetUnitVectorOfCameraAngle();
    camera_angle_vector.normalize();
    QVector3D furthest_point_infront = camera_angle_vector * _camera_information._maximum_distance_mm;
    return Coordinate(furthest_point_infront + _position_of_this);
}
Thanks
A complete answer with code will be probably way too long for SO, I hope that this will be enough. In the following we work in homogeneous coordinates.
I have currently implemented parts to find the minimum and maximum length of view based on the camera's and lens's intrinsic characteristics.
That isn't enough to fully define your camera. You also need a field of view angle and the width/height ratio.
With all this information (near plane + far plane + fov + ratio), you can build a 4x4 matrix known as the perspective matrix. Google for it or check here for some references. This matrix maps the pyramidal region of space which your camera "sees" (usually simply called the frustum) to the [-1,1]x[-1,1]x[-1,1] cube. Call it P.
Now you need a 4x4 camera matrix which transforms points in world space to points in camera space. Since you know the camera position and the camera orientation, this can be constructed easily (there is no room here to fully explain how transformation matrices in homogeneous coordinates work, google for it). Call this matrix C.
Now consider the matrix A = P * C.
This matrix transforms points in world coordinates to points in the perspective space. Your camera will "see" those points if, after division by w, they are inside the [-1,1]x[-1,1]x[-1,1] cube. But you can invert this matrix in order to map points inside the cube to points in world space. So in order to obtain the 8 points you need in world space you can simply compute the following, then divide each resulting y by its w component:
y = A^(-1) * x
Where x =
[-1,-1,-1, 1] left - bottom - near
[-1,-1, 1, 1] left - bottom - far
etc.
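For a symmetric frustum the same corners can also be obtained geometrically, without building or inverting any matrix. The following C++ sketch uses that alternative construction (it is not the matrix method described above). It assumes you already have the camera's unit forward, right and up vectors in world space and a vertical field of view in radians; Quad and viewRectangleAt are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quad { Vec3 topLeft, topRight, bottomLeft, bottomRight; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }

// Corners of the visible rectangle at a given distance along the view axis.
// Call once with the minimum and once with the maximum viewing distance.
Quad viewRectangleAt(Vec3 pos, Vec3 forward, Vec3 right, Vec3 up,
                     double fovYRad, double aspect, double distance) {
    assert(distance > 0.0 && fovYRad > 0.0 && aspect > 0.0);
    const double halfH = distance * std::tan(fovYRad / 2.0);  // half height
    const double halfW = halfH * aspect;                      // half width
    const Vec3 center = add(pos, scale(forward, distance));
    const Vec3 dx = scale(right, halfW);
    const Vec3 dy = scale(up, halfH);
    Quad q;
    q.topLeft = add(add(center, dy), scale(dx, -1.0));
    q.topRight = add(add(center, dy), dx);
    q.bottomLeft = add(add(center, scale(dy, -1.0)), scale(dx, -1.0));
    q.bottomRight = add(add(center, scale(dy, -1.0)), dx);
    return q;
}
```

Both approaches should agree for a standard perspective camera; the matrix route is more general because it also covers off-center projections.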
I'm working on a simple OpenGL world- and so far I've got a bunch of cubes randomly placed about and it's pretty fun to go zooming about. However I'm ready to move on. I would like to drop blocks in front of my camera, but I'm having trouble with the 3d angles. I'm used to 2d stuff where to find an end point we simply do something along the lines of:
endy = y + (sin(theta)*power);
endx = x + (cos(theta)*power);
However when I add the third dimension I'm not sure what to do! It seems to me that the power of the second dimensional plane would be determined by the z axis's cos(theta)*power, but I'm not positive. If that is correct, it seems to me I'd do something like this:
endz = z + (sin(xtheta)*power);
power2 = cos(xtheta) * power;
endx = x + (cos(ytheta) * power2);
endy = y + (sin(ytheta) * power2);
(where x theta is the up/down theta and y = left/right theta)
Am I even close to the right track here? How do I find an end point given a current point and two angles?
Working with Euler angles doesn't work so well in 3D environments; there are several issues and corner cases in which they simply don't work. And you actually don't even have to use them.
What you should do is exploit the fact that transformation matrices are nothing other than coordinate system bases written down in a comprehensible form. So you have your modelview matrix MV. This consists of a model space transformation, followed by a view transformation (column major matrices multiply right to left):
MV = V * M
So what we want to know is how the "camera" lies within the world. That is given to you by the inverse view matrix V^-1. You can of course invert the view matrix using the Gauss-Jordan method, but most of the time your view matrix will consist of a 3×3 rotation matrix R with a translation vector column P added.
R P
0 1
Recall that
(M * N)^-1 = N^-1 * M^-1
and also
(M * N)^T = N^T * M^T
so it seems there is some kind of relationship between transposition and inversion. Not every matrix has its transpose as its inverse, but for some matrices it does hold. These are the so-called orthonormal matrices. Rotations are orthonormal. So
R^-1 = R^T
Neat! This allows us to find the inverse of the view matrix as follows (I suggest you try to prove it as an exercise):
V = / R P \
    \ 0 1 /
V^-1 = / R^T  -R^T*P \
       \ 0     1     /
So how does this help us to place a new object in the scene at a distance from the camera? Well, V is the transformation from world space into camera space, so V^-1 transforms from camera to world space. So given a point in camera space you can transform it back to world space. Say you wanted to place something at the center of the view in distance d. In camera space that would be the point (0, 0, -d, 1). Multiply that with V^-1:
V^-1 * (0, 0, -d, 1) = -(R^T)_z * d - R^T*P
where (R^T)_z is the third column of R^T, i.e. the third row of R, and -R^T*P is the camera position in world space.
Which is exactly what you want. In your OpenGL program you somewhere have your view matrix V, probably not properly named yet, but anyway it is there. Say you use old OpenGL-1 and GLU's gluLookAt:
void display(void)
{
    /* setup viewport, clear, set projection, etc. */
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(...);
    /* the modelview matrix now holds the View transform */
At this point we can extract the modelview matrix
GLfloat view[16];
glGetFloatv(GL_MODELVIEW_MATRIX, view);
Now view is in column major order. If we were to use it directly we could directly address the columns. But remember that the transpose is the inverse of a rotation, so we actually want the 3rd row vector. So let's assume you keep view around, so that in your event handler (outside display) you can do the following:
GLfloat z_row[3];
z_row[0] = view[2];
z_row[1] = view[6];
z_row[2] = view[10];
And we want the translation column P
GLfloat * const p_column = &view[12];
Now we can calculate the new object's position at distance d:
GLfloat new_object_pos[3];
for (int k = 0; k < 3; ++k) {
    /* k-th component of R^T*P: dot product of the k-th column of R with P */
    GLfloat rtp = view[4*k]*p_column[0] + view[4*k + 1]*p_column[1] + view[4*k + 2]*p_column[2];
    new_object_pos[k] = -z_row[k]*d - rtp;
}
There you are. As you can see, nowhere did you have to work with angles or trigonometry; it's just straight linear algebra.
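For experimenting outside of an OpenGL context, the computation can be wrapped in a plain C++ function. This is a sketch under the assumption that the view matrix is a pure rotation plus translation stored in column-major order (as glGetFloatv returns it) and that the camera looks down its local -z axis; pointInFrontOfCamera is a made-up name.

```cpp
#include <cassert>

// Computes the world-space position of the camera-space point (0, 0, -d),
// i.e. V^-1 * (0, 0, -d, 1) = R^T * ((0, 0, -d) - P).
void pointInFrontOfCamera(const float view[16], float d, float out[3]) {
    assert(view != nullptr && out != nullptr);
    const float* p = &view[12];  // translation column P
    for (int k = 0; k < 3; ++k) {
        // k-th entry of the third row of R (third column of R^T).
        const float zRow = view[4 * k + 2];
        // k-th component of R^T * P: k-th column of R dotted with P.
        const float rtp = view[4 * k + 0] * p[0]
                        + view[4 * k + 1] * p[1]
                        + view[4 * k + 2] * p[2];
        out[k] = -d * zRow - rtp;
    }
}
```

For a camera at (1, 2, 3) with no rotation, the translation column is (-1, -2, -3), and d = 5 gives (1, 2, -2): five units along the camera's -z direction.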
Well I was close, after some testing, I found the correct formula for my implementation, it looks like this:
endy = cam.get_pos().y - (sin(toRad(180-cam.get_rot().x))*power1);
power2 = cos(toRad(180-cam.get_rot().x))*power1;
endx = cam.get_pos().x - (sin(toRad(180-cam.get_rot().y))*power2);
endz = cam.get_pos().z - (cos(toRad(180-cam.get_rot().y))*power2);
This takes my camera's position and rotational angles and gets the corresponding points. Works like a charm =]