So I was reading the "Inverting the Camera Orientation Matrix" section of this tutorial, and I don't understand why, when calculating the camera's up direction, I need to multiply the up direction vector by the inverse of the orientation matrix rather than by the orientation matrix itself.
I drew the following image to illustrate my understanding of the tutorial.
What did I get wrong?
Well, that tutorial explicitly states:
The way we calculate the up direction of the camera is by taking the
"directly upwards" unit vector (0,1,0) and "unrotate" it by using the
inverse of the camera's orientation matrix. Or, to explain it
differently, the up direction is always (0,1,0) after the camera
rotation has been applied, so we multiply (0,1,0) by the inverse
rotation, which gives us the up direction before the camera rotation
was applied.
The up direction calculated here is the up direction in world space. In eye space, the up vector is (0,1,0) (by convention; one could define it differently). Since the view matrix transforms coordinates from world space to eye space, we need its inverse to transform that up vector from eye space back to world space. Your image is wrong because it does not correctly relate eye space and world space.
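Here is a minimal numpy sketch of that relationship (the orientation matrix below is made up purely for illustration; in the tutorial it is built from the camera's actual angles):

```python
import numpy as np

def rotation_x(deg):
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def rotation_y(deg):
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[  c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [ -s, 0.0,   c]])

# Hypothetical camera orientation: the rotation that takes world-space
# coordinates into eye space (the rotational part of the view matrix).
orientation = rotation_x(30.0) @ rotation_y(45.0)

# In eye space the up vector is (0, 1, 0) by convention.  To express it in
# world space we "unrotate" it, i.e. multiply by the inverse orientation.
up_world = np.linalg.inv(orientation) @ np.array([0.0, 1.0, 0.0])
print(up_world)

# Sanity check: applying the camera rotation to that world-space vector
# lands back on (0, 1, 0) in eye space.
print(orientation @ up_world)   # ~ [0, 1, 0]
```

Since a pure rotation matrix is orthogonal, `orientation.T` would work just as well as `np.linalg.inv(orientation)` here.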
Related
Assuming I have a camera mounted on a rail, I can move it back and forth to take photos of my scene.
Can I assume I have a rotation matrix equal to zero?
It depends on the coordinate system you choose. Assuming it is aligned with your camera orientation (e.g. the negative Z-axis points along the camera's viewing direction and the positive Y-axis points upwards) and you only move the camera without rotating it, then the rotation matrix used to transform between these coordinate systems is the identity matrix.
A zero matrix makes no sense.
If you assume no rotation, then the rotation matrix is a 3x3 identity matrix, not zero.
Also, this may or may not be a good assumption depending on how accurate you want to be. Even if the camera is moving on a rail, there will be some small rotation.
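To make that concrete, here is a small numpy sketch (the camera position on the rail is made up) of a translation-only extrinsic matrix, i.e. [R | t] with R equal to the identity:

```python
import numpy as np

# A camera that only slides along a rail (no rotation): its extrinsic
# matrix is [R | t] with R = identity, so world -> camera is a pure shift.
R = np.eye(3)                          # identity rotation, NOT a zero matrix
cam_pos = np.array([0.0, 0.0, 2.5])    # hypothetical position on the rail

t = -R @ cam_pos                       # translation part of the extrinsics
extrinsics = np.hstack([R, t.reshape(3, 1)])

world_point = np.array([1.0, 0.0, 0.0, 1.0])   # homogeneous world point
print(extrinsics @ world_point)        # same point, just shifted by -cam_pos
```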
I've been drawing directly into homogeneous clip space (the 2x2x2 cube centered around (0,0,0)) in OpenGL, and I've realized that the perspective transformation matrix transforms all geometry from one right parallelepiped (view space) to another right parallelepiped (homogeneous clip space).
So why the heck does every OpenGL article use a non-parallelepiped frustum to illustrate how projection works? I understand that the perspective transformation matrix causes everything to get scaled by a term containing its distance from the camera and the camera's distance from the plane... is the traditional frustum illustration trying to explain that? Or have we truly entered some warped space at some point in the perspective transformation? If so, how do we end up back at a right parallelepiped (homogeneous clip space) at the end of it all?
You are right in the sense that the projection matrix is just a linear transformation with no real perspective distortion - as long as you interpret clip space as a 4-dimensional vector space.
But clip space is not the "end of it all". The perspective effect is a nonlinear transformation, and it is finally achieved by the perspective division that happens after the transformation to clip space. The projection matrix determines the w value that will be the divisor for that division, which for a typical OpenGL perspective matrix is just -z_eye.
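A small numpy sketch of that two-step process (the matrix layout is the conventional gluPerspective one; the parameters and the test point are made up):

```python
import numpy as np

def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style perspective matrix (same layout as gluPerspective)."""
    f = 1.0 / np.tan(np.radians(fovy_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0,                          0.0],
        [0.0,        f,   0.0,                          0.0],
        [0.0,        0.0, (far + near) / (near - far),  2.0 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                         0.0],
    ])

P = perspective(60.0, 16.0 / 9.0, 0.1, 100.0)

eye = np.array([1.0, 0.5, -5.0, 1.0])   # point in eye space (in front of the camera)
clip = P @ eye                           # the linear step: clip space

print(clip[3])                           # w_clip == -z_eye == 5.0
ndc = clip[:3] / clip[3]                 # the nonlinear step: perspective division
print(ndc)                               # normalized device coordinates
```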
I am trying to create a dataset of images of objects at different poses, where each image is annotated with camera pose (or object pose).
Say, for example, I have a world coordinate system, I place the object of interest at the origin, and I place the camera at a known position (x,y,z) facing the origin. Given this information, how can I calculate the pose (rotation matrix) of the camera or of the object?
I had one idea, which was to use a reference coordinate, i.e. (0,0,z'), at which I can define the rotation of the object, i.e. its tilt, pitch and yaw. Then I can calculate the rotation from (0,0,z') to (x,y,z), which gives me a second rotation matrix. The problem is: how do I now combine the two rotation matrices?
BTW, I know the world position of the camera, since I am rendering these images with OpenGL from a CAD model rather than physically moving a camera around.
A homography matrix maps homogeneous screen coordinates (i,j) to homogeneous coordinates of points lying on a plane in the world.
Homogeneous coordinates are the normal coordinates with a 1 appended, so (3,4) in screen coordinates becomes (3,4,1) in homogeneous screen coordinates.
If you have a set of homogeneous screen coordinates S and their associated homogeneous world-plane locations W, the 3x3 homography matrix H satisfies, for each corresponding pair,
s_i ~ H * w_i
where "~" means equal up to a scale factor (the coordinates are homogeneous).
So it boils down to finding several features whose world coordinates you know and whose (i,j) positions you can identify in screen coordinates, then computing a best-fit homography matrix (OpenCV has a function for this, findHomography).
Whilst knowing the camera's xyz provides helpful info, it's not enough to fully constrain the equation, and you will have to generate more screen-world pairs anyway. Thus I don't think it's worth your time integrating the camera's position into the mix.
I have done a similar experiment here: http://edinburghhacklab.com/2012/05/optical-localization-to-0-1mm-no-problemo/
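For illustration, a short OpenCV/numpy sketch of that fit (the point correspondences below are made up; in practice they come from detected features):

```python
import numpy as np
import cv2

# Hypothetical correspondences: pixel positions (i, j) of features and the
# matching points on a world plane (e.g. the table the object sits on).
screen_pts = np.array([[100, 120], [400, 118], [395, 360], [105, 365]], dtype=np.float32)
world_pts  = np.array([[0.0, 0.0], [0.3, 0.0], [0.3, 0.2], [0.0, 0.2]], dtype=np.float32)

# Best-fit 3x3 homography mapping world-plane points to screen points.
H, mask = cv2.findHomography(world_pts, screen_pts)

# Using it: append 1 to make the point homogeneous, multiply, then divide
# by the last component to get back to pixel coordinates.
w = np.array([0.15, 0.1, 1.0])
s = H @ w
print(s[:2] / s[2])      # predicted pixel position
```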
I am trying to implement a functionality in OpenGL using GLUI such that the "Arcball" control is used to input the position of the light. I'm not sure how to go about this, as the rotation matrix given by the arcball is 4x4, while the light position is a 1-D array of 3 coordinates.
Rotating a light around the scene only makes sense for directional lights (i.e. lights positioned at infinity). So you're not rotating a point but a direction, just like a normal. Easy enough: let the unrotated light have the direction (0,0,1,0). Then, to rotate it around the scene, you multiply it by the inverse-transpose of the given matrix. But since you know that this matrix contains only a rotation, this is a special case in which the inverse-transpose is the same as the original matrix.
So you just multiply your initial light direction (0,0,1,0) by the matrix.
We can simplify this even further. If you look at the multiplication, you see that it essentially just extracts the (weighted) columns of the matrix for which the original light direction vector is nonzero. So, if we really start with a light direction of (0,0,1,0), you just take the third column of the arcball rotation matrix.
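A tiny numpy sketch of that shortcut (the rotation matrix is made up; in your case it comes from the arcball):

```python
import numpy as np

# Hypothetical 4x4 arcball rotation matrix (rotation in the upper-left 3x3,
# no translation): here a 90-degree turn about the X axis.
M = np.array([
    [1.0, 0.0,  0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 1.0,  0.0, 0.0],
    [0.0, 0.0,  0.0, 1.0],
])

light_dir = np.array([0.0, 0.0, 1.0, 0.0])   # unrotated light direction

# The full multiplication ...
rotated = M @ light_dir
# ... gives the same result as just reading off the matrix's third column.
print(rotated)
print(M[:, 2])
```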
I have a camera that looks at a cube from above. I can rotate the cube, so it can end up with rotation values like z=258.18594, x=1. I need advice on how to find the nearest rotation for which the cube stands flat on the ground and the camera sees its top face.
Look into quaternion mathematics and quaternion interpolation. They will make this task rather easy.
See also Nearest Neighbours using Quaternions
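As a rough sketch of what that could look like (assuming SciPy is available, and assuming the reported values are Euler angles in degrees in z-y-x order, which is a guess about your setup):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Current cube orientation, built from the reported rotation values.
current = R.from_euler("zyx", [258.18594, 0.0, 1.0], degrees=True)

# The 24 rotations of the cube's symmetry group: every orientation in which
# the cube sits axis-aligned, i.e. standing flat on the ground.
candidates = R.create_group("O")

# Quaternion distance: q and -q are the same rotation, so the closest
# candidate is the one maximising |dot(q_current, q_i)|.
dots = np.abs(candidates.as_quat() @ current.as_quat())
nearest = candidates[np.argmax(dots)]
print(nearest.as_euler("zyx", degrees=True))
```

If you also want to animate the cube from its current orientation to the snapped one, quaternion interpolation (e.g. scipy.spatial.transform.Slerp) between the two rotations gives a smooth transition.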