I want to display a model on an image using the camera extrinsic matrix and the gluLookAt function.
The model is translated to the origin, that is, its center of mass is at the origin. (The model's coordinates are right-handed.)
Using the cvFindExtrinsicCameraParams2 function, I obtained the camera extrinsic matrix E = [R|t].
In this case, I'd like to display the CAD model using gluLookAt.
It takes three parameters: camera position, camera eye, and camera up.
What values do I have to enter?
My guess: the camera position is t, the extrinsic matrix's translation values.
Also, if rotation and translation are zero, then the camera sees the model along the (0,0,1) vector. Thus, if there is a rotation, the camera eye should be R * (0,0,1).
Finally, the camera up: it should initially be (0,-1,0) if the camera looks at the model from the front, so the new camera up vector is R * (0,-1,0).
But this does not give me a correct result. What's my mistake?
The eye is a point in space at which the camera is looking. What you currently calculate is the direction in which it should look. You can, for example, use
eye = t + R * [0,0,1];
I'm wondering why you try to recreate the camera matrix using gluLookAt, since the result should be exactly the extrinsic camera matrix that you already have.
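For reference, a minimal sketch of that conversion, assuming OpenCV's convention Xcam = R * Xworld + t (so the camera center in world space is -R^T * t, and OpenCV's y-down axis must be flipped for OpenGL), using GLM for the matrix math:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Build an OpenGL view matrix (or, equivalently, gluLookAt parameters)
// from an OpenCV extrinsic matrix [R|t].
glm::mat4 viewFromExtrinsic(const glm::mat3& R, const glm::vec3& t)
{
    glm::mat3 Rt      = glm::transpose(R);
    glm::vec3 eye     = -Rt * t;                   // camera center in world space
    glm::vec3 forward =  Rt * glm::vec3(0, 0, 1);  // viewing direction (+z in OpenCV camera coords)
    glm::vec3 up      = -Rt * glm::vec3(0, 1, 0);  // OpenCV's y axis points down, so flip it
    return glm::lookAt(eye, eye + forward, up);    // the same three vectors work for gluLookAt
}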
Related
Say I have two objects, a camera and a cube, both on the XZ plane; the cube has some arbitrary rotation, and the camera is facing the cube.
Now a transformation R is applied to the camera, giving it a new rotation and position.
I want to move the cube in front of the camera using a transformation R1, such that in the view it looks exactly as it did before R was applied, meaning the relative distance, rotation, and scale between the two objects remain the same after both R and R1.
Assume that there's no scenegraph that we can use.
I've posed the problem mainly in 2D, but I'm trying to solve it in 3D, so rotations can include yaw, pitch, and roll, and translations can be anywhere in 3D space.
EDIT:
I forgot to add what I have done so far.
I figured out how to maintain the relative distance between the camera and the cube: I can project the cube's position to get a world-to-screen point, then unproject that screen point with the new camera to get the new world position.
However, for rotation I have tried the following:
I thought I could apply the same rotation as R in R1. This didn't work: it appears to work if the rotation is about only one axis, but not if the rotation involves more than one axis.
I thought I could take the delta rotation between the camera and the cube, apply the camera's rotation to the cube, and then multiply by the delta rotation. This also didn't work.
Let M and V be the model and view matrices before you move the camera, and M2 and V2 the matrices after you move the camera. To be clear: a model matrix transforms coordinates from object-local coordinates into world coordinates; a view matrix transforms from world coordinates into camera coordinates. Consequently, V*M*p transforms the position p into camera space.
For the position on the screen to stay constant, we need V*M*p = V2*M2*p to hold for all p (assuming the FOV doesn't change). Therefore V*M = V2*M2, or
M2 = inverse(V2)*V*M
If you apply the camera transformation on the right (V2 = V*R) then the above expression for M2 can be simplified:
M2 = inverse(R)*M
(that is, you apply inverse(R) on the left of the model matrix to compensate).
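A minimal sketch of that compensation, assuming GLM for the matrix math:

#include <glm/glm.hpp>

// Recompute the cube's model matrix after the camera has moved so that
// V2 * M2 == V * M, leaving the cube's appearance on screen unchanged.
glm::mat4 compensate(const glm::mat4& V, const glm::mat4& V2, const glm::mat4& M)
{
    return glm::inverse(V2) * V * M;  // M2
}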
Alternatively, ask yourself if you really need to keep the object coordinates in the world reference frame. It may be easier not to apply the view matrix when rendering that object at all; that would effectively keep it relative to the camera at all times without any additional tweaks. It would also have better numerical stability.
I am trying to understand the pinhole camera model and the geometry behind some computer vision and camera calibration stuff that I am looking at.
If I understand correctly, the pinhole camera model maps 3D real-world coordinates to pixel coordinates. The model looks like:
y = K [R|T]x
Here y is the pixel coordinates in homogeneous form, [R|T] is the extrinsic transformation matrix, and x is the 3D world coordinates, also in homogeneous form.
Now, I am looking at a presentation which says
project the center of the focus region onto the ground plane using [R|T]
Now, the center of the focus region is just taken to be the center of the image. I am not sure how I can estimate the ground plane. Assuming the point to be projected is in image space, should the projection be computed by inverting the [R|T] matrix and multiplying the point by the inverted matrix?
EDIT
Source here on page 29: http://romilbhardwaj.github.io/static/BuildSys_v1.pdf
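For what it's worth, here is a sketch of one common reading of that step, assuming the ground plane is z = 0 in world coordinates, the convention Xcam = R * Xworld + t, and GLM for the math:

#include <glm/glm.hpp>

// Back-project a pixel onto the world plane z = 0. The pixel defines a ray
// through the camera center; intersect that ray with the ground plane.
glm::vec3 pixelToGround(const glm::mat3& K, const glm::mat3& R,
                        const glm::vec3& t, float u, float v)
{
    glm::vec3 C   = -glm::transpose(R) * t;                      // camera center in world coordinates
    glm::vec3 dir =  glm::transpose(R)
                   * (glm::inverse(K) * glm::vec3(u, v, 1.0f));  // ray direction in world coordinates
    float s = -C.z / dir.z;                                      // ray parameter where z = 0
    return C + s * dir;
}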
I am trying to create a dataset of images of objects at different poses, where each image is annotated with camera pose (or object pose).
Suppose, for example, I have a world coordinate system, I place the object of interest at the origin, and I place the camera at a known position (x,y,z) facing the origin. Given this information, how can I calculate the pose (rotation matrix) of the camera or of the object?
One idea I had was to use a reference coordinate, i.e. (0,0,z'), at which I can define the rotation of the object, i.e. its tilt, pitch, and yaw. Then I can calculate the rotation between (0,0,z') and (x,y,z) to give me a rotation matrix. The problem is: how do I then combine the two rotation matrices?
BTW, I know the world position of the camera as I am rendering these with OpenGL from a CAD model as opposed to physically moving a camera around.
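A minimal sketch of one way to compute that rotation, assuming GLM (a reasonable fit since the images are rendered with OpenGL); the function name is illustrative:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Rotation of a camera placed at 'pos' and facing the world origin.
// lookAt returns the view matrix [R|t] with Xcam = R * Xworld + t, so its
// upper-left 3x3 block is the world-to-camera rotation; transpose it to get
// the camera-to-world (pose) rotation. Assumes pos is not parallel to the up vector.
glm::mat3 cameraRotation(const glm::vec3& pos)
{
    glm::mat4 V = glm::lookAt(pos, glm::vec3(0.0f), glm::vec3(0, 1, 0));
    return glm::mat3(V);  // use glm::transpose(glm::mat3(V)) for the pose rotation
}

Combining two rotation matrices is then just a matrix product, e.g. R_total = R_camera * R_object, where the right-hand factor is applied first.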
The homography matrix maps homogeneous screen coordinates (i,j) to the homogeneous coordinates of points on a world plane.
Homogeneous coordinates are normal coordinates with a 1 appended. So (3,4) in screen coordinates is (3,4,1) in homogeneous screen coordinates.
If you have a homogeneous screen coordinate s and its associated homogeneous location w on the world plane, the 3x3 homography matrix H satisfies, up to a scale factor,
w ~ H * s
So it boils down to finding several features whose world coordinates you know and whose (i,j) positions you can also identify in screen coordinates, then fitting a "best fit" homography matrix (OpenCV has a function for this, findHomography).
Whilst knowing the camera's xyz provides helpful info, it's not enough to fully constrain the equation, and you will have to generate more screen-world pairs anyway. Thus I don't think it's worth your time integrating the camera's position into the mix.
I have done a similar experiment here: http://edinburghhacklab.com/2012/05/optical-localization-to-0-1mm-no-problemo/
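As a minimal sketch of that fit with OpenCV's C++ API (the RANSAC flag and the function name here are illustrative choices):

#include <opencv2/calib3d.hpp>
#include <vector>

// Fit the 3x3 homography mapping screen (i,j) points to their matching
// world-plane (x,y) points; at least four correspondences are required.
cv::Mat screenToWorldHomography(const std::vector<cv::Point2f>& screenPts,
                                const std::vector<cv::Point2f>& worldPts)
{
    // RANSAC makes the fit robust against mismatched feature pairs.
    return cv::findHomography(screenPts, worldPts, cv::RANSAC);
}

The resulting matrix can then be applied to new screen points with cv::perspectiveTransform.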
I am somewhat confused by the camera-to-world vs. world-to-camera transformation.
In the OpenGL rendering pipeline, the transformation is from world to camera, right?
Basically, a view coordinate frame is constructed at the camera; then the objects in the world are first translated relative to the camera and then rotated into the camera coordinate frame. Is this what gluLookAt performs?
If I want to go from the camera view back to the world view, how should that be done?
Mathematically, I am thinking of finding the inverses of the translation and rotation matrices, then applying the inverse rotation before the inverse translation, right?
Usually there are several transformations involved in mapping positions to the screen. The following are the most common ones (the sketch after this list shows how they compose):
World transformation: Can be applied to objects in order to realign them relative to other objects.
View transformation: After this transformation, the camera is at the origin and looks down the negative z axis.
Projection transformation: Performs, e.g., a perspective transformation to simulate a real camera.
Viewport adaptation: Basically a scaling and translation that maps positions from the range [-1, 1] to the viewport, which usually has the screen's width and height.
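As a rough sketch of how these transformations compose for a single point, assuming GLM for the matrix math and a viewport of w by h pixels:

#include <glm/glm.hpp>

// Map a world-space point to pixel coordinates: model -> view -> projection,
// then the perspective divide and the viewport adaptation from [-1, 1].
glm::vec2 worldToScreen(const glm::mat4& P, const glm::mat4& V, const glm::mat4& M,
                        const glm::vec3& p, float w, float h)
{
    glm::vec4 clip = P * V * M * glm::vec4(p, 1.0f);  // clip space
    glm::vec3 ndc  = glm::vec3(clip) / clip.w;        // normalized device coordinates
    return glm::vec2((ndc.x * 0.5f + 0.5f) * w,       // viewport adaptation
                     (ndc.y * 0.5f + 0.5f) * h);
}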
gluLookAt is used to create a view transformation. You can imagine it as follows: place the camera somewhere in your scene, then transform the whole scene (together with the camera) so that the camera sits at the origin, faces in the negative z direction, and the y axis represents the up direction. This is a simple rigid-body transformation that can be represented as an arbitrary rotation (with three degrees of freedom) plus an arbitrary translation (another three degrees of freedom). Every rigid-body transformation can be split into separate rotations and translations, and even the sequence of evaluation can vary if you choose the correct values. Transformations can be interpreted in different ways; I wrote a blog entry on that topic a while ago. If you're interested, take a look at it. Although it is for DirectX, the maths is pretty much the same for OpenGL. You just have to watch out for transposed matrices.
For the second question: Yes, you are right. You need to find the inverse transformation. This can be easily achieved with the inverse matrix. If you specified the view matrix V as follows:
V = R_xyz * T_xyz
then the inverse transformation V^-1 is
V^-1 = T_xyz^-1 * R_xyz^-1
However, this does not map screen positions to world positions, because there are more transformations going on (projection and viewport). I hope that answers your questions.
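For a rigid-body view matrix, the inverse can also be written out explicitly rather than calling a generic matrix inverse. A minimal sketch, assuming GLM (column-major, translation in the last column):

#include <glm/glm.hpp>

// Invert a rigid-body view matrix V = R_xyz * T_xyz. For a rigid transform,
// the rotation inverts by transposition, which is cheaper and numerically
// more stable than a general 4x4 inverse.
glm::mat4 viewToWorld(const glm::mat4& V)
{
    glm::mat3 Rinv = glm::transpose(glm::mat3(V));  // inverse of the rotation part
    glm::vec3 t    = glm::vec3(V[3]);               // translation part
    glm::mat4 inv  = glm::mat4(Rinv);
    inv[3] = glm::vec4(-(Rinv * t), 1.0f);          // equals T_xyz^-1 * R_xyz^-1 overall
    return inv;                                     // same result as glm::inverse(V)
}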
Here is another interesting point. The view matrix is the inverse of the transformation that would place a camera model (at the origin, facing in the negative z direction) at the specified position. This relation is called system transformation vs. model transformation.
In OpenGL I'm trying to create a free-flight camera. My problem is the rotation about the Y axis. The camera should always be rotated about the world Y axis and not about its local axis. I have tried several matrix multiplications, but without success. With
camMatrix = camMatrix * yrotMatrix
the camera rotates about its local axis. And with
camMatrix = yrotMatrix * camMatrix
the camera rotates about the world axis, but always around the origin. However, the rotation center should be the camera. Does anyone have an idea?
One of the more tricky aspects of 3D programming is getting complex transformations right.
In OpenGL, every point is transformed with the model/view matrix and then with the projection matrix.
The model/view matrix takes each point and moves it to where it should be from the point of view of the camera. The projection matrix converts the point's coordinates so that the X and Y coordinates can be mapped to the window easily.
To get the model/view matrix right, you have to start with an identity matrix (one that doesn't change the vertices), then apply the transforms for the camera's position and orientation, then for the object's position and orientation in reverse order.
Another thing to keep in mind is that rotations are always about an axis through the origin (0,0,0). So when you apply a rotation transform for the camera, whether you are turning it (as you would turn your head) or orbiting it around the origin (as the Earth orbits the Sun) depends on whether you have previously applied a translation transform.
So if you want to both rotate and orbit the camera, you need to do the following (see the sketch after this list):
Apply the rotation(s) to orient the camera
Apply translation(s) to position it
Apply rotation(s) to orbit the camera around the origin
(Optionally) apply translation(s) so that the camera orbits around a point other than (0,0,0)
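A rough sketch of that sequence, assuming GLM (angles in radians); the parameter names are illustrative:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Build a view matrix following the steps above. Each glm call post-multiplies,
// which matches issuing successive glRotate/glTranslate calls on the modelview
// stack: the first transform listed is the last one applied to the vertices.
glm::mat4 orbitView(float orientYaw, float distance, float orbitAngle, const glm::vec3& center)
{
    glm::mat4 V(1.0f);                                  // start from identity
    V = glm::rotate(V, orientYaw, glm::vec3(0, 1, 0));  // 1. orient the camera
    V = glm::translate(V, glm::vec3(0, 0, -distance));  // 2. position it
    V = glm::rotate(V, orbitAngle, glm::vec3(0, 1, 0)); // 3. orbit around the origin
    V = glm::translate(V, -center);                     // 4. orbit a point other than (0,0,0)
    return V;
}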
Things can get more complex if you, say, want to point the camera at a point that is not (0,0,0) and also orbit that point at a set distance, while also being able to pitch or yaw the camera. See here for an example in WebGL. Look for GLViewerBase.prototype.display.
The Red Book covers transforms in much more detail.
Also note gluLookAt, which you can use to point the camera at something, without having to use rotations.
Rather than doing this using matrices, you might find it easier to create a camera class which stores a position and orthonormal n, u and v axes, and rotate them appropriately, e.g. see:
https://github.com/sgolodetz/hesperus2/blob/master/Shipwreck/MapEditor/GUI/Camera.java
and
https://github.com/sgolodetz/hesperus2/blob/master/Shipwreck/MapEditor/Math/MathUtil.java
Then you write things like:
if(m_keysDown[TURN_LEFT])
{
    m_camera.rotate(new Vector3d(0,0,1), deltaAngle);  // turn left: rotate about the (0,0,1) axis
}
When it comes time to set the view for the camera, you do:
gl.glLoadIdentity();
glu.gluLookAt(m_position.x, m_position.y, m_position.z,  // eye
              m_position.x + m_nVector.x, m_position.y + m_nVector.y, m_position.z + m_nVector.z,  // center = eye + look direction n
              m_vVector.x, m_vVector.y, m_vVector.z);  // up = v axis
If you're wondering how to rotate about an arbitrary axis like (0,0,1), see MathUtil.rotate_about_axis in the above code.
If you don't want the transform to depend on the camera from the previous frame, my suggestion would be to throw out the matrix compounding and recalculate the matrix every frame. I don't think there's a way to do what you want with a single compounded matrix, as it stores the translation and rotation together.
If you just want a pitch/yaw camera, store those two values as floats and rebuild the matrix from them each frame. Maybe something like this pseudocode:
onFrameUpdate() {
    newPos = camMatrix * (0,0,speed)  // move forward along the camera's local z axis
    yaw   += mouse_move_x             // horizontal mouse motion turns the camera
    pitch += mouse_move_y             // vertical mouse motion tilts it
    camMatrix = identity.translate(newPos)
    camMatrix = rotate(camMatrix, (0,1,0), yaw)    // yaw about the world y axis
    camMatrix = rotate(camMatrix, (1,0,0), pitch)  // then pitch about the local x axis
}
the camera rotates about the world axis, but always around the origin. However, the rotation center should be the camera.
I assume the matrix is stored in memory this way (the numbers represent element indices, treating the matrix as a linear 1D array):
0 1 2 3 //row 0
4 5 6 7 //row 1
8 9 10 11 //row 2
12 13 14 15 //row 3
Solution:
Store the last row of the camera matrix in a temporary variable.
Set the last row of the camera matrix to (0, 0, 0, 1).
Apply camMatrix = yrotMatrix * camMatrix.
Restore the last row of the camera matrix from the temporary variable.
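The same four steps in code, as a sketch assuming GLM; note that GLM is column-major, so the translation lives in the last column rather than the last row:

#include <glm/glm.hpp>

// Rotate the camera about a world axis while pivoting on the camera itself:
// temporarily strip the translation so the rotation happens around the camera.
void rotateAroundSelf(glm::mat4& camMatrix, const glm::mat4& yrotMatrix)
{
    glm::vec4 translation = camMatrix[3];  // 1. save the translation column
    camMatrix[3] = glm::vec4(0, 0, 0, 1);  // 2. zero it out
    camMatrix = yrotMatrix * camMatrix;    // 3. world-axis rotation
    camMatrix[3] = translation;            // 4. restore the translation
}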