I am new to OpenCV and am currently working through the concept of image transformations. My questions are:
1) Why does an affine transformation use a 2x3 matrix while a perspective transformation uses a 3x3 matrix?
2) When should I use an affine transformation rather than a perspective transformation, and vice versa?
1) This is a question about mathematics rather than about OpenCV. Applying an affine transformation to a point (x,y) means the following:
x_new = a*x + b*y + c;
y_new = d*x + e*y + f;
So an affine transform has 6 degrees of freedom: a, b, c, d, e, f. They are stored in a 2x3 matrix: a, b, c in the first row and d, e, f in the second row. You apply the transform to a point by multiplying the matrix by the vector (x, y, 1).
Perspective transform of (x,y) would be:
z = g*x + h*y + 1;
x_new = (a*x + b*y + c)/z;
y_new = (d*x + e*y + f)/z;
As you can see, it has 8 degrees of freedom, which are stored in a 3x3 matrix whose third row is g, h, 1.
See also homogeneous coordinates for more information about why this representation is so convenient.
2) Affine transformation is also called 'weak perspective' transformation: if you are looking at a scene from a different perspective but the size of the scene is small relative to its distance from the camera (i.e. parallel lines remain more or less parallel), then you may use an affine transform. Otherwise a perspective transform is required.
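The two sets of formulas from point 1 can be sketched in plain Python as follows. This is only an illustration: the function names and the sample matrices are made up for the example and are not OpenCV API. It also shows that an affine transform is the special case of a perspective transform whose third row is 0, 0, 1.

```python
def apply_affine(m, x, y):
    """Apply a 2x3 affine matrix [[a, b, c], [d, e, f]] to the point (x, y)."""
    (a, b, c), (d, e, f) = m
    return (a * x + b * y + c, d * x + e * y + f)


def apply_perspective(m, x, y):
    """Apply a 3x3 perspective matrix to (x, y), including the division by z."""
    (a, b, c), (d, e, f), (g, h, i) = m
    z = g * x + h * y + i
    return ((a * x + b * y + c) / z, (d * x + e * y + f) / z)


# Hypothetical example: scale by 2 and shift by (10, 20).
affine = [[2.0, 0.0, 10.0],
          [0.0, 2.0, 20.0]]

# The same transform as a 3x3 matrix: the third row (0, 0, 1) makes z equal to 1.
affine_as_3x3 = affine + [[0.0, 0.0, 1.0]]

# A perspective transform: the nonzero g makes z depend on x.
perspective = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.5, 0.0, 1.0]]

print(apply_affine(affine, 3.0, 4.0))
print(apply_perspective(affine_as_3x3, 3.0, 4.0))
print(apply_perspective(perspective, 2.0, 4.0))
```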

It is better to consider the whole family of transformations; then you really remember what is what. Let's go from the simplest to the most complex:
1. Euclidean - this is a rigid rotation in plane plus translation. Basically all you can do with a piece of paper lying on the table.
2. Similarity - more general transformation where you can rotate, translate and also scale (hence it is non-rigid);
3. Affine - adds another operation - shear - which would make a parallelogram from a rectangle. This kind of shear happens during orthographic projection or when objects are viewed from a long distance (compared to their size); parallel lines are still preserved.
4. Homography or perspective transformation - the most general transformation; it will make a trapezoid out of a rectangle (that is, a different amount of shear is applied to each side). This happens when projecting planar objects from a close distance. Remember how train tracks converge to a point at infinity? Hence the name perspective. It also means that, unlike the other transformations, we have to apply a division at some point. This is what the third row does: when we convert from homogeneous to Cartesian coordinates, we divide by the value it produces in the third component.
This transformation is the only one that cannot be optimally computed using linear algebra and requires non-linear optimization (because of the division). In camera projections a homography occurs in three cases:
1. between flat surface and its image;
2. between arbitrary images of 3D scene when camera rotates but not translates;
3. during zoom operation.
In other words whenever a flat camera sensor crosses the same optical rays you have a homography.
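To make the hierarchy concrete, here is a small sketch that names the most specific family a 3x3 homogeneous 2D matrix belongs to. The function name, the tolerance, and the test matrices are assumptions for illustration. It also assumes a non-zero bottom-right element and treats reflections as plain affine transforms.

```python
import math


def classify(m, tol=1e-9):
    """Name the most specific family a 3x3 homogeneous 2D transform belongs to."""
    (a, b, _), (c, d, _), (g, h, w) = m
    # Any nonzero g or h means a point-dependent division: a true homography.
    if abs(g) > tol or abs(h) > tol:
        return "homography"
    # Normalize so that the bottom-right element is 1.
    a, b, c, d = a / w, b / w, c / w, d / w
    # A scaled rotation has the form [[s*cos, -s*sin], [s*sin, s*cos]].
    if abs(a - d) > tol or abs(b + c) > tol:
        return "affine"
    scale = math.hypot(a, c)
    return "euclidean" if abs(scale - 1.0) <= tol else "similarity"


rotation_90 = [[0.0, -1.0, 5.0], [1.0, 0.0, 2.0], [0.0, 0.0, 1.0]]
rotation_scaled = [[0.0, -2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
shear = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
projective = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]]

for name, matrix in [("rotation_90", rotation_90),
                     ("rotation_scaled", rotation_scaled),
                     ("shear", shear),
                     ("projective", projective)]:
    print(name, "->", classify(matrix))
```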


Mapping between different camera views

I have a calibrated (virtual) camera in Blender that views a roughly planar object. I make an image from a first camera pose P0 and move the camera to a new pose P1. So I have the 4x4 camera matrix for both views from which I can calculate the transformation between the cameras as given below. I also know the intrinsics matrix K. Using those, I want to map the points from the image for P0 to a new image seen from P1 (of course, I have the ground truth to compare because I can render in Blender after the camera has moved to P1). If I only rotate the camera between P0 and P1, I can calculate the homography perfectly. But if there is translation, the calculated homography matrix does not take that into account. The theory says, after calculating M10, the last row and column should be dropped for a planar scene. However, when I check M10, I see that the translation values are in the rightmost column, which I drop to get the 3x3 homography matrix H10. Then, if there is no rotation, H10 is equal to the identity matrix. What is going wrong here?
Edit: I know that the images are related by a homography because given the two images from P0 and P1, I can find a homography (by feature matching) that perfectly maps the image from P0 to the image from P1, even in presence of a translational camera movement.
The theory became clearer to me after reading two other books: "Multiple View Geometry" by Hartley and Zisserman (Example 13.2) and particularly "An Invitation to 3-D Vision: From Images to Geometric Models" (Section 5.3.1, Planar homography). Below is an outline; please check the above-mentioned sources for a thorough explanation.
Consider two images of points p on a 2D plane P in 3D space. The transformation between the two camera frames can be written as:
X2 = R*X1 + T (1)
where X1 and X2 are the coordinates of the world point p in camera frames 1 and 2, respectively, R is the rotation and T the translation between the two camera frames. Denoting the unit normal vector of the plane P, expressed in the first camera frame, as N and the distance from the plane P to the first camera as d, we can use the plane equation to write N.T*X1 = d (.T means transpose), or equivalently:
(1/d)*N.T*X1 = 1 (2)
for all X1 on the plane P. Substituting (2) into (1) gives:
X2 = R*X1 + T*(1/d)*N.T*X1 = (R + (1/d)*T*N.T)*X1
Therefore, the 3x3 planar homography matrix can be extracted as H = R + (1/d)*T*N.T, that is, X2 = H*X1. This is a linear transformation from X1 to X2.
The distance d can be computed as the dot product between the plane normal and a point on the plane. Then, the camera intrinsics matrix K should be used to calculate the projective homography G = K * (R + (1/d)*T*N.T) * inv(K). If you are using software like Blender or Unity, you can set the camera intrinsics yourself and thus obtain K. For Blender, a nice code snippet is given in this excellent answer.
OpenCV has a nice code example in this tutorial; see "Demo 3: Homography from the camera displacement".
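As a sanity check of the outline, here is a minimal pure-Python sketch. It assumes a zero-skew pinhole K and uses made-up values for K, T, and the plane. The helper names are hypothetical. The camera only translates, yet H is not the identity, and mapping a pixel through G agrees with projecting the moved 3D point directly.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def planar_homography(R, T, N, d):
    """H = R + (1/d) * T * N.T for a plane satisfying N.T * X1 = d."""
    return [[R[i][j] + T[i] * N[j] / d for j in range(3)] for i in range(3)]


def invert_intrinsics(K):
    """Invert K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (zero skew assumed)."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [[1.0 / fx, 0.0, -cx / fx],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def to_pixel(h):
    """Convert a homogeneous 3-vector to (u, v) pixel coordinates."""
    return (h[0] / h[2], h[1] / h[2])


# Hypothetical setup: pure sideways translation, no rotation.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.2, 0.0, 0.0]
N, d = [0.0, 0.0, 1.0], 4.0          # the plane z = 4 in camera frame 1

H = planar_homography(R, T, N, d)
G = matmul(matmul(K, H), invert_intrinsics(K))

X1 = [0.5, -0.25, 4.0]               # a point on the plane, camera frame 1
X2 = [a + b for a, b in zip(matvec(R, X1), T)]   # X2 = R*X1 + T

p1 = to_pixel(matvec(K, X1))
p2_direct = to_pixel(matvec(K, X2))
p2_via_G = to_pixel(matvec(G, [p1[0], p1[1], 1.0]))
print(p2_direct, p2_via_G)
```

Dropping the translation column of the 4x4 matrix would give H = R here, i.e. the identity, which is exactly the symptom described in the question.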

Compute Homography Matrix based on intrinsic and extrinsic camera parameters

I want to perform 360° panorama stitching for 6 fish-eye cameras.
In order to find the relation among cameras I need to compute the Homography Matrix. The latter is usually computed by finding features in the images and matching them.
However, for my camera setup I already know:
The intrinsic camera matrix K, which I computed through camera calibration.
Extrinsic camera parameters R and t. The camera orientation is fixed and does not change at any point. The cameras are located on a circle of known diameter d, with each camera offset by 60° from its neighbors along the circle.
Therefore, I think I could manually compute the Homography Matrix, which I am assuming would result in a more accurate approach than performing feature matching.
In the literature I found the following formula to compute the homography Matrix which relates image 2 to image 1:
H_2_1 = (K_2) * (R_2)^-1 * R_1 * K_1
This formula only takes into account a rotation angle among the cameras but not the translation vector that exists in my case.
How could I plug the translation t of each camera in the computation of H?
I have already tried to compute H without considering the translation, but as d>1 meter, the images are not accurately aligned in the panorama picture.
Based on Francesco's answer below, I got the following questions:
After calibrating the fisheye lenses, I got a matrix K with focal length f=620 for an image of size 1024 x 768. Is that considered to be a big or small focal length?
My cameras are located on a circle with a diameter of 1 meter. The explanation below makes it clear to me that, due to this "big" translation between the cameras, I get noticeable ghosting on objects that are relatively close to them. Therefore, if the homography model cannot fully represent the position of the cameras, is it possible to use another model, like the Fundamental/Essential Matrix, for image stitching?
You cannot "plug" the translation in: its presence along with a nontrivial rotation mathematically implies that the relationship between images is not a homography.
However, if the imaged scene is and appears "far enough" from the camera, i.e. if the translations between cameras are small compared to the distances of the scene objects from the cameras, and the cameras' focal lengths are small enough, then you may use the homography induced by a pure rotation as an approximation.
Your equation is wrong. The correct formula is obtained as follows:
Take a pixel in camera 1: p_1 = (x, y, 1) in homogeneous coordinates
Back project it into a ray in 3D space: P_1 = inv(K_1) * p_1
Decompose the ray in the coordinates of camera 2: P_2 = R_2_1 * P_1
Project the ray into a pixel in camera 2: p_2 = K_2 * P_2
Put the equations together: p_2 = [K_2 * R_2_1 * inv(K_1)] * p_1
The product H = K_2 * R_2_1 * inv(K_1) is the homography induced by the pure rotation R_2_1. The rotation transforms points into frame 2 from frame 1. It is represented by a 3x3 matrix whose columns are the components of the x, y, z axes of frame 1 decomposed in frame 2. If your setup gives you the rotations of all the cameras with respect to a common frame 0, i.e. as R_i_0, then R_2_1 = R_2_0 * R_1_0.transposed.
Generally speaking, you should use the above homography as an initial estimation, to be refined by matching points and optimizing. This is because (a) the homography model itself is only an approximation (since it ignores the translation), and (b) the rotations given by the mechanical setup (even a calibrated one) are affected by errors. Using matched pixels to optimize the transformation will minimize the errors where it matters, on the image, rather than in an abstract rotation space.
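The derivation above can be checked numerically with a short sketch. It assumes an ideal pinhole model (fisheye distortion is ignored), a principal point at the image center, and two hypothetical camera orientations 60° apart about the Y axis. All helper names are made up for the example. A ray direction D in the common frame 0 is projected into both cameras, and the pixel in camera 1 is mapped through H.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def rot_y(degrees):
    """Rotation matrix about the Y axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def invert_intrinsics(K):
    """Invert K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (zero skew assumed)."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [[1.0 / fx, 0.0, -cx / fx],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def rotation_homography(K_1, K_2, R_1_0, R_2_0):
    """H = K_2 * R_2_1 * inv(K_1), with R_2_1 = R_2_0 * R_1_0.transposed."""
    R_2_1 = matmul(R_2_0, transpose(R_1_0))
    return matmul(matmul(K_2, R_2_1), invert_intrinsics(K_1))


def to_pixel(h):
    return (h[0] / h[2], h[1] / h[2])


# f = 620 and a 1024 x 768 image as in the question; the centered principal
# point and the two orientations are assumptions.
K = [[620.0, 0.0, 512.0], [0.0, 620.0, 384.0], [0.0, 0.0, 1.0]]
R_1_0 = rot_y(-20.0)
R_2_0 = rot_y(40.0)
H = rotation_homography(K, K, R_1_0, R_2_0)

D = [0.2, 0.1, 1.0]                      # ray direction in frame 0
p1 = to_pixel(matvec(K, matvec(R_1_0, D)))
p2_direct = to_pixel(matvec(K, matvec(R_2_0, D)))
p2_via_H = to_pixel(matvec(H, [p1[0], p1[1], 1.0]))
print(p2_direct, p2_via_H)
```

The two results agree only because the cameras share a center in this sketch; with a real baseline between them, the mismatch grows as objects get closer.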

How to flip only one axis of transformation matrix?

I have a 4x4 transformation matrix. However, after trying out the transformation I noticed that movement and rotation of the Y axis is going the opposite way. The rest is correct.
I got this matrix from some other API so probably it is the difference of coordinate system. So, how can I flip an axis of transformation matrix?
If only translation I can add minus sign on the Y translation, but I have no idea about opposite rotation of only one axis since all the rotation is being represented in the same 3x3 area. I thought there might be some way that even affect both translation and rotation at the same time. (truly flipping the axis)
Edit: I'm pretty sure the operation you're looking for is changing coordinate systems while maintaining Z-up or Y-up. In this case, try negating all the elements of the second column (or row) of your matrix.
This question would be better for the Math StackExchange. First, a really helpful read on rotation matrices.
The first problem is the matter of rotation order. I will be assuming the XYZ rotation order. We know the rotation matrices for each axis are as follows:
Given a matrix derived from the same rotation order, the resulting matrix would be as follows, where alpha is the X angle, beta is the Y angle, and gamma is the Z angle:
You can derive the individual components of each axis angle from this matrix. For example, you can derive the Y angle from -sin(beta) using some inverse trig. Given beta, you can derive alpha from cos(beta)sin(alpha). You can also derive gamma from cos(beta)sin(gamma). Note that the same number in the matrix can represent multiple values (e.g. sin(0)=0 and sin(180)=0).
Now that you know alpha, beta, and gamma, you can reverse beta and remake the rotation matrix.
There's a good chance that there's a better way to do this using quaternions, but you should ask the Math StackExchange these kinds of language-agnostic questions.
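A minimal sketch of this extract, reverse, and rebuild approach is shown here. It assumes the composition R = Rz(gamma) * Ry(beta) * Rx(alpha), which is consistent with the -sin(beta), cos(beta)sin(alpha), and cos(beta)sin(gamma) entries mentioned above. It also assumes beta stays strictly between -90 and 90 degrees. The function names are hypothetical.

```python
import math


def euler_to_matrix(alpha, beta, gamma):
    """Build R = Rz(gamma) * Ry(beta) * Rx(alpha); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [[cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
            [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
            [-sb, cb * sa, cb * ca]]


def matrix_to_euler(R):
    """Recover (alpha, beta, gamma), assuming cos(beta) > 0."""
    # R[2][0] is -sin(beta); clamp to guard against rounding noise.
    beta = -math.asin(max(-1.0, min(1.0, R[2][0])))
    # atan2 resolves the ambiguity that a lone sine or cosine would leave.
    alpha = math.atan2(R[2][1], R[2][2])
    gamma = math.atan2(R[1][0], R[0][0])
    return alpha, beta, gamma


def reverse_y_rotation(R):
    """Rebuild the matrix with the Y angle negated."""
    alpha, beta, gamma = matrix_to_euler(R)
    return euler_to_matrix(alpha, -beta, gamma)


R = euler_to_matrix(0.3, -0.5, 1.2)
print(matrix_to_euler(R))
print(matrix_to_euler(reverse_y_rotation(R)))
```

Using atan2 on two matrix entries instead of a single inverse sine avoids the ambiguity noted above for alpha and gamma.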
Much shorter answer: if you are not careful with your frame orientation, many things down your pipeline are likely to have a bad hair day. The reason is "parity", a.k.a. "frame orientation", a.k.a. "right-handedness" (or rarely left-handedness). Most 3D geometry tools and libraries that work together normally assume implicitly that all coordinate systems in play are right-handed (or at least consistently-handed). Inverting the orientation of just one axis in a coordinate system changes its orientation from right-handed to left-handed or vice versa.
So, suggestion for things to check & try in your problem:
Check that the frame you get from your API is right-handed. You do so by computing the determinant of the 3x3 rotation part of your 4x4 transform matrix: it must be +1 or very close to it.
If it is -1, then flip one of its axes, i.e. change the sign of one of the columns of the 3x3 rotation.
Note carefully: I said "columns" because I assume that you apply a transform Q to a point x by multiplying as Q * x, x being a 4x1 column vector with the last component equal to one. If you instead use row vectors and multiply as x * Q, you need to flip a row.
If that determinant is +1, you have a bug someplace else.
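The check and the fix can be sketched like this. The sketch assumes the column-vector convention Q * x and a 4x4 matrix stored as nested lists. The function names and the sample matrix are made up for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def rotation_part(Q):
    """Upper-left 3x3 block of a 4x4 transform."""
    return [row[:3] for row in Q[:3]]


def is_right_handed(Q, tol=1e-6):
    """True if the rotation part has determinant +1 (within tol)."""
    return abs(det3(rotation_part(Q)) - 1.0) <= tol


def flip_axis(Q, axis):
    """Copy of Q with the sign of one column of the 3x3 rotation part changed."""
    out = [row[:] for row in Q]
    for i in range(3):
        out[i][axis] = -out[i][axis]
    return out


# Hypothetical transform whose Y axis is mirrored (determinant -1).
Q = [[1.0, 0.0, 0.0, 2.0],
     [0.0, -1.0, 0.0, 3.0],
     [0.0, 0.0, 1.0, 4.0],
     [0.0, 0.0, 0.0, 1.0]]

print(is_right_handed(Q))
fixed = flip_axis(Q, 1)
print(is_right_handed(fixed))
```

Note that flip_axis leaves the translation column untouched. Whether the Y translation also needs its sign changed depends on how the source API defines its frame.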

calculate rotation matrix from 4 points

I would like to stick a moving car to a curvy terrain. I can calculate the y coordinate (which is height in my case) for each wheel. These 4 points form a plane. I don't know how to calculate the rotation matrix from these 4 points so I can apply it to the car. So this is what I would like to achieve:
BTW I am using c++ and openGL.
Could anybody help me out here?
If you guarantee that all 4 points lie on one plane, then the problem is not that hard to solve: Let's call the points (A,B,C,D) and define an up vector (UP = [0,1,0])
1) Calculate the plane normal (N)
N = normalize(cross(B-A, C-A));
2) Calculate the rotation axis (R)
R = normalize(cross(N,UP))
3) Calculate rotation angle (alpha)
alpha = acos(dot(N, UP))
The resulting matrix is then the one that rotates around R by an angle of alpha. If your matrix library does not support creating a rotation around an arbitrary axis, you can find the form here.
Note that there is a singularity when alpha is very small (cross(N,UP) will then vanish, so R cannot be normalized), so you should only calculate the matrix if alpha is sufficiently large. It might also be the case that some of the vectors point in the opposite direction, depending on the winding order in which the points are defined. In this case just switch the two parameters of the cross function.
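Here is a rough sketch of the three steps, written in Python for brevity rather than C++. It uses Rodrigues' rotation formula for the axis-angle matrix. The axis is taken as cross(UP, N) so that the result tilts UP onto N, which is the parameter order that suits orienting the car. The function names, the epsilon, and the sample points are assumptions, and A, B, C are assumed not to be collinear.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]


def rotation_about_axis(axis, angle):
    """Rodrigues' formula: rotate by `angle` radians about a unit `axis`."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [[t * x * x + c, t * x * y - s * z, t * x * z + s * y],
            [t * x * y + s * z, t * y * y + c, t * y * z - s * x],
            [t * x * z - s * y, t * y * z + s * x, t * z * z + c]]


def terrain_rotation(A, B, C, up=(0.0, 1.0, 0.0), eps=1e-6):
    """Rotation that tilts `up` onto the normal of the plane through A, B, C."""
    n = normalize(cross(sub(B, A), sub(C, A)))
    if dot(n, up) < 0.0:
        n = [-c for c in n]          # winding order produced a downward normal
    axis = cross(up, n)
    if math.sqrt(dot(axis, axis)) < eps:
        # Singularity: the plane is (almost) level, so no tilt is needed.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    alpha = math.acos(max(-1.0, min(1.0, dot(n, up))))
    return rotation_about_axis(normalize(axis), alpha)


def rotate(R, v):
    return [dot(row, v) for row in R]


# Hypothetical wheel contact points on a slope tilted by 45 degrees.
R = terrain_rotation([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0])
print(rotate(R, [0.0, 1.0, 0.0]))
```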

Translating a Quaternion

(perhaps this is better for a math Stack Exchange?)
I have a chain composed of bones. Each bone has a tip and a tail. The following code computes where its tip will be, given a rotation, and sets the next link in the chain's position appropriately:
// Quaternion is a hand-rolled class that works correctly (as far as I can tell.)
Quaternion quat = new Quaternion(getRotationAngleDegrees(), getRotation());
// figure out where the tip will be after applying the rotation
Vector3f rotatedTip = quat.applyRotationTo(tip);
// set the next bone's tail to be at this one's tip
This works if the rotation is supposed to occur around the origin of the object's coordinate system. But what if I want the rotation to occur around some other arbitrary point in the object? I'm not sure how to translate the quaternion. What is the best way to do it?
(I'm using JOGL / OpenGL.)
Dual quaternions are useful for expressing rigid spatial transformations (combined rotations and translations.)
Based on dual numbers (one of the Clifford algebras, d = a + e b where a, b are real and e is unequal to zero but e^2 = 0), dual quaternions, U + e V, can represent lines in space with U the unit direction quaternion and V the moment about a reference point. In this way, dual quaternion lines are very much like Pluecker lines.
While the quaternion transform Q V Q* (Q* is the quaternion conjugate of Q) is used to rotate a unit vector quaternion V about a point, a similar dual quaternion form can be used to apply a screw transform to a line (the rigid rotation about an axis combined with a translation along the axis.)
Just as any rigid 2D transform can be resolved to a rotation about a point, any rigid 3D transform can be resolved to a screw.
For such power and expressiveness, dual quaternion references are thin, and the Wikipedia article is as good a place as any to start.
A quaternion is used specifically to handle a rotation factor, but does not include a translation at all.
Typically, in this situation, you'll want to apply a rotation to a point based on the "bone's" length, but centered at the origin. You can then translate post-rotation to the proper location in space.
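The translate, rotate, translate back idea can be sketched as follows, in Python rather than Java for brevity. The quaternion helpers are a stand-in for the hand-rolled Quaternion class in the question, not its actual API. Quaternions are stored as (w, x, y, z) tuples and the axis is assumed to be a unit vector.

```python
import math


def quat_from_axis_angle(axis, angle_degrees):
    """Unit quaternion (w, x, y, z) for a rotation about a unit axis."""
    half = math.radians(angle_degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotate(q, v):
    """Rotate vector v about the origin: q * v * conjugate(q)."""
    w, x, y, z = q
    conjugate = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), conjugate)
    return (rx, ry, rz)


def rotate_about_point(q, point, pivot):
    """Move the pivot to the origin, rotate, then move back."""
    shifted = tuple(p - c for p, c in zip(point, pivot))
    rotated = rotate(q, shifted)
    return tuple(r + c for r, c in zip(rotated, pivot))


# Rotate the tip (2, 1, 0) by 90 degrees about the Z axis through (1, 1, 0).
q = quat_from_axis_angle((0.0, 0.0, 1.0), 90.0)
new_tip = rotate_about_point(q, (2.0, 1.0, 0.0), (1.0, 1.0, 0.0))
print(new_tip)
```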
Quaternions are generally used to represent rotations only; they cannot represent translations as well.
You need to convert your quaternion into a rotation matrix, insert it into the appropriate part of your standard OpenGL 4x4 matrix, and combine it with a translation in order to rotate about an arbitrary point.
4x4 rotation matrix:
[ r r r 0 ]
[ r r r 0 ] <- the r's are the 3x3 rotation matrix from the wiki article
[ r r r 0 ]
[ 0 0 0 1 ]
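One way to fill in that matrix for a rotation about an arbitrary pivot is to compose T(pivot) * R * T(-pivot). The sketch below is in Python and assumes a unit quaternion stored as (w, x, y, z) and column vectors. The function names are hypothetical. The matrix is written in row-major mathematical notation, whereas OpenGL's fixed-function calls expect column-major storage, so it would need transposing before upload.

```python
import math


def quat_to_matrix(q):
    """3x3 rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]


def rotation_about_point(q, pivot):
    """4x4 matrix equal to T(pivot) * R * T(-pivot), for column vectors."""
    R = quat_to_matrix(q)
    # The composed translation column is pivot - R * pivot.
    t = [pivot[i] - sum(R[i][k] * pivot[k] for k in range(3)) for i in range(3)]
    return [R[0] + [t[0]],
            R[1] + [t[1]],
            R[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def transform(M, p):
    """Apply a 4x4 matrix to the 3D point p (homogeneous coordinate 1)."""
    v = [p[0], p[1], p[2], 1.0]
    return tuple(sum(M[i][k] * v[k] for k in range(4)) for i in range(3))


# 90 degrees about the Z axis, pivoting around (1, 1, 0).
half = math.radians(90.0) / 2.0
q = (math.cos(half), 0.0, 0.0, math.sin(half))
M = rotation_about_point(q, (1.0, 1.0, 0.0))
print(transform(M, (2.0, 1.0, 0.0)))
```

The pivot itself is left unchanged by M, which is a quick way to check that the translation column was built correctly.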
The Wikipedia page on forward kinematics points to this paper: Introduction to Homogeneous Transformations & Robot Kinematics.
Edit: This answer is wrong. It argues from the properties of 4x4 transformation matrices, which are not quaternions...
I might have got it wrong but to me (unlike some answers) a quaternion is indeed a tool to handle rotations and translations (and more). It is a 4x4 matrix where the last column represents the translation. Using matrix algebra, replace the 3-vector (x, y, z) by the 4-vector (x, y, z, 1) and compute the transformed vector by the matrix. You will find that values of the last column of the matrix will be added to the coordinates x, y, z of the original vector, as in a translation.
A 3x3 matrix for a 3D space represents a linear transformation (like rotation around the origin). You cannot use a 3x3 matrix for an affine transformation like a translation. So I understand simply the quaternions as a little "trick" to represent more kinds of transformations using matrix algebra. The trick is to add a fourth coordinate equal to 1 and to use 4x4 matrices. Because matrix algebra remains valid, you can combine space transformations by multiplying the matrices, which is indeed powerful.