Negative focal length in Camera Calibration Matrix - computer-vision

While trying to work with the ICL_NUIM dataset I have run into some issues. The camera calibration provided on the site has the following values:
481.20, 0, 319.50
0, -480.00, 239.50
0, 0, 1
where:
fx = 481.20
fy = -480.00
cu = 319.50
cv = 239.50
I am struggling to understand intuitively how fy can have a negative value.

The mathematical effect is simply that the image is vertically flipped.
This is equivalent to the image as it forms in the classical pinhole model, on the back plane of the camera box. In that case both fx and fy would be negative, but you get the idea.
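As a quick sanity check, here is a minimal numpy sketch (not from the dataset's tooling) that projects the same camera-frame point with fy = +480 and fy = -480; the resulting v coordinates are mirrored about the principal point cv = 239.5, i.e. the image is flipped vertically:
import numpy as np
def project(K, X):
    """Pinhole projection of a camera-frame point X with intrinsics K."""
    u, v, w = K @ X
    return u / w, v / w
X = np.array([0.2, 0.3, 2.0])                # an arbitrary point in front of the camera
K_pos = np.array([[481.20,   0.00, 319.50],
                  [  0.00, 480.00, 239.50],
                  [  0.00,   0.00,   1.00]])
K_neg = K_pos.copy()
K_neg[1, 1] = -480.00                        # the ICL_NUIM convention
print(project(K_pos, X))                     # (367.62, 311.5)
print(project(K_neg, X))                     # (367.62, 167.5) -- mirrored about cv = 239.5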

Related

kitti dataset camera projection matrix

I am looking at the KITTI dataset, and particularly at how to convert a world point into image coordinates. I looked at the README (quoted below) and it says that I need to transform to camera coordinates first and then multiply by the projection matrix. Coming from a non-computer-vision background, I have two questions.
1. I looked at the numbers from calib.txt, and in particular the matrix is 3x4 with non-zero values in the last column. I always thought this matrix = K[I|0], where K is the camera's intrinsic matrix. So why is the last column non-zero, and what does it mean? E.g. P2 is
array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
[0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
[0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
2. After projecting into [u, v, w] and dividing u and v by w, are these values with respect to an origin at the center of the image, or at the top left of the image?
README:
calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 projection matrices after rectification. Here P0 denotes the left and P1 denotes the right camera. Tr transforms a point from velodyne coordinates into the left rectified camera coordinate system. In order to map a point X from the velodyne scanner to a point x in the i'th image plane, you thus have to transform it like:
x = Pi * Tr * X
Refs:
How to understand the KITTI camera calibration files?
Format of parameters in KITTI's calibration file
http://www.cvlibs.net/publications/Geiger2013IJRR.pdf
Answer:
I strongly recommend you read those references above. They may solve most, if not all, of your questions.
For question 2: the projected points in the image are with respect to an origin at the top left. See refs 2 & 3: the image coordinates of a very distant 3D point are (center_x, center_y), whose values are provided in the P_rect matrices. Or you can verify this with some simple code:
import numpy as np
p = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
[0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
[0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
x = [0, 0, 1E8, 1] # A far 3D point
y = np.dot(p, x)
y[0] /= y[2]
y[1] /= y[2]
y = y[:2]
print(y)
You will see output like:
[601.8873 183.1104]
which is quite near (p[0, 2], p[1, 2]), a.k.a. (center_x, center_y).
All the P matrices (3x4) have the form:
P(i)rect = [[fu, 0, cx, -fu*bx],
[0, fv, cy, -fv*by],
[0, 0, 1, 0]]
The last column encodes the baselines (in meters) w.r.t. the reference camera 0. You can see that P0 has all zeros in the last column because it is the reference camera.
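For illustration, plugging in the P2 matrix quoted in the question and following the P(i)rect form above, the x-baseline relative to camera 0 can be read off from the last column as bx = -P[0, 3] / fu (a small numpy sketch):
import numpy as np
p2 = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
               [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
bx = -p2[0, 3] / p2[0, 0]   # -fu*bx is stored in P[0, 3]
print(bx)                   # about -0.066 m w.r.t. camera 0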
This post has more details:
How Kitti calibration matrix was calculated?

how to calculate field of view of the camera from camera intrinsic matrix?

I got the camera intrinsic matrix and distortion parameters from camera calibration.
The unit of the focal length is pixels, I guess.
How can I then calculate the field of view (along y)?
Is this formula right?
double fov_y = 2*atan(height/2/fy)*180/CV_PI;
I'll use it as a parameter of gluPerspective().
OpenCV has a function that does this. Looking at the implementation (available on GitHub): given an image with dimensions w x h and a camera matrix
K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
the equations for the field of view are:
fov_x = 2 * atan(w / (2 * fx))
fov_y = 2 * atan(h / (2 * fy))
In continuation of @mallwright's answer, here is a bit of Python/numpy code to compute the field of view from the image resolution and focal lengths (in pixels):
import numpy as np
# Prepare
w, h = 1280, 720
fx, fy = 1027.3, 1026.9
# Go
fov_x = np.rad2deg(2 * np.arctan2(w, 2 * fx))
fov_y = np.rad2deg(2 * np.arctan2(h, 2 * fy))
print("Field of View (degrees):")
print(f" {fov_x = :.1f}\N{DEGREE SIGN}")
print(f" {fov_y = :.1f}\N{DEGREE SIGN}")
Output:
Field of View (degrees):
fov_x = 63.8°
fov_y = 38.6°
Note that this assumes that the principal point is at the center of the image and that there is no distortion; see this answer.

Get camera matrix from OpenGL

I render a 3D mesh model using OpenGL with perspective camera – gluPerspective(fov, aspect, near, far).
Then I use the rendered image in a computer vision algorithm.
At some point that algorithm requires the camera matrix K (along with several vertices on the model and their corresponding projections) in order to estimate the camera position: rotation matrix R and translation vector t. I can estimate R and t using any algorithm which solves the Perspective-n-Point problem.
I construct K from the OpenGL projection matrix (see how here)
K = [fX, 0, pX | 0, fY, pY | 0, 0, 1]
If I want to project a model point 'by hand' I can compute:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] / X_proj[3]
y_pixel = X_proj[2] / X_proj[3]
Anyway, I pass this camera matrix to a PnP algorithm and it works just fine.
But then I had to change the perspective projection to an orthographic one.
As far as I understand, when using orthographic projection the camera matrix becomes:
K = [1, 0, 0 | 0, 1, 0 | 0, 0, 0]
So I changed gluPerspective to glOrtho. Constructing K from the OpenGL projection matrix in the same way, it turned out that fX and fY are not ones but 0.0037371. Is this a scaled orthographic projection or something else?
Moreover, in order to project model vertices 'by hand' I managed to do the following:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] + width / 2
y_pixel = X_proj[2] + height / 2
This is not what I expected (adding width and height divided by 2 seems strange...). I tried to pass this camera matrix to the POSIT algorithm to estimate R and t, and it doesn't converge. :(
So here are my questions:
How do I get the orthographic camera matrix from OpenGL?
If the way I did it is correct, is it a true orthographic projection? Why doesn't POSIT work?
Orthographic projection does not use the depth to scale down farther points. However, it does scale the points to fit inside the NDC, which means it scales the values to fit inside the range [-1,1].
The standard orthographic projection matrix (the one glOrtho builds, also shown on Wikipedia) makes this clear:
[[2/(r-l), 0, 0, -(r+l)/(r-l)],
[0, 2/(t-b), 0, -(t+b)/(t-b)],
[0, 0, -2/(f-n), -(f+n)/(f-n)],
[0, 0, 0, 1]]
So it is correct to have numbers other than 1; they are the scale factors of the view volume.
As for your way of computing by hand, I believe the problem is that it does not scale back to screen coordinates, and that is what makes it wrong. As I said, the output of the projection matrices will be in the range [-1,1], so if you want the pixel coordinates you should do something like this:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1]*width/2 + width / 2
y_pixel = X_proj[2]*height/2 + height / 2
Anyway, I think you'd be better off using modern OpenGL with libraries like GLM. In that case you have the exact projection matrices used at hand.
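For illustration, here is a small numpy sketch of that NDC-to-pixel mapping. The view-volume bounds and viewport size below are hypothetical (not your actual values), chosen so that the 2/(r-l) scale is close to the 0.0037371 you observed:
import numpy as np
# Hypothetical orthographic volume and viewport.
left, right, bottom, top = -267.6, 267.6, -200.7, 200.7
width, height = 640, 480
# glOrtho-style scale part of K (the rotation/translation is handled separately by R, t).
K_ortho = np.array([[2.0 / (right - left), 0.0,                   0.0],
                    [0.0,                  2.0 / (top - bottom),  0.0],
                    [0.0,                  0.0,                   0.0]])
X_cam = np.array([100.0, 50.0, -5.0])        # a point already in camera coordinates
ndc = K_ortho @ X_cam                        # lies in [-1, 1] if the point is inside the volume
x_pixel = ndc[0] * width / 2 + width / 2     # viewport transform from the answer above
y_pixel = ndc[1] * height / 2 + height / 2
print(x_pixel, y_pixel)                      # about (439.6, 299.8)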

How to find an Equivalent point in a Scaled down image?

I would like to calculate the corner points or contours of the star shown here, in a larger image. For that I'm scaling the image down to a smaller size, and in the small image I'm able to get these points clearly. Now, how do I map these points back to the original image? I'm using OpenCV with C++.
Consider a trivial example: the image size is reduced exactly by half.
So the Cartesian coordinate (x, y) in the original image becomes coordinate (x/2, y/2) in the reduced image, and coordinate (x', y') in the reduced image corresponds to coordinate (x'*2, y'*2) in the original image.
Of course, fractional coordinates typically get rounded off in a reduced-scale image, so the exact mapping is only possible for even-numbered coordinates in this example's original image.
Generalizing this, if the image is scaled by a factor of w horizontally and h vertically, coordinate (x, y) becomes coordinate (x*w, y*h), rounded off. In the example I gave, both w and h are 1/2, i.e. 0.5.
You should be able to figure out the values of w and h yourself and map the coordinates trivially, as in the sketch below. Of course, due to rounding off, you will not be able to compute the exact coordinates in the original image.
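For example, here is a small Python sketch of that mapping (the image sizes are hypothetical; the same arithmetic applies in OpenCV C++):
# Map a point detected in the downscaled image back to the original image.
orig_w, orig_h = 3264, 2448      # hypothetical original size
small_w, small_h = 816, 612      # hypothetical downscaled size
sx = orig_w / small_w            # horizontal scale factor (4.0 here)
sy = orig_h / small_h            # vertical scale factor (4.0 here)
x_small, y_small = 123, 456      # a corner found in the small image
x_orig = round(x_small * sx)     # rounded, so only approximate
y_orig = round(y_small * sy)
print(x_orig, y_orig)            # (492, 1824)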
I realize this is an old question. I just wanted to add to Sam's answer above, to deal with the "rounding off", in case other readers are wondering about the same thing I faced.
The rounding off becomes obvious for an even number of pixels along a coordinate axis. For instance, along a 1-D axis, a point demarcating the 2nd quartile gets mapped to an inaccurate value:
axis_prev = [0, 1, 2, 3]
axis_new = [0, 1, 2, 3, 4, 5, 6, 7]
w_prev = len(axis_prev) # This is an axis of length 4
w_new = len(axis_new) # This is an axis of length 8
x_prev = 2
x_new = x_prev * w_new / w_prev
print(x_new)
>>> 4.0
### x_new should be 5
In Python, one strategy would be to linearly interpolate coordinates from one axis resolution to the other. Say, for the above, we wish to map a point from the smaller image to its corresponding point of the star in the larger image:
import numpy as np
from scipy.interpolate import interp1d
# Map coordinates on a 0..640 axis to the corresponding coordinates on a 0..768 axis.
# interp1d needs x and y arrays of the same length, so sample both with 641 points.
x_old = np.linspace(0, 640, 641)
x_new = np.linspace(0, 768, 641)
f = interp1d(x_old, x_new)
x = 35
x_prime = f(x)
print(x_prime)  # 42.0

How can I calculate camera position by comparing two photographs?

I'm trying to calculate the camera's position for an image. I have two images of a Rubik's cube. The first image is considered to be the base image and the next image is the image after the camera has moved. So for the first image I assume that the camera is at (0,0,0). On this image I then identify the 4 corners of the front face of the Rubik's cube, as shown here (4 corners identified by the 4 blue circles).
Then for the next image (after camera movement), I identify the same face of the Rubik's cube, as shown here.
So, assuming the first image is the base image, does anyone know if/how I can calculate how much the camera has moved for image 2, as shown here:
I would suggest you use OpenCV for this. I also think this question would be more suited to StackOverflow.
The textbook on this subject would be "Multiple-View Geometry" by Hartley and Zisserman. http://www.robots.ox.ac.uk/~vgg/hzbook/ (There is a sample chapter on the Fundamental Matrix on that website.)
Basically, first find the Fundamental Matrix, then by knowing the intrinsic parameters of the camera, find a solution to the position.
Fundamental Matrix: http://en.wikipedia.org/wiki/Fundamental_matrix_%28computer_vision%29
Intrinsic parameters: things like the focal length and where the principal point is on the image plane. If you have F, then E = K^T * F * K, where K is the intrinsic matrix, assumed to be the same for both images.
How to find a solution to the camera position: http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E
Algorithm
This is how I would do it in OpenCV. I have done this before, so it ought to work.
1. Run feature detection and descriptor extraction on both images.
2. Match the features.
3. Compute F with cv::findFundamentalMat, using RANSAC.
4. E = K.t() * F * K. // K needs to be found beforehand.
5. Compute the singular value decomposition of E such that E = U * S * V.t().
6. R = U * W.inv() * V.t() // W = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
7. Tx = V * Z * V.t() // Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
8. Get t from Tx (the matrix version of the cross product).
9. Find the correct solution; R.t() and -t are also possibilities.
10. Get the overall scale by knowing the size of the Rubik's cube.
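Here is a rough Python (cv2) sketch of steps 1-9, assuming K is already known and using placeholder file names; instead of doing the SVD of steps 5-9 by hand it calls cv2.recoverPose, which decomposes E and keeps the solution with the points in front of both cameras:
import cv2
import numpy as np
K = np.loadtxt("intrinsics.txt").reshape(3, 3)            # step 4 prerequisite: known intrinsics
img1 = cv2.imread("cube_base.png", cv2.IMREAD_GRAYSCALE)  # placeholder file names
img2 = cv2.imread("cube_moved.png", cv2.IMREAD_GRAYSCALE)
# 1-2. Detect, describe and match features.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
# 3-4. Fundamental matrix with RANSAC, then the essential matrix.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
E = K.T @ F @ K
# 5-9. Decompose E; recoverPose picks the physically valid (R, t).
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
print("R =", R)
print("t (up to scale) =", t.ravel())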
Alternative Solutions
I am certain that a more straightforward approach could also work. The benefit of the approach above is that no human input is needed (it is unsupervised). This is not true for the optional step 10 (determining scale).
A different solution would exploit knowledge of the geometry of the Rubik's cube. For example, six (strictly, 5.5) point correspondences are needed to estimate the camera pose if the points' 3D positions are known.
Unfortunately, I am not aware of any software that does this for you automatically.
So here is the alternative algorithm:
Write down the coordinates of the corners of the cube as (X_i, Y_i, Z_i), and possibly also other points with knowable positions.
Mark the corresponding image points u_i = (x_i, y_i).
For every correspondence create two rows in a matrix A:
(X_i, Y_i, Z_i, 1, 0, 0, 0, 0, -x_i*X_i, -x_i*Y_i, -x_i*Z_i, -x_i)
(0, 0, 0, 0, X_i, Y_i, Z_i, 1, -y_i*X_i, -y_i*Y_i, -y_i*Z_i, -y_i)
Then find p such that Ap = 0, i.e. p is the right kernel of A, or the least-squares solution to Ap = 0.
De-flatten p to create a 3x4 matrix P.
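A minimal numpy sketch of this alternative (the function and variable names are mine, not from any particular library):
import numpy as np
def dlt_projection_matrix(X_world, x_img):
    """Estimate a 3x4 projection matrix P from N >= 6 3D-2D correspondences (DLT)."""
    A = []
    for (X, Y, Z), (x, y) in zip(X_world, x_img):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -x*X, -x*Y, -x*Z, -x])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -y*X, -y*Y, -y*Z, -y])
    # p is the right singular vector with the smallest singular value,
    # i.e. the least-squares solution to A p = 0 with ||p|| = 1.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)   # de-flatten p into P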