Are the elements of the pose matrix in the KITTI Odometry dataset absolute or relative?

I am working with the KITTI Odometry dataset. I know that each row of the poses.txt file is a flattened version of the 4x4 pose matrix (only the first three rows are stored, 12 values per line; the last row is always 0 0 0 1) such that:
r11 r12 r13 tx
r21 r22 r23 ty
r31 r32 r33 tz
0 0 0 1
My question is: are these values relative or absolute? For example, is tx the relative translation along the x axis with respect to the state at the previous time step (t-1), or is it the absolute translation along the x axis with respect to the initial position at time t=0?

The KITTI dataset poses are absolute: each one is expressed with respect to the initial pose, which is the 4x4 identity matrix by default.
If you are working on VO/SLAM tasks, you need to convert them to relative poses.
You can find more details in my previous answer:
How to evaluate Monocular Visual Odometry results used the KITTI odometry dataset
And also KITTI dataset site:
Visual Odometry / SLAM Evaluation 2012
Best regards.
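As a rough illustration of the conversion mentioned above, here is a minimal C++ sketch. It assumes each pose has already been expanded to a full 4x4 matrix T_k that maps points from camera frame k into the first frame, so the motion from frame k-1 to frame k is inv(T_{k-1}) * T_k. The names Mat4, invertRigid, multiply and relativePose are made up for this example and are not part of any KITTI tooling.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Invert a rigid transform [R t; 0 1]; the inverse is [R^T, -R^T t; 0 1].
Mat4 invertRigid(const Mat4& T) {
    // Precondition: the bottom row is 0 0 0 1.
    assert(T[3][0] == 0.0 && T[3][1] == 0.0 && T[3][2] == 0.0 && T[3][3] == 1.0);
    Mat4 inv{};  // all elements start at zero
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            inv[i][j] = T[j][i];             // rotation part: R^T
            inv[i][3] -= T[j][i] * T[j][3];  // translation part: -R^T t
        }
    }
    inv[3][3] = 1.0;
    return inv;
}

// Plain 4x4 matrix product A * B.
Mat4 multiply(const Mat4& A, const Mat4& B) {
    Mat4 C{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Pose of frame k expressed in frame k-1, from two absolute poses.
Mat4 relativePose(const Mat4& prev, const Mat4& curr) {
    return multiply(invertRigid(prev), curr);
}
```

Chaining the relative poses back together (T_k = T_{k-1} * relative) should reproduce the absolute trajectory, which is a handy sanity check.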


OpenCV undistort using a 2-parameter distortion coefficient model

I am attempting to undistort some pre-recorded footage, where I no longer have access to the camera. It was recorded with a camera system that calibrated and stored the intrinsic data. So far so good.
BUT, although the values for creating the camera matrix are as expected, the distortion coefficient values are based on a 2 parameter model.
Therefore the data I have is:
camera matrix:
0, 0,1
distortion coeff: (from the calibration file)
RADIAL="2.58565363397419e-007 1.56049733948831e-013"
OpenCV's undistort function seems to need a 1 x 5 matrix, so I am stuck. How can I create this from 2 values?
Any help greatly appreciated!

ITU-R 2k filter implementation

I have an array coming from a digitizer. I do an FFT on it, then I calculate the frequency bins and apply a 20 kHz low-pass filter. The next step would be to apply an ITU-R 2k filter to this array; the filter behaves like the curve in the picture. I know I am supposed to multiply the samples one by one, but I am not sure how to start. I know the 0 dB point is at 2 kHz and the maximum of 6 dB is located at 7 kHz. The implementation has to be done in C++.
[Image: ITU-R 468 filter behavior]
An LTI filter like this is a straightforward multiplication in the frequency domain. Put the filter coefficients in an array of the same length as the FFT output, multiply the two element by element (writing the result back into fftbins): std::transform(std::begin(fftbins), std::end(fftbins), std::begin(filtercoeff), std::begin(fftbins), std::multiplies<std::complex<double>>()); and perform the IFFT.
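To make the multiplication step a bit more concrete, here is a minimal sketch. It assumes the weighting curve has already been sampled at each bin frequency and stored in decibels in a vector called gainDb (a hypothetical name, as are dbToLinear and applyWeighting); the actual curve values have to come from the ITU-R 468 specification and are not reproduced here. Real-valued gains only change the magnitude, not the phase, and for a full two-sided FFT the gains for the negative-frequency bins would have to mirror the positive ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Convert a gain in decibels to a linear amplitude factor.
double dbToLinear(double gainDb) {
    return std::pow(10.0, gainDb / 20.0);
}

// Scale every FFT bin by the weighting-curve gain at that bin.
// gainDb[i] is assumed to hold the curve value (in dB) at the frequency of bin i.
void applyWeighting(std::vector<std::complex<double>>& bins,
                    const std::vector<double>& gainDb) {
    assert(bins.size() == gainDb.size());
    std::transform(bins.begin(), bins.end(), gainDb.begin(), bins.begin(),
                   [](const std::complex<double>& bin, double db) {
                       return bin * dbToLinear(db);
                   });
}
```

After this call the bins can go straight into the inverse FFT.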

How to rectify an image from a single calibrated camera using a Matlab toolbox [duplicate]

I'm using Matlab for camera calibration with Jean-Yves Bouguet's Camera Calibration Toolbox. I have all the camera parameters from the calibration procedure. When I use a new image not in the calibration set, I can get its transformation equation, e.g. Xc=R*X+T, where X is the 3D point of the calibration rig (planar) in the world frame, and Xc its coordinates in the camera frame. In other words, I have everything (both extrinsic and intrinsic parameters). What I want to do is perform perspective correction on this image, i.e. I want to remove any perspective and see the calibration rig undistorted (it's a checkerboard).
Matlab's new Computer Vision toolbox has an object that performs a perspective transformation on an image, given a 3x3 matrix H. The problem is, I can't compute this matrix from the known intrinsic and extrinsic parameters!
To all who are still interested in this after so many months, I've managed to get the correct homography matrix using Kovesi's code, and especially the homography2d.m function. You will, however, need the pixel coordinates of the four corners of the rig. If the camera is rigidly fixed, then you only need to do this once. See the example code below:
%get corner pixel coords from base image
por=[p1 p2 p3 p4];
por=[0 1 0;1 0 0;0 0 1]*por; %swap x-y <--------------------
%calculate target image coordinates in world frame
% rig is 9x7 (X,Y) with 27.5mm box edges
XXw=[[0;0;0] [0;27.5*9;0] [27.5*7;27.5*9;0] [27.5*7;0;0]];
Rtarget=[0 1 0;1 0 0;0 0 -1]; %Rotation matrix of target camera (vertical pose)
XXc=Rtarget*XXw+Tc_ext*ones(1,4); %go from world frame to camera frame
xn=XXc./[XXc(3,:);XXc(3,:);XXc(3,:)]; %calculate normalized coords
xpp=KK*xn; %calculate target pixel coords
% get homography matrix from original to target image
H=homography2d(por,xpp);
%do perspective transformation to validate homography
pnew=H*por./[H(3,:)*por;H(3,:)*por;H(3,:)*por];
That should do the trick. Note that Matlab defines the x axis in an image as the row index and y as the column index. Thus one must swap x-y in the equations (as you'll probably see in the code above). Furthermore, I had managed to compute the homography matrix from the parameters alone, but the result was slightly off (maybe roundoff errors in the calibration toolbox). The best way to do this is the above.
If you want to use just the camera parameters (that is, don't use Kovesi's code), then the Homography matrix is H=KK*Rmat*inv_KK. In this case the code is,
% corner coords in pixels
pmat=[p1 p2 p3 p4];
pmat=[0 1 0;1 0 0;0 0 1]*pmat; %swap x-y
R=[0 1 0;1 0 0;0 0 1]; %rotation matrix of final camera pose
Rmat=Rc_ext'*R; %rotation from original pose to final pose
H=KK*Rmat*inv_KK; %homography matrix
pnew=H*pmat./[H(3,:)*pmat;H(3,:)*pmat;H(3,:)*pmat]; %do perspective transformation
H2=[0 1 0;-1 0 0;0 0 1]*H; %swap x-y in the homography matrix to apply in image
Approach 1:
In the Camera Calibration Toolbox you should notice that there is an H matrix for each image of your checkerboard in your workspace. I am not familiar with the computer vision toolbox yet but perhaps this is the matrix you need for your function. It seems that H is computed like so:
KK = [fc(1) fc(1)*alpha_c cc(1);0 fc(2) cc(2); 0 0 1];
H = KK * [R(:,1) R(:,2) Tc]; % where R is your extrinsic rotation matrix and Tc the translation vector
H = H / H(3,3);
Approach 2:
If the computer vision toolbox function doesn't work out for you, then to compute the perspective projection of an image I have used the interp2 function like so:
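[X, Y] = meshgrid(0:size(I,2)-1, 0:size(I,1)-1); % pixel grid of image I
im_coord = [X(:), Y(:), ones(numel(X), 1)]'; % homogeneous pixel coordinates, 3xN
% Insert projection here for X and Y to XI and YI
ZI = interp2(X, Y, double(I), XI, YI); % sample image I at the projected coordinates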
I used perspective projections on a project a while ago and I believe that you need to use homogeneous coordinates. I think I found this Wikipedia article quite helpful.
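Since homogeneous coordinates come up in both approaches, here is a small C++ sketch of how a single pixel is pushed through a 3x3 homography: the point is extended to (x, y, 1), multiplied by H, and divided by the third component. The names Mat3, Point2 and applyHomography are invented for this illustration; the sketch does not depend on either toolbox.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct Point2 {
    double x;
    double y;
};

// Map pixel p through homography H using homogeneous coordinates (x, y, 1).
Point2 applyHomography(const Mat3& H, const Point2& p) {
    const double xh = H[0][0] * p.x + H[0][1] * p.y + H[0][2];
    const double yh = H[1][0] * p.x + H[1][1] * p.y + H[1][2];
    const double w  = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
    assert(w != 0.0);  // points with w == 0 map to infinity
    return {xh / w, yh / w};  // perspective divide
}
```

When warping a whole image, the usual trick is to loop over the output pixels, map each one through the inverse homography, and interpolate the source image there, which is what the interp2 call above does.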

3x3 Matrix Rotation in C++

Alright, first off, I know similar questions are all over the web, I have looked at more than I'd care to count, I've been trying to figure it out for almost 3 weeks now (not constantly, just on and off, hoping for a spark of insight).
In the end, what I want is a function where you pass in how much you want to rotate by (currently I'm working in radians, but I can use degrees or radians) and it returns the rotation matrix, preserving any translations I had.
I understand the formula to rotate on the "Z" axis in a 2D cartesian plane, is:
[cos(radians) -sin(radians) 0]
[sin(radians) cos(radians) 0]
[0 0 1]
I do understand Matrix Maths (Addition, Subtraction, Multiplication and Determinant/Inverse) fairly well, but what I'm not understanding, is how to, step-by-step, make a matrix I can use for rotation, preserving any translation (and whatever else, like scale) that it has.
What I've gathered from other examples is that I should multiply my current Matrix (whatever that may be, let's just use an Identity Matrix for now) by a Matrix like this:
[cos(radians) - sin(radians)]
[sin(radians) + cos(radians)]
But then my original Matrix would end up as a 3x1 Matrix instead of a 3x3, wouldn't it? I'm not sure what I'm missing, but something just doesn't seem right to me. I'm not necessarily looking for someone to write the code for me, just to understand how to do this properly so that I can write it myself. (not to say I won't look at others' code :) )
(Not sure if it matters to anybody, but just in-case, using Windows 7 64-bit, Visual Studio 2010 Ultimate, and I believe OpenGL, this is for Uni)
While we're at it, can someone double check this for me? Just to make sure it seems right.
A translation Matrix (again, let's use Identity) is something like this:
[1, 0, X translation element]
[0, 1, Y translation element]
[0, 0, 1]
First, you cannot represent a translation with a 3x3 matrix in 3D space. You have to use homogeneous 4x4 matrices.
After that, create a separate matrix for each transformation (translation, rotation, scale) and multiply them to get the final transformation matrix (multiplying 4x4 matrices gives you a 4x4 matrix).
Let's clear up some points:
Your object consists of 3D points which are basically 3 by 1 matrices.
You need a 3 by 3 rotation matrix to rotate your object: R but if you also add translation terms, transformation matrix will be 4 by 4:
[R11, R12, R13, tx]
[R21, R22, R23, ty]
[R31, R32, R33, tz]
[0, 0, 0, 1]
The R terms depend on the rotation angle about each axis.
To transform your object, multiply every 3D point by this transformation matrix. To do so, extend every 3 by 1 point with a 4th term (the scale factor), which is 1 assuming fixed scale:
[x y z 1]'
Resulting product vector will be 4 by 1 and the last term is the scale term which is 1 again and can be removed.
Resulting rotated object points are these new 3D product points.
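The points above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the rotation is about the Z axis, and the names Mat4, Vec3, makeRotZTranslate and transformPoint are invented for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec3 = std::array<double, 3>;

// Homogeneous 4x4 transform: rotate by `radians` about Z, then translate by (tx, ty, tz).
Mat4 makeRotZTranslate(double radians, double tx, double ty, double tz) {
    const double c = std::cos(radians);
    const double s = std::sin(radians);
    Mat4 T{};
    T[0] = {c,   -s,  0.0, tx};
    T[1] = {s,    c,  0.0, ty};
    T[2] = {0.0, 0.0, 1.0, tz};
    T[3] = {0.0, 0.0, 0.0, 1.0};
    return T;
}

// Extend p to [x y z 1]', multiply by T, and drop the trailing 1 again.
Vec3 transformPoint(const Mat4& T, const Vec3& p) {
    assert(T[3][3] == 1.0);  // fixed scale, as assumed in the text
    const std::array<double, 4> h = {p[0], p[1], p[2], 1.0};
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 4; ++j)
            out[i] += T[i][j] * h[j];
    return out;
}
```

Because the translation sits in the fourth column, it survives any further multiplication with other 4x4 transforms, which is the "preserving translation" behaviour the question asks about.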
I faced the same problem and found a satisfying formula in this SO question.
Let (cos0, sin0) be respectively the cosine and sine values of your angle, and (x0, y0) the coordinates of the center of your rotation.
To transform a 2d point of coordinates (x,y), you have to multiply its homogeneous 3x1 coordinates (x,y,1) by this 3x3 matrix:
[cos0, -sin0, x0-(cos0*x0 - sin0*y0)]
[sin0, cos0, y0-(sin0*x0 + cos0*y0)]
[ 0, 0, 1 ]
The values in the third column are the translation that has to be applied when your rotation center is not the origin of the system.
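A minimal C++ sketch of that matrix, in case it helps. Mat3, Point2, rotationAbout and apply are names chosen for this example only; the entries follow the matrix given above term by term.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct Point2 {
    double x;
    double y;
};

// 3x3 homogeneous matrix that rotates by `radians` around the center (x0, y0).
Mat3 rotationAbout(double radians, double x0, double y0) {
    const double c = std::cos(radians);
    const double s = std::sin(radians);
    Mat3 M{};
    M[0] = {c,  -s,  x0 - (c * x0 - s * y0)};
    M[1] = {s,   c,  y0 - (s * x0 + c * y0)};
    M[2] = {0.0, 0.0, 1.0};
    return M;
}

// Multiply M by the homogeneous point (x, y, 1) and return the 2D result.
Point2 apply(const Mat3& M, const Point2& p) {
    assert(M[2][0] == 0.0 && M[2][1] == 0.0 && M[2][2] == 1.0);  // affine matrix
    return {M[0][0] * p.x + M[0][1] * p.y + M[0][2],
            M[1][0] * p.x + M[1][1] * p.y + M[1][2]};
}
```

A quick check is that the rotation center itself must map to itself for any angle.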

How do you judge the (real world) distance of an object in a picture?

I am building a recognition program in C++ and to make it more robust, I need to be able to find the distance of an object in an image.
Say I have an image that was taken 22.3 inches away of an 8.5 x 11 picture. The system correctly identifies that picture in a box with the dimensions 319 pixels by 409 pixels.
What is an effective way for relating the actual Height and width (AH and AW) and the pixel Height and width (PH and PW) to the distance (D)?
I am assuming that when I actually go to use the equation, PH and PW will be inversely proportional to D and AH and AW are constants (as the recognized object will always be an object where the user can indicate width and height).
I don't know if you changed your question at some point, but my first answer is quite complicated for what you want. You can probably do something simpler.
1) Long and complicated solution (more general problems)
First you need to know the size of the object.
You can look at computer vision algorithms. If you know the object (its dimensions and shape), your main problem is pose estimation (that is, finding the position of the object relative to the camera); from the pose you can find the distance. You can look at [1] [2] (as examples; you can find other articles on it if you are interested) or search for POSIT, SoftPOSIT. You can formulate the problem as an optimization problem: find the pose that minimizes the "difference" between the real image and the expected image (the projection of the object given the estimated pose). This difference is usually the sum of the (squared) distances between each image point Ni and the projection P(Mi) of the corresponding object (3D) point Mi for the current parameters.
From this you can extract the distance.
For this you need to calibrate your camera (roughly, find the relation between the pixel position and the viewing angle).
Now, you may not want to code all of this yourself; you can use computer vision libraries such as OpenCV, Gandalf [3] ...
Alternatively, you may want to do something simpler (and approximate). If you can find the image distance between two points at the same "depth" (Z) from the camera, you can relate the image distance d to the real distance D with: d = a D/Z (where a is a camera parameter related to the focal length and the number of pixels, which you can find using camera calibration).
2) Short solution (for your simple problem)
But here is the (simple, short) answer: if your picture is on a plane parallel to the "camera plane" (i.e. it is perfectly facing the camera) you can use:
PH = a AH / Z
PW = a AW / Z
where Z is the depth of the plane of the picture and a is an intrinsic parameter of the camera.
For reference, the pinhole camera model relates image coordinates m=(u,v) to world coordinates M=(X,Y,Z) with:
m ~ K M
[u]   [ au  as  u0 ] [X]
[v] ~ [ 0   av  v0 ] [Y]
[1]   [ 0   0   1  ] [Z]
[u] = [ au  as ] [X/Z] + [u0]
[v]   [ 0   av ] [Y/Z]   [v0]
where "~" means "proportional to" and K is the matrix of intrinsic parameters of the camera. You need to do camera calibration to find the K parameters. Here I assumed au=av=a and as=0.
You can recover the Z parameter from either of those equations (or take the average of both). Note that the Z parameter is not the distance to the object (which varies across the different points of the object) but the depth of the object (the distance between the camera plane and the object plane). But I guess that is what you want anyway.
[1] Linear N-Point Camera Pose Determination, Long Quan and Zhongdan Lan
[2] A Complete Linear 4-Point Algorithm for Camera Pose Determination, Lihong Zhi and Jianliang Tang
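The short solution above (PH = a AH / Z) can be tried out with the numbers from the question. The sketch below assumes the 409-pixel side corresponds to the 11-inch side of the picture and that the picture faces the camera squarely; calibrateScale and depthFromSize are hypothetical helper names. With one reference shot at a known depth, a is solved first and then reused to get Z for later detections.

```cpp
#include <cassert>
#include <cmath>

// Solve PH = a * AH / Z for the camera constant a, using one reference shot
// where the depth Z is known.
double calibrateScale(double pixelSize, double actualSize, double depth) {
    assert(actualSize > 0.0);
    return pixelSize * depth / actualSize;
}

// Solve PH = a * AH / Z for the depth Z of a later detection.
double depthFromSize(double a, double actualSize, double pixelSize) {
    assert(pixelSize > 0.0);
    return a * actualSize / pixelSize;
}
```

For the question's reference shot, calibrateScale(409, 11, 22.3) gives a of roughly 829 from the height, and calibrateScale(319, 8.5, 22.3) gives roughly 837 from the width; the two differ slightly because the detected box is not pixel-exact. If the same picture later appears half as tall in pixels, the estimated depth doubles.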
If you know the size of the real-world object and the angle of view of the camera, you can compute the distance. Assume the horizontal half angle of view is alpha(*) and the horizontal resolution of the image is xres. Then the distance dw to an object in the middle of the image that is xp pixels wide in the image and xw meters wide in the real world can be derived as follows (how is your trigonometry?):
# Distance in "pixel space" relates to distance in the real world
# (we take half of xres, xw and xp because we use the half angle of view):
(xp/2)/dp = (xw/2)/dw
dw = ((xw/2)/(xp/2))*dp = (xw/xp)*dp (1)
# we know xp and xw, we're looking for dw, so we need to calculate dp:
# we can do this because we know xres and alpha
# (remember, tangent = opposite/adjacent):
tan(alpha) = (xres/2)/dp
dp = (xres/2)/tan(alpha) (2)
# combine (1) and (2):
dw = ((xw/xp)*(xres/2))/tan(alpha)
# pretty print:
dw = (xw*xres)/(xp*2*tan(alpha))
(*) alpha = The angle between the camera axis and a line going through the leftmost point on the middle row of the image that is just visible.
Link to your variables:
dw = D, xw = AW, xp = PW
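The final formula translates directly into code. A minimal sketch, assuming alpha is given in radians and is the half angle defined in (*); the function name distanceFromAngleOfView is made up for this example.

```cpp
#include <cassert>
#include <cmath>

// dw = (xw * xres) / (xp * 2 * tan(alpha))
//   xw    : real-world width of the object
//   xp    : width of the object in the image, in pixels
//   xres  : horizontal resolution of the image, in pixels
//   alpha : half of the horizontal angle of view, in radians
double distanceFromAngleOfView(double xw, double xp, double xres, double alpha) {
    assert(xp > 0.0);
    return (xw * xres) / (xp * 2.0 * std::tan(alpha));
}
```

The result comes out in the same unit as xw, so passing the width in inches gives the distance in inches.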
This may not be a complete answer but may push you in the right direction. Ever seen how NASA does it on those pictures from space? They have those tiny crosses all over the images. That's how they get a fair idea about the depth and size of the object, as far as I know. The solution might be to have an object in the picture whose size and depth you know, and then calculate the others relative to that. Time for you to do some research. If that's the way NASA does it, then it should be worth checking out.
I have got to say this is one of the most interesting questions I have seen for a long time on Stack Overflow :D. I just noticed you have only two tags attached to this question. Adding some more tags related to images might help you.