Panoramic Image Photogrametry: How to calculate range? - c++

Assume that I took two panoramic image with vertical offset of H and each image is presented in equirectangular projection with size Xm and Ym. To do this, I place my panoramic camera at position say A and took an image, then move camera H meter up and took another image.
I know that a point in image 1 with coordinate of X1,Y1 is the same point on image 2 with coordinate X2 and Y2(assuming that X1=X2 as we have only vertical offset).
My question is that How I can calculate the range of selected of point (the point that know its X1and Y1 is on image 1 and its position on image 2 is X2 and Y2 from the Point A (where camera was when image no 1 was taken.).

Yes, you can do it - hold on!!!
Key thing y = focal length of your lens - now I can do it!!!
So, I think your question can be re-stated more simply by saying that if you move your camera (on the right in the diagram) up H metres, a point moves down p pixels in the image taken from the new location.
Like this if you imagine looking from the side, across you taking the picture.
If you know the micron spacing of the camera's CCD from its specification, you can convert p from pixels to metres to match the units of H.
Your range from the camera to the plane of the scene is given by x + y (both in red at the bottom), and
x=H/tan(alpha)
y=p/tan(alpha)
so your range is
R = x + y = H/tan(alpha) + p/tan(alpha)
and
alpha = tan inverse(p/y)
where y is the focal length of your lens. As y is likely to be something like 50mm, it is negligible, so, to a pretty reasonable approximation, your range is
H/tan(alpha)
and
alpha = tan inverse(p in metres/focal length)
Or, by similar triangles
Range = H x focal length of lens
--------------------------------
(Y2-Y1) x CCD photosite spacing
being very careful to put everything in metres.

Here is a shot in the dark, given my understanding of the problem at hand you want to do something similar to computer stereo vision, I point you to http://en.wikipedia.org/wiki/Computer_stereo_vision to start. Not sure if this is still possible to do in the manner you are suggesting, it sounds like you may need some more physical constraints but I do remember being able to correlate two 2d points in images after undergoing a strict translation. Think :
lambda[x,y,1]^t = W[r1, tx;r2, ty;ry, tz][x; y; z; 1]^t
Where lamda is a scale factor, W is a 3x3 matrix covering the intrinsic parameters of your camera, r1, r2, and r3 are row vectors that make up the 3x3 rotation matrix (in your case you can assume the identity matrix since you have only applied a translation), and tx, ty, tz which are your translation components.
Since you are looking at two 2d points at the same 3d point [x,y,z] this 3d point is shared by both 2d points. I cannot say if you can rationalize the actual x,y, and z values particularly for your depth calculation but this is where I would start.

Related

Estimate the camera pose in the reference system using one marker with ARUCO

I am currently working on a camera pose estimation project using only one marker with ARUCO.
I used Aruco's Marker Detector to detect markers and get the marker's Rvec and Tvec. I understand these two vectors represent the transform from the marker to the camera, which is the marker's pose w.r.t camera. I form a 4 by 4 matrix called T_marker_camera using these two vectors.
Then, I set up a world frame (left handed) and get the marker's world pose, which is a 4 by 4 transform matrix.
I want to calculate the pose of the camera w.r.t the world frame, and I use the following formula to calculate it:
T_camera_world = T_marker_world * T_marker_camera_inv
Before I perform the above formula, I convert the OpenCV coordinates to the left handed one (flip the sign of x axis).
However, I didn't get the correct x, y, z of the camera w.r.t the world frame.
What did I miss to get the correct answer?
Thanks
The one equation you gave looks right, so the issue is probably somewhere that you didn't show/describe.
A fix in your notation will help clarify.
Write the pose/source frame on the right (input), the reference/destination frame on the left (output). Then your matrices "match up" like dominos.
rvec and tvec yield a matrix that should be called T_cam_marker.
If you want the pose of your camera in the world frame, that is
T_world_cam = T_world_marker * T_marker_cam
T_world_cam = T_world_marker * inv(T_cam_marker)
(equivalent to what you wrote, but domino)
Be sure that you do matrix multiplication, not element-wise multiplication.
To move between left-handed and right-handed coordinate systems, insert a matrix that maps coordinates accordingly. Frames:
OpenCV camera/screen: right-handed, {X right, Y down, Z far}
ARUCO (in OpenCV anyway): right-handed, {X right, Y far, Z up}, first corner is top left (-X+Y quadrant)
whatever leftie frame you have, let's say {X right, Y up, Z far} and it's a screen or something
The hand-change matrix for typical frames on screens is an identity but with the entry for Y being a -1. I don't know why you would flip the X but that's "equivalent", ignoring any rotations.

Get known position in one image to another using 8-point algorithm

I have two images and and know the position of a point in the first image. Now I want to get the corresponding position in the second image.
This is my idea:
I can use algorithms such as SIFT to match keypoints (as seen in the image)
I know the camera matrix using calibration with e.g. chessboards
Using the 8 point algorithm I calculate the fundamental matrix F
Can I now use F to calculate the corresponding point?
Using fundamental matrix F alone is not enough. If you have a point on one image, you can't find its position on the second image, because it depends not only on configuration of the cameras, but also on the distance from the camera to that point.
This can also be seen from the equation x2^T * F * x1 = 0. If you know x1 and F, then for x2 you get equation x2^T * b = 0, where b = F * x1. This is an equation of a point x2 lying on the line b (points x1, x2 and line b are in homogeneous coordinates). Although you cant find the exact position of the point on the second image, you know that it must lie somewhere on that line.
Hartley and Zisserman have a great explanation these of these concepts in their book Multiple View Geometry in Computer Vision. Be sure to check it out for more details.

How to create point cloud from rgb & depth images?

For a university project I am currently working on, I have to create a point cloud by reading images from this dataset. These are basically video frames, and for each frame there is an rgb image along with a corresponding depth image.
I am familiar with the equation z = f*b/d, however I am unable to figure out how the data should be interpreted. Information about the camera that was used to take the video is not provided, and also the project states the following:
"Consider a horizontal/vertical field of view of the camera 48.6/62
degrees respectively"
I have little to no experience in computer vision, and I have never encountered 2 fields of view being used before. Assuming I use the depth from the image as is (for the z coordinate), how would I go about calculating the x and y coordinates of each point in the point cloud?
Here's an example of what the dataset looks like:
Yes, it's unusual to specify multiple fields of view. Given a typical camera (squarish pixels, minimal distortion, view vector through the image center), usually only one field-of-view angle is given -- horizontal or vertical -- because the other can then be derived from the image aspect ratio.
Specifying a horizontal angle of 48.6 and a vertical angle of 62 is particularly surprising here, since the image is a landscape view, where I'd expect the horizontal angle to be greater than the vertical. I'm pretty sure it's a typo:
When swapped, the ratio tan(62 * pi / 360) / tan(48.6 * pi / 360) is the 640 / 480 aspect ratio you'd expect, given the image dimensions and square pixels.
At any rate, a horizontal angle of t is basically saying that the horizontal extent of the image, from left edge to right edge, covers an arc of t radians of the visual field, so the pixel at the center of the right edge lies along a ray rotated t / 2 radians to the right from the central view ray. This "righthand" ray runs from the eye at the origin through the point (tan(t / 2), 0, -1) (assuming a right-handed space with positive x pointing right and positive y pointing up, looking down the negative z axis). To get the point in space at distance d from the eye, you can just normalize a vector along this ray and multiply by it by d. Assuming the samples are linearly distributed across a flat sensor, I'd expect that for a given pixel at (x, y) you could calculate its corresponding ray point with:
p = (dx * tan(hfov / 2), dy * tan(vfov / 2), -1)
where dx is 2 * (x - width / 2) / width, dy is 2 * (y - height / 2) / height, and hfov and vfov are the field-of-view angles in radians.
Note that the documentation that accompanies your sample data links to a Matlab file that shows the recommended process for converting the depth images into a point cloud and distance field. In it, the fields of view are baked with the image dimensions to a constant factor of 570.3, which can be used to recover the field of view angles that the authors believed their recording device had:
atan(320 / 570.3) * (360 / pi / 2) * 2 = 58.6
which is indeed pretty close to the 62 degrees you were given.
From the Matlab code, it looks like the value in the image is not distance from a given point to the eye, but instead distance along the view vector to a perpendicular plane containing the given point ("depth", or basically "z"), so the authors can just multiply it directly with the vector (dx * tan(hfov / 2), dy * tan(vfov / 2), -1) to get the point in space, skipping the normalization step mentioned earlier.

How rectify an image from a single calibrated camera using Matlab toolbox [duplicate]

I'm using Matlab for camera calibration using Jean-
Yves Bouget's Camera Calibration Toolbox. I have all the camera
parameters from the calibration procedure. When I use a new image not
in the calibration set, I can get its transformation equation e.g.
Xc=R*X+T, where X is the 3D point of the calibration rig (planar) in
the world frame, and Xc its coordinates in the camera frame. In other
words, I have everything (both extrinsic and intrinsic parameters).
What I want to do is to perform perspective correction on this image
i.e. I want it to remove any perspective and see the calibration rig
undistorted (its a checkerboard).
Matlab's new Computer Vision toolbox has an object that performs a perspective transformation on an
image, given a 3X3 matrix H. The problem is, I can't compute this
matrix from the known intrinsic and extrinsic parameters!
To all who are still interested in this after so many months, i've managed to get the correct homography matrix using Kovesi's code (http://www.csse.uwa.edu.au/~pk/research/matlabfns), and especially the homography2d.m function. You will need however the pixel values of the four corners of the rig. If the camera is steady fixed, then you will need to do this once. See example code below:
%get corner pixel coords from base image
p1=[33;150;1];
p2=[316;136;1];
p3=[274;22;1];
p4=[63;34;1];
por=[p1 p2 p3 p4];
por=[0 1 0;1 0 0;0 0 1]*por; %swap x-y <--------------------
%calculate target image coordinates in world frame
% rig is 9x7 (X,Y) with 27.5mm box edges
XXw=[[0;0;0] [0;27.5*9;0] [27.5*7;27.5*9;0] [27.5*7;0;0]];
Rtarget=[0 1 0;1 0 0;0 0 -1]; %Rotation matrix of target camera (vertical pose)
XXc=Rtarget*XXw+Tc_ext*ones(1,4); %go from world frame to camera frame
xn=XXc./[XXc(3,:);XXc(3,:);XXc(3,:)]; %calculate normalized coords
xpp=KK*xn; %calculate target pixel coords
% get homography matrix from original to target image
HH=homography2d(por,xpp);
%do perspective transformation to validate homography
pnew=HH*por./[HH(3,:)*por;HH(3,:)*por;HH(3,:)*por];
That should do the trick. Note that Matlab defines the x axis in an image ans the rows index and y as the columns. Thus one must swap x-y in the equations (as you'll probably see in the code above). Furthermore, i had managed to compute the homography matrix from the parameters solely, but the result was slightly off (maybe roundoff errors in the calibration toolbox). The best way to do this is the above.
If you want to use just the camera parameters (that is, don't use Kovesi's code), then the Homography matrix is H=KK*Rmat*inv_KK. In this case the code is,
% corner coords in pixels
p1=[33;150;1];
p2=[316;136;1];
p3=[274;22;1];
p4=[63;34;1];
pmat=[p1 p2 p3 p4];
pmat=[0 1 0;1 0 0;0 0 1]*pmat; %swap x-y
R=[0 1 0;1 0 0;0 0 1]; %rotation matrix of final camera pose
Rmat=Rc_ext'*R; %rotation from original pose to final pose
H=KK*Rmat*inv_KK; %homography matrix
pnew=H*pmat./[H(3,:)*pmat;H(3,:)*pmat;H(3,:)*pmat]; %do perspective transformation
H2=[0 1 0;-1 0 0;0 0 1]*H; %swap x-y in the homography matrix to apply in image
Approach 1:
In the Camera Calibration Toolbox you should notice that there is an H matrix for each image of your checkerboard in your workspace. I am not familiar with the computer vision toolbox yet but perhaps this is the matrix you need for your function. It seems that H is computed like so:
KK = [fc(1) fc(1)*alpha_c cc(1);0 fc(2) cc(2); 0 0 1];
H = KK * [R(:,1) R(:,2) Tc]; % where R is your extrinsic rotation matrix and Tc the translation matrix
H = H / H(3,3);
Approach 2:
If the computer vision toolbox function doesn't work out for you then to find the prospective projection of an image I have used the interp2 function like so:
[X, Y] = meshgrid(0:size(I,2)-1, 0:size(I,1)-1);
im_coord = [X(:), Y(:), ones(prod(size(I_1)))]';
% Insert projection here for X and Y to XI and YI
ZI = interp2(X,Y,Z,XI,YI);
I have used prospective projections on a project a while ago and I believe that you need to use homogeneous coordinates. I think I found this wikipedia article quite helpful.

How do you judge the (real world) distance of an object in a picture?

I am building a recognition program in C++ and to make it more robust, I need to be able to find the distance of an object in an image.
Say I have an image that was taken 22.3 inches away of an 8.5 x 11 picture. The system correctly identifies that picture in a box with the dimensions 319 pixels by 409 pixels.
What is an effective way for relating the actual Height and width (AH and AW) and the pixel Height and width (PH and PW) to the distance (D)?
I am assuming that when I actually go to use the equation, PH and PW will be inversely proportional to D and AH and AW are constants (as the recognized object will always be an object where the user can indicate width and height).
I don't know if you changed your question at some point but my first answer it quite complicated for what you want. You probably can do something simpler.
1) Long and complicated solution (more general problems)
First you need the know the size of the object.
You can to look at computer vision algorithms. If you know the object (its dimensions and shape). Your main problem is the problem of pose estimation (that is find the position of the object relative the camera) from this you can find the distance. You can look at [1] [2] (for example, you can find other articles on it if you are interested) or search for POSIT, SoftPOSIT. You can formulate the problem as an optimization problem : find the pose in order to minimize the "difference" between the real image and the expected image (the projection of the object given the estimated pose). This difference is usually the sum of the (squared) distances between each image point Ni and the projection P(Mi) of the corresponding object (3D) point Mi for the current parameters.
From this you can extract the distance.
For this you need to calibrate you camera (roughly, find the relation between the pixel position and the viewing angle).
Now you may not want do code all of this for by yourself, you can use Computer Vision libs such as OpenCV, Gandalf [3] ...
Now you may want to do something more simple (and approximate). If you can find the image distance between two points at the same "depth" (Z) from the camera, you can relate the image distance d to the real distance D with : d = a D/Z (where a is a parameter of the camera related to the focal length, number of pixels that you can find using camera calibration)
2) Short solution (for you simple problem)
But here is the (simple, short) answer : if you picture in on a plane parallel to the "camera plane" (i.e. it is perfectly facing the camera) you can use :
PH = a AH / Z
PW = a AW / Z
where Z is the depth of the plane of the picture and a in an intrinsic parameter of the camera.
For reference the pinhole camera model relates image coordinated m=(u,v) to world coordinated M=(X,Y,Z) with :
m ~ K M
[u] [ au as u0 ] [X]
[v] ~ [ av v0 ] [Y]
[1] [ 1 ] [Z]
[u] = [ au as ] X/Z + u0
[v] [ av ] Y/Z + v0
where "~" means "proportional to" and K is the matrix of intrinsic parameters of the camera. You need to do camera calibration to find the K parameters. Here I assumed au=av=a and as=0.
You can recover the Z parameter from any of those equations (or take the average for both). Note that the Z parameter is not the distance from the object (which varies on the different points of the object) but the depth of the object (the distance between the camera plane and the object plane). but I guess that is what you want anyway.
[1] Linear N-Point Camera Pose Determination, Long Quan and Zhongdan Lan
[2] A Complete Linear 4-Point Algorithm for Camera Pose Determination, Lihong Zhi and Jianliang Tang
[3] http://gandalf-library.sourceforge.net/
If you know the size of the real-world object and the angle of view of the camera then assuming you know the horizontal angle of view alpha(*), the horizontal resolution of the image is xres, then the distance dw to an object in the middle of the image that is xp pixels wide in the image, and xw meters wide in the real world can be derived as follows (how is your trigonometry?):
# Distance in "pixel space" relates to dinstance in the real word
# (we take half of xres, xw and xp because we use the half angle of view):
(xp/2)/dp = (xw/2)/dw
dw = ((xw/2)/(xp/2))*dp = (xw/xp)*dp (1)
# we know xp and xw, we're looking for dw, so we need to calculate dp:
# we can do this because we know xres and alpha
# (remember, tangent = oposite/adjacent):
tan(alpha) = (xres/2)/dp
dp = (xres/2)/tan(alpha) (2)
# combine (1) and (2):
dw = ((xw/xp)*(xres/2))/tan(alpha)
# pretty print:
dw = (xw*xres)/(xp*2*tan(alpha))
(*) alpha = The angle between the camera axis and a line going through the leftmost point on the middle row of the image that is just visible.
Link to your variables:
dw = D, xw = AW, xp = PW
This may not be a complete answer but may push you in the right direction. Ever seen how NASA does it on those pictures from space? The way they have those tiny crosses all over the images. Thats how they get a fair idea about the deapth and size of the object as far as I know. The solution might be to have an object that you know the correct size and deapth of in the picture and then calculate the others' relative to that. Time for you to do some research. If thats the way NASA does it then it should be worth checking out.
I have got to say This is one of the most interesting questions i have seen for a long time on stackoverflow :D. I just noticed you have only two tags attached to this question. Adding something more in relation to images might help you better.