Depth map values in opencv's reprojectImageTo3D() - c++

OpenCV's reprojectImageTo3D() outputs a "3-channel image representing a 3D surface".
You can access this data by
Vec3f coordinates = _3dImage.at<Vec3f>(y,x);
float depth = _3dImage.at<Vec3f>(y,x)[2];
which returns a vector [X, Y, Z].
In "Learning OpenCV" by Gary Bradski & Adrian Kaehler, it is explained that the depth is calculated by
Z = f T / (x_left - x_right)
where f = focal length, T = eye base/translation between cameras, (x_left - x_right) = disparity
This exact formula is implemented in OpenCV (I checked the source code; for some reason there is an additional negative sign). The question is: in which units are the X, Y, Z values specified?
T is in your unit (e.g. mm), x_left - x_right is in pixels, and [ f ] = ?
When you calibrate the camera, you specify the chessboard's size in real world units (e.g. mm). Does the intrinsic matrix therefore have real world units? Or is it specified in px? Unfortunately I cannot find the answer in the documentation.

The underlying equation that performs depth reconstruction is:
Z = fB/d, where
f is the focal length (in pixels)
B is the stereo baseline (in real-world units, e.g. meters); this is what you called the eye base/translation between cameras
d is disparity (in pixels) that measures the difference in retinal position between corresponding points
Z is the distance along the camera Z axis
The 3D position (X,Y,Z) of an image point (e.g. (u,v) in pixels) can be given in meters, cm, mm or whatever you choose, because the 3D coordinates (X,Y,Z) are in the same units as the chessboard's square size. For example, if you define the square size to be 1 cm then the 3D coordinates will be in cm as well.
i.e.:
Size boardSize(4, 5);       // 4x5 chessboard
float squareSize = 0.025F;  // 0.025 meters
std::vector<Point3f> corners;
for (int i = 0; i < boardSize.height; i++)
    for (int j = 0; j < boardSize.width; j++)
        corners.push_back(Point3f(float(j * squareSize), float(i * squareSize), 0.0F));
p.s.:
After Z is determined, X and Y can be calculated using the usual projective camera equations, with u and v measured relative to the principal point (cx, cy):
X = (u - cx) * Z / f
Y = (v - cy) * Z / f
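For illustration, here is a minimal C++ sketch of these formulas; f, cx, cy are taken from the intrinsic matrix (in pixels), B is the baseline in the same units as your chessboard squares, and the function name is just a placeholder:

#include <opencv2/core.hpp>
using namespace cv;

// Back-project a pixel (u, v) with disparity d into 3D using Z = f * B / d.
// f, cx, cy are in pixels; B is in the calibration units (e.g. mm if the
// chessboard square size was given in mm), so X, Y, Z come out in those units.
Vec3f backproject(float u, float v, float d, float f, float B, float cx, float cy)
{
    float Z = f * B / d;
    float X = (u - cx) * Z / f;
    float Y = (v - cy) * Z / f;
    return Vec3f(X, Y, Z);
}

This is essentially the same computation reprojectImageTo3D() performs for every pixel via the Q matrix from stereoRectify.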

Related

How to convert 2D (x, y) coordinates into 3D (x, y, z) coordinates using Python and a point cloud?

I have been using this GitHub repo: https://github.com/aim-uofa/AdelaiDepth/blob/main/LeReS/Minist_Test/tools/test_shape.py
to figure out how this piece of code can be used to get x, y, z coordinates:
import numpy as np

def reconstruct_3D(depth, f):
    """
    Reconstruct depth to 3D pointcloud with the provided focal length.
    Return:
        pcd: N X 3 array, point cloud
    """
    cu = depth.shape[1] / 2
    cv = depth.shape[0] / 2
    width = depth.shape[1]
    height = depth.shape[0]
    row = np.arange(0, width, 1)
    u = np.array([row for i in np.arange(height)])
    col = np.arange(0, height, 1)
    v = np.array([col for i in np.arange(width)])
    v = v.transpose(1, 0)
    # The remaining lines (reconstructed here, not from the repo) back-project each
    # pixel with the pinhole model: z = d, x = (u - cu) * z / f, y = (v - cv) * z / f
    x = (u - cu) * depth / f
    y = (v - cv) * depth / f
    z = depth
    pcd = np.stack((x, y, z), axis=-1).reshape(-1, 3)  # N x 3 point cloud
    return pcd
I want to use these coordinates to find the distance between 2 people in 3D for an object detection model. Does anyone have any advice?
I know how to use 2D images with YOLO to figure out the distance between 2 people, based on this link: Compute the centroid of a rectangle in python
My thinking is that I can use the bounding boxes to get the corners, find the centroid for each of the 2 people's bounding boxes, and use triangulation to find the hypotenuse between the 2 points (which is their distance).
However, I am having a tricky time figuring out how to use a set of 3D coordinates to find the distance between 2 people. I can get the relative distance from my 2D model.
By having a 2D depth image and the camera's intrinsic matrix, you can convert each pixel to a 3D point as:
z = d
x = (u - cx) * z / f
y = (v - cy) * z / f
// where (cx, cy) is the principal point and f is the focal length.
Alternatively, you can use a third-party library like Open3D to do the same:
xyz = open3d.geometry.create_point_cloud_from_depth_image(depth, intrinsic)

How can I iterate a coordinate sphere using an expanding spherical sector (cone)?

Given an integer 3D coordinate system, a center point P, a vector in some direction V, and a max sphere radius R:
I want to iterate over only integer points in a fashion that starts at P and goes along direction V until reaching the max radius R.
Then, for some small angle T iterate all points within the cone (or spherical sector) around V.
Incrementally expand T until T is pi/2 radians and every point within the sphere has been iterated.
I need to do this with O(1) space complexity. So the order of the points can't be precomputed/sorted but must result naturally from some math.
Example:
// Vector3 represents coordinates x, y, z
// where (typically) x is left/right, y is up/down, z is depth
Vector3 center = Vector3(0, 0, 0); // could be anything
Vector3 direction = Vector3(0, 100, 0); // could be anything
int radius = 4;
double piHalf = acos(0.0); // half of pi
std::queue<Vector3> list;
for (double angle = 0; angle < piHalf; angle += .1)
{
    int x = // confusion begins here
    int y = // ..
    int z = // ..
    list.push(Vector3(x, y, z));
}
See picture for this example
The first coordinates that should be caught are:
A(0,0,0), C(0,1,0), D(0,2,0), E(0,3,0), B(0,4,0)
Then, expanding the angle somewhat (orange cone):
K(-1,0,3), X(1,0,3), (0,1,3), (0,-1,3)
Expanding the angle a bit more (green cone):
F(1,1,3), (-1,-1,3), (1,-1,3) (-1,1,3)
My guess for what would be next is:
L(1,0,2), (-1,0,2), (0,1,2), (0,-1,2)
M(2,0,3) would be hit somewhat after
Extra notes and observations:
A cone will hit a max of four points at its base, if the vector is perpendicular to an axis and originates at an integer point. It may also hit points along the cone wall depending on the angle.
I am trying to do this in C++.
I am aware of how to check whether a point X is within any given cone or spherical sector by comparing the angle between V and PX with T, and I am currently using this knowledge for a lesser solution.
This is not a homework question, I am working on a 3D video game~
iterate all integer positions Q in your sphere
A simple 3x nested for loop through x, y, z in the range <P-R, P+R> will do. Just check that the point is inside the sphere:
u=(x,y,z)-P;
dot(u,u) <= R*R
test if point Q is exactly on V
simply by checking the angle between PQ and V using the dot product:
u = Q-P
u = u/|u|
v = V/|V|
if (dot(u,v)==1) point Q is on V
test if point Q is exactly on the surface of the "cone"
simply by checking the angle between PQ and V using the dot product:
u = Q-P
u = u/|u|
v = V/|V|
if (dot(u,v)==cos(T/2)) point Q is on "cone"
where I assume T is the full "cone" angle, not the half angle.
Beware: you need to use floats/doubles for this and make the comparisons with some margin for error, like:
if (fabs(dot(u,v)-1.0 )<1e-6) point Q is on V
if (fabs(dot(u,v)-cos(T/2))<1e-6) point Q is on "cone"
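Putting these tests together, here is a minimal C++ sketch of the brute-force approach (Vec3, iterateSphereByCone and the angular-band bookkeeping are placeholders of mine, not the question's code; within one band the points simply come out in loop order rather than in the exact order of the example):

#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };   // stand-in for your Vector3 type

static double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Visit the integer offsets inside a sphere of radius R around P in order of
// increasing angle to direction V (an expanding cone). `step` is the angular
// increment in radians; pass maxAngle = pi to sweep the whole sphere.
// O(1) extra space: nothing is precomputed or sorted, the order falls out of the loops.
void iterateSphereByCone(const Vec3& P, const Vec3& V, int R, double step, double maxAngle)
{
    const double vlen = std::sqrt(dot(V, V));
    const int bands = int(std::ceil(maxAngle / step));

    for (int band = 0; band <= bands; ++band)                       // expanding cone
        for (int x = -R; x <= R; ++x)
            for (int y = -R; y <= R; ++y)
                for (int z = -R; z <= R; ++z)
                {
                    Vec3 u = { double(x), double(y), double(z) };   // Q - P
                    double d2 = dot(u, u);
                    if (d2 > double(R) * R) continue;               // outside the sphere
                    // angle between PQ and V; the center point counts as angle 0
                    double c = (d2 == 0.0) ? 1.0 : dot(u, V) / (std::sqrt(d2) * vlen);
                    c = c > 1.0 ? 1.0 : (c < -1.0 ? -1.0 : c);      // clamp for acos
                    if (int(std::acos(c) / step) != band) continue; // not in this band yet
                    std::printf("(%g, %g, %g)\n", P.x + x, P.y + y, P.z + z);
                }
}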

3D to 2D world object coordinate transformation

I have the following (simplified) code which represents a 3D pose of a robot estimated by visual odometry:
// (r11 r12 r13 tx)
// (r21 r22 r23 ty)
// (r31 r32 r33 tz)
// (0 0 0 1)
Eigen::Matrix4f pose_prev_to_cur;
Eigen::Matrix4f pose_cur_to_world = Eigen::Matrix4f::Identity();
while (1) {
    pose_prev_to_cur = ... /* from visual odometry */
    pose_cur_to_world = pose_cur_to_world * pose_prev_to_cur.inverse();
}
As you can see, the pose matrix contains a 3x3 3D rotation matrix and a 3x1 translation vector.
I am plotting the robot trajectory by creating a point on a map for every frame:
// x translation maps to 2D x axis
int x = WINDOW_SCALE * pose_cur_to_world(0, 3) + WINDOW_SIZE / 2;
// z translation maps to 2D y axis
int y = WINDOW_SCALE * pose_cur_to_world(2, 3) + WINDOW_SIZE / 2;
// plotting the point
cv::circle(traj, cv::Point(x, y), 1, cv::Scalar(0, 255, 0), 2);
This works well for me and the trajectory is plotted, as you can see in the image:
The trajectory is green, the current robot position is red.
Now I have depth readings for each frame too. For each image pixel obtained from the camera, I have a depth measurement in meters. I'd like to plot the distance in the map as well.
My current idea is to just 'add' the depth value to the z translation value of the pose matrix like so:
Eigen::Matrix4f pose_depth_reading = pose_cur_to_world;
// offset z location by depth reading in meters
pose_depth_reading(2, 3) += 2.0;
Now my question is: how to plot this new point in the map?
If I just plot the point using the x/z coordinates of the 3D point, it will obviously not be rotated properly: it will just be offset along the map's y axis (the pose's z translation), regardless of where the camera is looking.
I thought about taking the calculated 2D map point, offsetting its y coordinate by the depth reading, and then doing a 2D rotation about the origin instead. But my problem is that I'd need to extract the 2D rotation angle from the 3x3 3D rotation matrix in pose_cur_to_world, and I'm not quite sure how to do that.
The end result I expect is shown in the following image:
The depth reading(s) are marked blue, and I added a gray coordinate frame with X and Y axes for clarity.
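For illustration, a minimal sketch of one way to do this, keeping the depth reading as a point in the camera frame and mapping it through pose_cur_to_world before plotting (this assumes the reading lies straight ahead of the camera on its optical Z axis; the 2.0 m value and the WINDOW_* constants are the ones from the snippets above):

// Depth reading expressed as a homogeneous point in the current camera frame.
Eigen::Vector4f p_cam(0.0f, 0.0f, 2.0f, 1.0f);           // 2.0 m straight ahead
// The pose applies rotation and translation together, so the point ends up
// rotated consistently with the plotted trajectory.
Eigen::Vector4f p_world = pose_cur_to_world * p_cam;

// Plot it the same way as the trajectory points (x -> map x, z -> map y).
int px = WINDOW_SCALE * p_world(0) + WINDOW_SIZE / 2;
int py = WINDOW_SCALE * p_world(2) + WINDOW_SIZE / 2;
cv::circle(traj, cv::Point(px, py), 1, cv::Scalar(255, 0, 0), 2);  // blue, like the depth readings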

How to determine the intersection between the camera direction and a plane?

I have a 3D scene with an infinite horizontal plane (parallel to the XZ plane) at a height H along the vertical Y axis.
I would like to know how to determine the intersection between the axis of my camera and this plane.
The camera is defined by a view-matrix and a projection-matrix.
There are two sub-problems here: 1) Extracting the position and view-direction from the camera matrix. 2) Calculating the intersection between the view-ray and the plane.
Extracting position and view-direction
The view matrix describes how points are transformed from world-space to view-space. View-space in OpenGL is usually defined such that the camera is at the origin and looks in the -z direction.
To get the position of the camera, we have to transform the origin [0,0,0] of the view-space back into world-space. Mathematically speaking, we have to calculate:
camera_pos_ws = inverse(view_matrix) * [0,0,0,1]
but when looking at the equation we'll see that we are only interested in the 4th column of the inverse matrix, which is given by [1]
camera_pos_ws = -transpose(R) * [view_matrix[12], view_matrix[13], view_matrix[14]]
where R is the upper-left 3x3 rotation part of the view matrix (see note [2] below for why the inverse has this form).
The orientation of the camera can be found by a similar calculation. We know that the camera looks in -z direction in view-space thus the world space direction is given by
camera_dir_ws = inverse(view_matrix) * [0,0,-1,0];
Again, when looking at the equation, we'll see that this only takes the third row of the view matrix's rotation part into account (the third column of the inverse rotation), which is given by [2]
camera_dir_ws = [-view_matrix[2], -view_matrix[6], -view_matrix[10]]
Calculating the intersection
We now know the camera position P and the view direction D, so we have to find the x,z values along the ray R(l) = P + l * D where y equals H. Since there is only one unknown, l, we can calculate it from
y = Py + l * Dy
H = Py + l * Dy
l = (H - Py) / Dy
The intersection point is then given by plugging l back into the ray equation.
Notes
[1] The indices assume that the matrix is stored in a column-major linear array.
[2] Note that the inverse of a matrix of the form
M = ( R  T )
    ( 0  1 )
where R is an orthogonal 3x3 matrix, is given by
inv(M) = ( transpose(R)  -transpose(R) * T )
         ( 0              1               )
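A small self-contained C++ sketch of both steps, assuming a column-major 4x4 view matrix as described above (the type and function names are placeholders):

#include <array>
#include <cmath>

using Mat4 = std::array<float, 16>;   // column-major: element (row, col) at m[col * 4 + row]
struct Vec3 { float x, y, z; };

// Camera position in world space: -transpose(R) * t (see note [2] above).
Vec3 cameraPosWs(const Mat4& m)
{
    const float tx = m[12], ty = m[13], tz = m[14];
    return { -(m[0] * tx + m[1] * ty + m[2] * tz),
             -(m[4] * tx + m[5] * ty + m[6] * tz),
             -(m[8] * tx + m[9] * ty + m[10] * tz) };
}

// View direction in world space: minus the third row of the rotation part.
Vec3 cameraDirWs(const Mat4& m)
{
    return { -m[2], -m[6], -m[10] };
}

// Intersection of the view ray with the horizontal plane y = H.
// Returns false when the camera looks parallel to the plane (Dy == 0);
// a negative l means the plane lies behind the camera.
bool intersectPlaneY(const Mat4& view, float H, Vec3& hit)
{
    Vec3 P = cameraPosWs(view);
    Vec3 D = cameraDirWs(view);
    if (std::fabs(D.y) < 1e-6f) return false;
    float l = (H - P.y) / D.y;
    hit = { P.x + l * D.x, H, P.z + l * D.z };
    return true;
}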
For a general line-plane intersection there are lots of answers and tutorials.
Your case is simple because the plane is horizontal.
Suppose the camera is at C(cx, cy, cz) and it looks at T(tx, ty, tz).
Then the line camera-target can be defined by:
(cx - x)/(tx - cx) = (cy - y)/(ty - cy) = (cz - z)/(tz - cz)   // these are two independent equations
For a horizontal plane, only one equation is needed: y = H.
Substitute this value in the line equations and you get
(cx-x)/(tx-cx) = (cy-H)/(ty-cy)
(cz-z)/(tz-cz) = (cy-H)/(ty-cy)
So
x = cx - (tx-cx)*(cy-H)/(ty-cy)
y = H
z = cz - (tz-cz)*(cy-H)/(ty-cy)
Of course if your camera looks along a horizontal line, then ty = cy and there is no solution.
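The same closed form as a small C++ sketch, with C and T as defined above (Vec3 and the function name are placeholders):

struct Vec3 { float x, y, z; };

// Intersection of the line through camera C and target T with the plane y = H.
// Undefined when ty == cy (the camera looks along a horizontal line).
Vec3 intersectHorizontalPlane(const Vec3& C, const Vec3& T, float H)
{
    float s = (C.y - H) / (T.y - C.y);
    return { C.x - (T.x - C.x) * s, H, C.z - (T.z - C.z) * s };
}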

OpenCV Computing Camera Position && Rotation

For a project I need to compute the real-world position and orientation of a camera
with respect to a known object.
I have a set of photos, each displays a chessboard from different points of view.
Using CalibrateCamera and solvePnP I am able to reproject points in 2D, to get an AR thing.
So my situation is as such:
Intrinsic parameters are known
Distortion coefficients are known
Translation vector and rotation vector are known per photo.
I simply cannot figure out how to compute the position of the camera. My guess was:
invert the translation vector (= t')
transform the rotation vector to degrees (it seems to be in radians) and invert it
use Rodrigues on the rotation vector
compute RotationMatrix * t'
But the results are somehow totally off...
Basically I want to compute a ray for each pixel in world coordinates.
If more information on my problem is needed, I'd be glad to answer quickly.
I don't get it... somehow the rays are still off. This is my code, btw:
Mat image1CamPos = tvecs[0].clone(); //From calibrateCamera
Mat rot = rvecs[0].clone(); //From calibrateCamera
Rodrigues(rot, rot);
rot = rot.t();
//Position of Camera
Mat pos = rot * image1CamPos;
//Ray-Normal (( (double)mk[i][k].x) are known image-points)
float x = (( (double)mk[i][0].x) / fx) - (cx / fx);
float y = (( (double)mk[i][0].y) / fy) - (cy / fy);
float z = 1;
float mag = sqrt(x*x + y*y + z*z);
x /= mag;
y /= mag;
z /= mag;
Mat unit(3, 1, CV_64F);
unit.at<double>(0, 0) = x;
unit.at<double>(1, 0) = y;
unit.at<double>(2, 0) = z;
//Rotation of Ray
Mat rot = stof1 * unit;
But when plotting this, the rays are off :/
The translation t (3x1 vector) and rotation R (3x3 matrix) of an object with respect to the camera define the coordinate transformation from object into camera space, which is given by:
v' = R * v + t
The inverse of the rotation matrix is simply its transpose:
R^-1 = R^T
Knowing this, you can easily solve the transformation (first eq.) for v:
v = R^T * v' - R^T * t
This is the transformation from camera into object space, i.e., the position of the camera with respect to the object (rotation = R^T and translation = -R^T * t).
You can simply get a 4x4 homogeneous transformation matrix from this:
T = ( R^T   -R^T * t )
    ( 0      1       )
If you now have any point in camera coordinates, you can transform it into object coordinates:
p' = T * (x, y, z, 1)^T
So, if you'd like to project a ray from a pixel with coordinates (a,b) (you will probably need to define the center of the image, i.e. the principal point reported by CalibrateCamera, as (0,0)), let that pixel be P = (a,b)^T. Its 3D coordinates in camera space are then P_3D = (a,b,0)^T. Let's project the ray 100 pixels in the positive z-direction, i.e. to the point Q_3D = (a,b,100)^T. All you need to do is transform both 3D coordinates into the object coordinate system using the transformation matrix T, and you should be able to draw a line between the two points in object space.
However, make sure that you don't confuse units: CalibrateCamera reports pixel values while your object coordinate system might be defined in, e.g., cm or mm.
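A minimal OpenCV C++ sketch of this recipe, assuming rvec and tvec come from solvePnP or calibrateCamera for one photo (the helper name cameraPoseFromExtrinsics is just a placeholder):

#include <opencv2/opencv.hpp>
using namespace cv;

// Build the 4x4 camera-to-object transform T = ( R^T  -R^T * t ; 0  1 )
// from the rvec/tvec reported by solvePnP or calibrateCamera.
Mat cameraPoseFromExtrinsics(const Mat& rvec, const Mat& tvec)
{
    Mat R;
    Rodrigues(rvec, R);                  // 3x3 rotation, object -> camera
    Mat Rt = R.t();                      // inverse rotation, camera -> object
    Mat pos = -(Rt * tvec);              // camera position in object coordinates

    Mat T = Mat::eye(4, 4, CV_64F);
    Rt.copyTo(T(Rect(0, 0, 3, 3)));      // rotation block
    pos.copyTo(T(Rect(3, 0, 1, 3)));     // translation column
    return T;
}

// Usage idea: transform camera-space points into object space, e.g. the ray
// endpoints P_3D = (a, b, 0, 1)^T and Q_3D = (a, b, 100, 1)^T from the text:
//   Mat p_obj = cameraPoseFromExtrinsics(rvec, tvec) * p_cam;   // p_cam is a 4x1 CV_64F vector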