Align depth image to RGB image

I am trying to generate a point cloud from images captured by a Kinect with Python and libfreenect, but I couldn't align the depth data with the RGB data taken by the Kinect.
I applied Nicolas Burrus's equations, but the two images ended up even further apart. Is there something wrong with my code?
import numpy as np

cx_d = 3.3930780975300314e+02
cy_d = 2.4273913761751615e+02
fx_d = 5.9421434211923247e+02
fy_d = 5.9104053696870778e+02
fx_rgb = 5.2921508098293293e+02
fy_rgb = 5.2556393630057437e+02
cx_rgb = 3.2894272028759258e+02
cy_rgb = 2.6748068171871557e+02

RR = np.array([
    [ 0.999985794494467, -0.003429138557773,  0.00408066391266],
    [ 0.003420377768765,  0.999991835033557,  0.002151948451469],
    [-0.004088009930192, -0.002137960469802,  0.999989358593300]
])
TT = np.array([1.9985242312092553e-02, -7.4423738761617583e-04, -1.0916736334336222e-02])

# uu, vv are indices in the depth image
def depth_to_xyz_and_rgb(uu, vv):
    # get z value in meters
    pcz = depthLookUp[depths[vv, uu]]

    # compute x, y values in meters
    pcx = (uu - cx_d) * pcz / fx_d
    pcy = (vv - cy_d) * pcz / fy_d

    # apply extrinsic calibration
    P3D = np.array([pcx, pcy, pcz])
    P3Dp = np.dot(RR, P3D) - TT

    # rgb indices that P3D should match
    uup = P3Dp[0] * fx_rgb / P3Dp[2] + cx_rgb
    vvp = P3Dp[1] * fy_rgb / P3Dp[2] + cy_rgb

    # return a point in the point cloud and its corresponding color indices
    return P3D, uup, vvp
Is there anything I did wrong? Any help is appreciated.

First, check your calibration numbers. Your rotation matrix is approximately the identity and, assuming your calibration frame is metric, your translation vector says that the second camera is 2 centimeters to the side and one centimeter displaced in depth. Does that approximately match your setup? If not, you may be working with the wrong scaling (likely using a wrong number for the characteristic size of your calibration target - a checkerboard?).
Your code looks correct - you are re-projecting a pixel of the depth camera at a known depth, and then projecting it back into the second camera to get the corresponding RGB value.
One thing I would check is whether you are using your coordinate transform in the right direction. IIRC, OpenCV produces it as [R | t], but you are using it as [R | -t], which looks suspicious. Perhaps you meant to use its inverse, which would be [R' | -R'*t], where the apostrophe means transposition.
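For illustration, here is a minimal sketch of the two ways to apply the extrinsics, with placeholder values standing in for the question's RR, TT and a sample 3D point; depending on which direction the calibration was expressed in, one of them should line the images up:

import numpy as np

RR = np.eye(3)                          # placeholder for the question's rotation
TT = np.array([0.02, -0.0007, -0.011])  # placeholder for the question's translation
P3D = np.array([0.1, 0.2, 1.5])         # a sample depth-camera point in meters

# if the calibration maps depth-camera coordinates to RGB-camera coordinates, apply [R | t]:
P3Dp_forward = np.dot(RR, P3D) + TT

# if it maps RGB to depth instead, apply the inverse [R' | -R'*t]:
P3Dp_inverse = np.dot(RR.T, P3D - TT)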

Related

How to convert 2D (x,y) coordinates into 3D (x,y,z) coordinates using Python and a point cloud?

I have been using this GitHub repo: https://github.com/aim-uofa/AdelaiDepth/blob/main/LeReS/Minist_Test/tools/test_shape.py
to figure out how this piece of code can be used to get x, y, z coordinates:
def reconstruct_3D(depth, f):
    """
    Reconstruct depth to 3D pointcloud with the provided focal length.
    Return:
        pcd: N X 3 array, point cloud
    """
    cu = depth.shape[1] / 2
    cv = depth.shape[0] / 2
    width = depth.shape[1]
    height = depth.shape[0]
    row = np.arange(0, width, 1)
    u = np.array([row for i in np.arange(height)])
    col = np.arange(0, height, 1)
    v = np.array([col for i in np.arange(width)])
    v = v.transpose(1, 0)
I want to use these coordinates to find the distance between 2 people in 3D for an object detection model. Does anyone have any advice?
I know how to use 2D images with YOLO to figure out the distance between 2 people, based on this link: Compute the centroid of a rectangle in python
My thinking is I can use the bounding boxes to get the corners, find the centroid of each of the 2 people's bounding boxes, and then treat the line between the 2 centroids as the hypotenuse, which is their distance.
However, I am having a tricky time figuring out how to use a set of 3D coordinates to find the distance between 2 people. I can get the relative distance from my 2D model.
Given a 2D depth image and the camera's intrinsic matrix, you can convert each pixel to a 3D point as:
z = d
x = (u - cx) * z / f
y = (v - cy) * z / f
# where (cx, cy) is the principal point and f is the focal length.
Alternatively, you can use a third-party library like Open3D to do the same:
xyz = open3d.geometry.create_point_cloud_from_depth_image(depth, intrinsic)
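To connect this back to the distance question: once the two pixels of interest (for example the bounding-box centroids) have been back-projected with the formula above, the 3D distance between the two people is just the Euclidean norm of the difference. A minimal sketch with made-up intrinsics and depths:

import numpy as np

def pixel_to_3d(u, v, d, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth d into a 3D point in the camera frame."""
    z = d
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# hypothetical intrinsics and two bounding-box centroids with their depths (meters)
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
p1 = pixel_to_3d(300, 250, 2.5, fx, fy, cx, cy)
p2 = pixel_to_3d(500, 260, 3.1, fx, fy, cx, cy)
print(np.linalg.norm(p1 - p2))  # Euclidean distance, in the same units as the depth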

kitti dataset camera projection matrix

I am looking at the KITTI dataset, and particularly at how to convert a world point into image coordinates. I looked at the README, and it says (below) that I need to transform to camera coordinates first and then multiply by the projection matrix. I have 2 questions, coming from a non computer vision background:
I looked at the numbers in calib.txt, and in particular the matrix is 3x4 with non-zero values in the last column. I always thought this matrix = K[I|0], where K is the camera's intrinsic matrix. So why is the last column non-zero, and what does it mean? For example, P2 is
array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
[0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
[0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
After applying the projection to get [u, v, w] and dividing u, v by w, are these values with respect to an origin at the center of the image, or at the top left of the image?
README:
calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 projection
matrices after rectification. Here P0 denotes the left and P1 denotes the
right camera. Tr transforms a point from velodyne coordinates into the
left rectified camera coordinate system. In order to map a point X from the
velodyne scanner to a point x in the i'th image plane, you thus have to
transform it like:
x = Pi * Tr * X
Refs:
How to understand the KITTI camera calibration files?
Format of parameters in KITTI's calibration file
http://www.cvlibs.net/publications/Geiger2013IJRR.pdf
Answer:
I strongly recommend you read those references above. They may solve most, if not all, of your questions.
For question 2: the projected points on the images are with respect to an origin at the top left. See refs 2 & 3: the image coordinates of a very distant 3D point approach (center_x, center_y), whose values are provided in the P_rect matrices. Or you can verify this with some simple code:
import numpy as np

p = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
              [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
              [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
x = [0, 0, 1E8, 1]  # a far 3D point
y = np.dot(p, x)
y[0] /= y[2]
y[1] /= y[2]
y = y[:2]
print(y)
You will see some output like:
array([6.018873e+02, 1.831104e+02 ])
which is quite near (p[0, 2], p[1, 2]), a.k.a. (center_x, center_y).
All the P matrices (3x4) have the form:

    P(i)rect = [[fu  0  cx  -fu*bx],
                [ 0 fv  cy  -fv*by],
                [ 0  0   1       0]]

The last column encodes the baselines (in meters) with respect to the reference camera 0. P0 has all zeros in the last column because it is the reference camera.
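As a small illustration (not official KITTI tooling), you can read the intrinsics off P2 and recover the baseline by dividing the last column by the focal lengths, following the matrix form above:

import numpy as np

P2 = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
               [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])

fu, fv = P2[0, 0], P2[1, 1]      # focal lengths in pixels
cx, cy = P2[0, 2], P2[1, 2]      # principal point
bx = -P2[0, 3] / fu              # baseline component along x w.r.t. camera 0 (meters)
by = -P2[1, 3] / fv              # baseline component along y w.r.t. camera 0 (meters)
print(fu, fv, cx, cy, bx, by)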
This post has more details:
How Kitti calibration matrix was calculated?

Aruco markers with OpenCV, get the 3D corner coordinates?

I am detecting a printed Aruco marker using OpenCV 3.2:
aruco::estimatePoseSingleMarkers(corners, markerLength, camMatrix, distCoeffs, rvecs,tvecs);
this returns a translation and rotation vector for the marker. What I need are the 3D coordinates of each corner of the marker.
As I know the marker length, I could do something like
corner1 = tvecs[0] - markerlength /2;
corner2 = tvecs[0] + markerlength /2;
....
But is there a better way? Or an existing function?
To sum up, I have:
a 3D point at the center of a 2D square,
the length of the sides of that square,
the rotation value of the square.
How can I find the 3D coordinates of the corners?
First, let's assume that we only have one marker, given with side = 2 * half_side.
Second, aruco::estimatePoseSingleMarkers returns rvec and tvec, which describe the pose of the marker in the camera's coordinate system. Thus, I assume that you are looking for the coordinates of the corners in the camera's world.
Then, in the marker's space:

        [ half_side ]            [     0     ]
    E = [     0     ],       F = [ half_side ]
        [     0     ]            [     0     ]

where the center O of the square has coordinate tvec (in the camera's world), and the rotation matrix of the marker, rot_mat, is computed by cv::Rodrigues(rvec, rot_mat).
Now, using the rotation and translation above, the relation between the coordinates of a point P in the camera's world and in the marker's world is:

    [P_x_cam]             [P_x_marker]
    [P_y_cam] = rot_mat * [P_y_marker] + tvec
    [P_z_cam]             [P_z_marker]

For example, the center O, which is [0, 0, 0] in the marker's world, is at tvec in the camera's world.
So the coordinates of E in the camera's world are:

    [E_x_cam]             [half_side]
    [E_y_cam] = rot_mat * [    0    ] + tvec
    [E_z_cam]             [    0    ]

Magically, that is just rot_mat's first column multiplied by half_side, plus tvec. Similarly, the coordinates of F are rot_mat's second column multiplied by half_side, plus tvec.
Now the corners can be computed, for example:

    C - O = (E - O) + (F - O),    B - O = (E - O) - (F - O)

where E - O is exactly rot_mat's first column multiplied by half_side.
With all that in mind, we can compose the function:
vector<Point3f> getCornersInCameraWorld(double side, Vec3d rvec, Vec3d tvec){
    double half_side = side / 2;

    // compute rot_mat
    Mat rot_mat;
    Rodrigues(rvec, rot_mat);

    // transpose of rot_mat for easy column extraction
    Mat rot_mat_t = rot_mat.t();

    // the two E-O and F-O vectors
    double *tmp = rot_mat_t.ptr<double>(0);
    Point3f camWorldE(tmp[0] * half_side,
                      tmp[1] * half_side,
                      tmp[2] * half_side);

    tmp = rot_mat_t.ptr<double>(1);
    Point3f camWorldF(tmp[0] * half_side,
                      tmp[1] * half_side,
                      tmp[2] * half_side);

    // convert tvec to point
    Point3f tvec_3f(tvec[0], tvec[1], tvec[2]);

    // return vector:
    vector<Point3f> ret(4, tvec_3f);

    ret[0] += camWorldE + camWorldF;
    ret[1] += -camWorldE + camWorldF;
    ret[2] += -camWorldE - camWorldF;
    ret[3] += camWorldE - camWorldF;

    return ret;
}
Note 1: I hate that SO doesn't have MathJax
Note 2: there must be some faster implementation which I don't know of.
A Python implementation I wrote for the rotation of the marker corners described above, using rvec and tvec as returned from cv2.aruco.estimatePoseSingleMarkers(). Thanks @Quang Hoang for the detailed explanation.
import cv2
import numpy as np

# rotate a marker's corners by rvec and translate by tvec if given
# input is the size of a marker.
# In the markerworld the 4 marker corners are at (x, y) = (+- markersize/2, +- markersize/2)
# returns the rotated and translated corners and the rotation matrix
def rotate_marker_corners(rvec, markersize, tvec=None):
    mhalf = markersize / 2.0
    # convert rot vector to rot matrix; both describe: markerworld -> cam-world
    mrv, jacobian = cv2.Rodrigues(rvec)
    # in markerworld the corners are all in the xy-plane, so z is zero at first
    X = mhalf * mrv[:, 0]  # rotate the x = mhalf
    Y = mhalf * mrv[:, 1]  # rotate the y = mhalf
    minusX = X * (-1)
    minusY = Y * (-1)
    # calculate the 4 corners of the marker in camworld. corners are enumerated clockwise
    markercorners = []
    markercorners.append(np.add(minusX, Y))       # was upper left in markerworld
    markercorners.append(np.add(X, Y))            # was upper right in markerworld
    markercorners.append(np.add(X, minusY))       # was lower right in markerworld
    markercorners.append(np.add(minusX, minusY))  # was lower left in markerworld
    # if tvec is given, move all by tvec
    if tvec is not None:
        C = tvec  # center of marker in camworld
        for i, mc in enumerate(markercorners):
            markercorners[i] = np.add(C, mc)  # add tvec to each corner
        # print('Vec X, Y, C, dot(X,Y)', X, Y, C, np.dot(X, Y))  # just for debug
    markercorners = np.array(markercorners, dtype=np.float32)  # type needed when used as input to cv2
    return markercorners, mrv
'''
Copyright 2019 Marco Noll, Garmin International Inc. Licensed under the Apache
License, Version 2.0 (the "License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the specific
language governing permissions and limitations under the License.
'''
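A possible usage sketch for the function above; the rvec/tvec values here are made up, whereas in practice they would come from cv2.aruco.estimatePoseSingleMarkers() for one detected marker:

import numpy as np

rvec = np.array([0.1, -0.2, 0.05])    # example rotation vector
tvec = np.array([0.03, 0.01, 0.50])   # example translation in meters
corners_cam, rot_mat = rotate_marker_corners(rvec, markersize=0.05, tvec=tvec)
print(corners_cam)  # 4 x 3 array: marker corners in camera coordinates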
Building upon @Quang's answer, here is C# code for transforming any point to camera coordinates. Of course it needs the R and t vectors, so you're going to need a marker in order to get them.
private Point3d GetWorldPoint(Point3d input, Vec3d rvec, Vec3d tvec)
{
    var rot_mat = new Mat();
    Cv2.Rodrigues(MatOfDouble.FromArray(rvec.Item0, rvec.Item1, rvec.Item2), rot_mat);
    var pointProject = (rot_mat * MatOfDouble.FromArray(input.X, input.Y, input.Z)).ToMat();
    return tvec + new Point3d(pointProject.Get<double>(0, 0), pointProject.Get<double>(0, 1), pointProject.Get<double>(0, 2));
}

How to calculate the field of view of the camera from the camera intrinsic matrix?

I got the camera intrinsic matrix and distortion parameters using camera calibration.
The unit of the focal length is pixels, I guess.
Then, how can I calculate the field of view (along y)?
Is this formula right?
double fov_y = 2*atan(height/2/fy)*180/CV_PI;
I'll use it as a parameter to gluPerspective().
OpenCV has a function that does this: cv::calibrationMatrixValues. Looking at the implementation (available on GitHub), given an image with dimensions w x h and a camera matrix with focal lengths fx, fy and principal point (cx, cy), the equations for the field of view are:
fov_x = (atan2(cx, fx) + atan2(w - cx, fx)) * 180 / pi
fov_y = (atan2(cy, fy) + atan2(h - cy, fy)) * 180 / pi
In continuation of @mallwright's answer, here is a bit of Python/NumPy code to compute the field of view from the image resolution and focal lengths (in pixels):
import numpy as np
# Prepare
w, h = 1280, 720
fx, fy = 1027.3, 1026.9
# Go
fov_x = np.rad2deg(2 * np.arctan2(w, 2 * fx))
fov_y = np.rad2deg(2 * np.arctan2(h, 2 * fy))
print("Field of View (degrees):")
print(f" {fov_x = :.1f}\N{DEGREE SIGN}")
print(f" {fov_y = :.1f}\N{DEGREE SIGN}")
Output:
Field of View (degrees):
fov_x = 63.8°
fov_y = 38.6°
Note that this assumes that the principal point is at the center of the image and that there is no distortion, see this answer.
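Since the question mentions gluPerspective(), the vertical field of view computed above can be passed straight to it. A minimal sketch, assuming PyOpenGL and an already-current GL context (window/context setup omitted):

from OpenGL.GLU import gluPerspective

# fov_y, w, h as computed above; the near/far planes are arbitrary example values.
# This call only works once a GL context is current (e.g. inside GLUT/pygame setup).
gluPerspective(fov_y, w / h, 0.1, 100.0)  # fovy in degrees, aspect, zNear, zFar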

Python cv2 Image Pyramids

Trying to implement the famous Orange/Apple pyramid blending (cv2 Image Pyramids).
Note: both images have shape 307x307.
However, since the result image is blurred due to clipping values in cv2.subtract and cv2.add (as stated in cv2 vs numpy Matrix Arithmetics), I have used numpy arithmetic instead, as suggested in StackOverflow: Reconstructed Image after Laplacian Pyramid Not the same as original image.
I have tested this by running the pyramids on a single image: the image reconstructed from the pyramids has the same max, min and average pixel values as the original, which is not the case when using cv2 arithmetic.
However, at pyramid level 7 the result image gets a 'noise' of a red dot, and at level 9 the result image gets a lot of green pixel noise. Images of levels 6, 7, 9 - Imgur Album.
Any ideas why this would happen? I would say the green noise at pyramid level 9 happens because the image goes below 1x1 shape. But what about the red dot at level 7?
EDIT: code added
numberOfPyramids = 9

# generate Gaussian pyramids for A and B images
GA = A.copy()
GB = B.copy()
gpA = [GA]
gpB = [GB]
for i in xrange(numberOfPyramids):
    GA = cv2.pyrDown(GA)
    GB = cv2.pyrDown(GB)
    gpA.append(GA)
    gpB.append(GB)

# generate Laplacian pyramids for A and B images
lpA = [gpA[numberOfPyramids - 1]]
lpB = [gpB[numberOfPyramids - 1]]
for i in xrange(numberOfPyramids - 1, 0, -1):
    geA = cv2.pyrUp(gpA[i], dstsize=np.shape(gpA[i - 1])[:2])
    geB = cv2.pyrUp(gpB[i], dstsize=np.shape(gpB[i - 1])[:2])
    laplacianA = gpA[i - 1] - geA if i != 1 else cv2.subtract(gpA[i - 1], geA)
    laplacianB = gpB[i - 1] - geB if i != 1 else cv2.subtract(gpB[i - 1], geB)
    lpA.append(laplacianA)
    lpB.append(laplacianB)

# now add left and right halves of the images in each level
LS = []
for la, lb in zip(lpA, lpB):
    _, cols, _ = la.shape
    ls = np.hstack((la[:, : cols / 2], lb[:, cols / 2 :]))
    LS.append(ls)

# now reconstruct
ls_ = LS[0]
for i in xrange(1, numberOfPyramids):
    ls_ = cv2.pyrUp(ls_, dstsize=np.shape(LS[i])[:2])
    ls_ = ls_ + LS[i] if i != numberOfPyramids - 1 else cv2.add(ls_, LS[i])

cv2.imshow(namedWindowName, ls_)
cv2.waitKey()
After reading the original article about the Laplacian pyramid, I realize I misunderstood this method: we can fully reconstruct the original image without blur, because the pyramid stores the extra pixel information. It is true that clipping values leads to the blur. Well, now we come back to the beginning again :)
So the code you posted is still clipping values. I advise you to store the Laplacian pyramid as int16 and not use cv2.subtract. Hope it works.
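A minimal sketch of that suggestion (not the original blending code, just the round trip, with a random test image standing in for Apple/Orange): build the Laplacian pyramid in int16 so negative differences are preserved, and clip back to uint8 only once at the end:

import cv2
import numpy as np

def laplacian_pyramid_int16(img, levels):
    """Build a Laplacian pyramid stored as int16 so negative residuals survive."""
    gp = [img.astype(np.int16)]          # work in int16 throughout
    for _ in range(levels):
        gp.append(cv2.pyrDown(gp[-1]))
    lp = [gp[-1]]                        # coarsest level first
    for i in range(levels, 0, -1):
        h, w = gp[i - 1].shape[:2]
        up = cv2.pyrUp(gp[i], dstsize=(w, h))
        lp.append(gp[i - 1] - up)        # residual may be negative; int16 keeps it
    return lp

def reconstruct_from_int16(lp):
    """Collapse the pyramid; clip to uint8 only at the very end."""
    img = lp[0]
    for lap in lp[1:]:
        h, w = lap.shape[:2]
        img = cv2.pyrUp(img, dstsize=(w, h)) + lap
    return np.clip(img, 0, 255).astype(np.uint8)

# quick round-trip check on a random test image
test = np.random.randint(0, 256, (307, 307, 3), dtype=np.uint8)
restored = reconstruct_from_int16(laplacian_pyramid_int16(test, 6))
print(np.abs(test.astype(np.int16) - restored).max())  # expect 0: no clipping losses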