Convert Kinect depth intensity to distance in meters - computer-vision

I'm working on kinect v1 depth images.
How do I find the distance in meters for the corresponding depth intensity value of each pixel?
The intensity value ranges from 0 to 255 since it is a grayscale image, and I don't have the raw depth data.
I've tried various ways to get the distance, such as using the following formulas:
- 1.0 / (raw_depth * -0.0030711016 + 3.3309495161)
- 0.1236 * tan(rawDisparity / 2842.5 + 1.1863)
I've also tried to get the raw data by using:
raw = (255 - depthIntensity)/256*2047
How do I solve this problem?

The Kinect actually sends something akin to a disparity image over USB. Both OpenNI and libfreenect are capable of converting that to a depth image using parameters reported by the device (baseline, focal length, and distance to the reference plane, IIRC), e.g. CV_CAP_PROP_OPENNI_BASELINE.
In math form, depth is recovered from disparity as:
depth = baseline * focal_length / disparity
The resulting depth corresponds to the Z axis of the current image frame.
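If all you have is the 8-bit grayscale rendering, you first have to undo whatever scaling produced it before either formula applies. A minimal sketch, assuming the intensity is a linear rescaling of the 11-bit raw disparity and using the empirical tan() approximation from the question (if the tool that wrote the image used a different mapping, the result will be wrong):

import numpy as np

# Sketch only: assumes intensity = 255 - raw_disparity / 2047 * 255,
# i.e. the linear mapping guessed in the question. Verify against your data.
def intensity_to_depth_m(intensity_8u):
    raw_disparity = (255.0 - intensity_8u) / 255.0 * 2047.0
    # Empirical Kinect v1 disparity-to-depth approximation (meters)
    return 0.1236 * np.tan(raw_disparity / 2842.5 + 1.1863)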

Related

How to create point cloud from rgb & depth images?

For a university project I am currently working on, I have to create a point cloud by reading images from this dataset. These are basically video frames; for each frame there is an RGB image along with a corresponding depth image.
I am familiar with the equation z = f*b/d, but I am unable to figure out how the data should be interpreted. Information about the camera that was used to take the video is not provided, and the project states the following:
"Consider a horizontal/vertical field of view of the camera 48.6/62
degrees respectively"
I have little to no experience in computer vision, and I have never encountered 2 fields of view being used before. Assuming I use the depth from the image as is (for the z coordinate), how would I go about calculating the x and y coordinates of each point in the point cloud?
Here's an example of what the dataset looks like:
Yes, it's unusual to specify multiple fields of view. Given a typical camera (squarish pixels, minimal distortion, view vector through the image center), usually only one field-of-view angle is given -- horizontal or vertical -- because the other can then be derived from the image aspect ratio.
Specifying a horizontal angle of 48.6 and a vertical angle of 62 is particularly surprising here, since the image is a landscape view, where I'd expect the horizontal angle to be greater than the vertical. I'm pretty sure it's a typo:
When swapped, the ratio tan(62 * pi / 360) / tan(48.6 * pi / 360) is the 640 / 480 aspect ratio you'd expect, given the image dimensions and square pixels.
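(Quick check: tan(62 * pi / 360) / tan(48.6 * pi / 360) ≈ 0.6009 / 0.4515 ≈ 1.331, versus 640 / 480 ≈ 1.333.)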
At any rate, a horizontal angle of t is basically saying that the horizontal extent of the image, from left edge to right edge, covers an arc of t radians of the visual field, so the pixel at the center of the right edge lies along a ray rotated t / 2 radians to the right from the central view ray. This "righthand" ray runs from the eye at the origin through the point (tan(t / 2), 0, -1) (assuming a right-handed space with positive x pointing right and positive y pointing up, looking down the negative z axis). To get the point in space at distance d from the eye, you can just normalize a vector along this ray and multiply it by d. Assuming the samples are linearly distributed across a flat sensor, I'd expect that for a given pixel at (x, y) you could calculate its corresponding ray point with:
p = (dx * tan(hfov / 2), dy * tan(vfov / 2), -1)
where dx is 2 * (x - width / 2) / width, dy is 2 * (y - height / 2) / height, and hfov and vfov are the field-of-view angles in radians.
Note that the documentation that accompanies your sample data links to a Matlab file that shows the recommended process for converting the depth images into a point cloud and distance field. In it, the fields of view are baked together with the image dimensions into a constant factor of 570.3, from which you can recover the field-of-view angle the authors believed their recording device had:
2 * atan(320 / 570.3) * (180 / pi) ≈ 58.6
which is indeed pretty close to the 62 degrees you were given.
From the Matlab code, it looks like the value in the image is not distance from a given point to the eye, but instead distance along the view vector to a perpendicular plane containing the given point ("depth", or basically "z"), so the authors can just multiply it directly with the vector (dx * tan(hfov / 2), dy * tan(vfov / 2), -1) to get the point in space, skipping the normalization step mentioned earlier.
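Putting the pieces together, here is a minimal sketch (assuming, as in the Matlab reference, that the stored value is the metric z depth along the view axis, and using the swapped angles hfov = 62, vfov = 48.6 degrees; the function name and sign conventions are mine):

import numpy as np

# depth: (H, W) array of metric z values; returns (H, W, 3) points in a
# right-handed camera frame with +x right, +y up, looking down -z.
def depth_to_points(depth, hfov_deg=62.0, vfov_deg=48.6):
    h, w = depth.shape
    dx = 2.0 * (np.arange(w) - w / 2.0) / w            # -1..1, left to right
    dy = 2.0 * (np.arange(h) - h / 2.0) / h            # -1..1, top to bottom
    dx, dy = np.meshgrid(dx, dy)
    x = depth * dx * np.tan(np.radians(hfov_deg) / 2.0)
    y = -depth * dy * np.tan(np.radians(vfov_deg) / 2.0)   # image y grows downward
    z = -depth                                              # camera looks down -z
    return np.dstack([x, y, z])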

Performing threshold operation on an RGB image

I need to perform a threshold operation on an RGB image. The thresholding that I intend to do should behave as follows.
If the greyscale equivalent of a pixel (calculated as 0.299 * R' + 0.587 * G' + 0.114 * B') is Y, then the pixel value of the output image will be:
P = Threshold_color, if Y < threshold_value
P = (R, G, B), the original value, otherwise
where Threshold_color is an RGB color value.
I wanted to perform this operation using the Intel IPP library. There I found a few APIs related to thresholding of images (e.g. ippiThreshold_LTVal_8u_C3R).
But these methods seem to work only on one data point at a time, while the thresholding I want to do depends on the combination of three different values (R, G, B).
Is there a way to achieve this through IPP library?
Suggested approach:
1. Copy the image into a greyscale image.
2. Create a binary 0/1 mask (same size as the greyscale image) using the threshold.
3. Multiply this mask with the replacement color to generate an overlay.
4. Apply the overlay to the original image.
Note that you're generating images of different types here: first greyscale, then black & white, and finally color images again (although in step 3 it is a monochromatic image).
Yes, you can implement this using IPP, but I'm not aware of any standard function that does what you want.
All IPP threshold operations I can find in the reference use a global threshold.
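For illustration, here is the mask-and-overlay idea sketched with OpenCV/NumPy rather than IPP (a minimal sketch; threshold_value and threshold_color are placeholders, and each step can be mapped onto the corresponding IPP primitives):

import cv2

# Sketch of the suggested approach; not IPP, just to show the per-pixel logic.
def threshold_rgb(image_bgr, threshold_value, threshold_color=(0, 0, 255)):
    # Greyscale equivalent Y = 0.299*R + 0.587*G + 0.114*B
    grey = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    mask = grey < threshold_value          # binary mask where Y < threshold_value
    out = image_bgr.copy()
    out[mask] = threshold_color            # replace masked pixels, keep the rest
    return out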

Extracting 3D coordinates given 2D image points, depth map and camera calibration matrices

I have a set of 2D image keypoints that are outputted from the OpenCV FAST corner detection function. Using an Asus Xtion I also have a time-synchronised depth map with all camera calibration parameters known. Using this information I would like to extract a set of 3D coordinates (point cloud) in OpenCV.
Can anyone give me any pointers regarding how to do so? Thanks in advance!
Nicolas Burrus has created a great tutorial for depth sensors like the Kinect.
http://nicolas.burrus.name/index.php/Research/KinectCalibration
I'll copy & paste the most important parts:
Mapping depth pixels with color pixels
The first step is to undistort rgb and depth images using the
estimated distortion coefficients. Then, using the depth camera
intrinsics, each pixel (x_d,y_d) of the depth camera can be projected
to metric 3D space using the following formula:
P3D.x = (x_d - cx_d) * depth(x_d,y_d) / fx_d
P3D.y = (y_d - cy_d) * depth(x_d,y_d) / fy_d
P3D.z = depth(x_d,y_d)
with fx_d, fy_d, cx_d and cy_d the intrinsics of the depth camera.
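As a quick illustration, the formula above maps directly onto a few NumPy lines (a sketch, assuming depth(x_d, y_d) is already in meters; the variable names are mine):

import numpy as np

# Back-project every depth pixel to metric 3D using the depth-camera intrinsics.
def backproject(depth, fx_d, fy_d, cx_d, cy_d):
    h, w = depth.shape
    x_d, y_d = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (x_d - cx_d) * z / fx_d
    y = (y_d - cy_d) * z / fy_d
    return np.dstack([x, y, z])      # P3D for each pixel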
If you are further interested in stereo mapping (values for kinect):
We can then reproject each 3D point on the color image and get its
color:
P3D' = R.P3D + T
P2D_rgb.x = (P3D'.x * fx_rgb / P3D'.z) + cx_rgb
P2D_rgb.y = (P3D'.y * fy_rgb / P3D'.z) + cy_rgb
with R and T the rotation and translation parameters estimated during
the stereo calibration.
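And the reprojection step, again as a sketch (R is the 3x3 rotation and T the translation from the stereo calibration; points is an (N, 3) array of P3D values):

import numpy as np

# Transform points into the RGB camera frame, then project with its intrinsics.
def project_to_rgb(points, R, T, fx_rgb, fy_rgb, cx_rgb, cy_rgb):
    p = points @ np.asarray(R).T + np.asarray(T)     # P3D' = R.P3D + T
    u = p[:, 0] * fx_rgb / p[:, 2] + cx_rgb
    v = p[:, 1] * fy_rgb / p[:, 2] + cy_rgb
    return np.stack([u, v], axis=1)                  # P2D_rgb for each point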
The parameters I could estimate for my Kinect are:
Color
fx_rgb 5.2921508098293293e+02
fy_rgb 5.2556393630057437e+02
cx_rgb 3.2894272028759258e+02
cy_rgb 2.6748068171871557e+02
k1_rgb 2.6451622333009589e-01
k2_rgb -8.3990749424620825e-01
p1_rgb -1.9922302173693159e-03
p2_rgb 1.4371995932897616e-03
k3_rgb 9.1192465078713847e-01
Depth
fx_d 5.9421434211923247e+02
fy_d 5.9104053696870778e+02
cx_d 3.3930780975300314e+02
cy_d 2.4273913761751615e+02
k1_d -2.6386489753128833e-01
k2_d 9.9966832163729757e-01
p1_d -7.6275862143610667e-04
p2_d 5.0350940090814270e-03
k3_d -1.3053628089976321e+00
Relative transform between the sensors (in meters)
R [ 9.9984628826577793e-01, 1.2635359098409581e-03, -1.7487233004436643e-02,
-1.4779096108364480e-03, 9.9992385683542895e-01, -1.2251380107679535e-02,
1.7470421412464927e-02, 1.2275341476520762e-02, 9.9977202419716948e-01 ]
T [ 1.9985242312092553e-02, -7.4423738761617583e-04, -1.0916736334336222e-02 ]

How to map thermal image (Flir A325sc) with Asus XTion Pro Live IR image under OpenCV+ROS

I want to map the thermal image of the Flir to the depth image of the XTion.
As the depth image is calculated from the Xtion's IR camera, I want to map the Flir image to the Xtion's IR image.
Therefore I placed both cameras on one plane, close to each other (about 7 cm offset in x, 1 cm in y and 3 cm in z).
Then I used ROS Indigo and OpenCV 2.4.9 to:
1. Set the Flir focus to a fixed one (no autofocus).
2. Get both images synchronized.
3. Resize the Xtion IR image from 640x480 to 320x240 pixels to match the Flir image.
4. Calculate the intrinsic camera parameters for both cameras (Flir + Xtion IR).
5. Calculate the extrinsic parameters.
6. Remap both images to get the rectified images.
I now have the two rectified images, but still an offset in x (horizontal direction).
If I understand it correctly, the offset is due to the different focal lengths and fields of view (Flir with objective: 45° H x 33.8° V and 9.66 mm focal length; Xtion: 58° H x 45° V), and I could solve the problem with a perspective transform, but I don't have both focal lengths in mm.
The datasheets:
http://support.flir.com/DsDownload/Assets/48001-0101_en_40.pdf
https://www.imc-store.com.au/v/vspfiles/assets/images/1196960_en_51.pdf
http://www.asus.com/us/Multimedia/Xtion_PRO_LIVE/specifications/
I had the idea to get the focal lengths with cv::calibrationMatrixValues, but I don't know the apertureWidth and apertureHeight.
Cross-Post
How could I solve this problem?
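For reference, step 6 of the list above corresponds roughly to the following OpenCV calls (a sketch only, not the poster's actual code; K_*/D_* are the intrinsic matrices and distortion coefficients from step 4, R/T the extrinsics from step 5, and the function name is illustrative):

import cv2

# Rectify both images into a common frame from known intrinsics and extrinsics.
def rectify_pair(flir_img, xtion_img, K_flir, D_flir, K_xtion, D_xtion, R, T,
                 size=(320, 240)):
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K_flir, D_flir, K_xtion, D_xtion, size, R, T)
    m1x, m1y = cv2.initUndistortRectifyMap(K_flir, D_flir, R1, P1, size, cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K_xtion, D_xtion, R2, P2, size, cv2.CV_32FC1)
    return (cv2.remap(flir_img, m1x, m1y, cv2.INTER_LINEAR),
            cv2.remap(xtion_img, m2x, m2y, cv2.INTER_LINEAR))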

OpenCV: how to apply rainbow gradient map on an image?

Say we have an image we somehow modified via OpenCV:
And now we would love to apply a gradient map to it (like one we can apply via Photoshop):
So I wonder: how do I apply a gradient map (rainbow colors) via OpenCV?
Here is a method to create false/pseudo-color images using Python; conversion to C++ should be very straightforward. Overview:
1. Open your image as grayscale, and as RGB.
2. Convert the RGB image to HSV (Hue, Saturation, Value/Brightness) color space. This is a cylindrical space, with hue represented by a single value on the polar axis.
3. Set the hue channel to the grayscale image we already opened; this is the crucial step.
4. Set the value and saturation channels both to their maximal values.
5. Convert back to RGB space (otherwise the display will be incorrect).
There are a couple of catches though...
As hue is held in degrees and the color spectrum is represented from 0 to 180 (not 0-256, and not 0-360 as is sometimes the case), we need to rescale the grayscale image appropriately by multiplying by 180 / 256.0.
In the OpenCV case the hue colorscale starts at blue (not red, as in your image), i.e. the mapping goes like this:
from: to:
If this is important to change, we can do so by offsetting all the hue elements and wrapping them around 180 (otherwise they will saturate). The code does this by masking the image at this cut-off point and then offsetting appropriately. Using an offset of 120 generates your colorscale:
from: to:
and the image processed this way seems to match yours very well (shown at the end).
import cv
image_bw = cv.LoadImage("TfBmw.jpg", cv.CV_LOAD_IMAGE_GRAYSCALE)
image_rgb = cv.LoadImage("TfBmw.jpg")
#create the image arrays we require for the processing
hue=cv.CreateImage((image_rgb.width,image_rgb.height), cv.IPL_DEPTH_8U, 1)
sat=cv.CreateImage((image_rgb.width,image_rgb.height), cv.IPL_DEPTH_8U, 1)
val=cv.CreateImage((image_rgb.width,image_rgb.height), cv.IPL_DEPTH_8U, 1)
mask_1=cv.CreateImage((image_rgb.width,image_rgb.height), cv.IPL_DEPTH_8U, 1)
mask_2=cv.CreateImage((image_rgb.width,image_rgb.height), cv.IPL_DEPTH_8U, 1)
#convert to cylindrical HSV color space
cv.CvtColor(image_rgb,image_rgb,cv.CV_RGB2HSV)
#split image into component channels
cv.Split(image_rgb,hue,sat,val,None)
#rescale image_bw to degrees
cv.ConvertScale(image_bw, image_bw, 180 / 256.0)
#set the hue channel to the greyscale image
cv.Copy(image_bw,hue)
#set sat and val to maximum
cv.Set(sat, 255)
cv.Set(val, 255)
#adjust the pseudo color scaling offset, 120 matches the image you displayed
offset=120
cv.CmpS(hue,180-offset, mask_1, cv.CV_CMP_GE)
cv.CmpS(hue,180-offset, mask_2, cv.CV_CMP_LT)
cv.AddS(hue,offset-180,hue,mask_1)
cv.AddS(hue,offset,hue,mask_2)
#merge the channels back
cv.Merge(hue,sat,val,None,image_rgb)
#convert back to RGB color space, for correct display
cv.CvtColor(image_rgb,image_rgb,cv.CV_HSV2RGB)
cv.ShowImage('image', image_rgb)
# cv.SaveImage('TfBmw_120.jpg',image_rgb)
cv.WaitKey(0)
Your image processed with offset = 120:
OpenCV now has a function called applyColorMap, which makes this process trivial. The following code will do the trick:
image_cm = cv2.applyColorMap(image, cv2.COLORMAP_JET)  # image: 8-bit grayscale or BGR
And this is the result:
Figure 1: Original plane
Figure 2: Plane after applying the colormap