Getting 3D world coordinate from (x,y) pixel coordinates - c++

I'm absolutely new to ROS/Gazebo world; this is probably a simple question, but I cannot find a good answer.
I have a simulated depth camera (Kinect) in a Gazebo scene. After some processing, I obtain a point of interest in the RGB image in pixel coordinates, and I want to retrieve its 3D coordinates in the world frame.
I can't understand how to do that.
I have tried compensating for the distortion, given by the CameraInfo msg. I have tried using a PointCloud with the pcl library, retrieving the point as cloud.at(x,y).
In both cases the coordinates are not correct (I placed a small sphere at the coordinates computed by the program to check whether they are right).
Any help would be much appreciated. Thank you very much.
EDIT:
Starting from the PointCloud, I try to find the coordinates of the point by doing something like:
point = cloud.at(xInPixel, yInPixel);
point.x = point.x + cameraPos.x;
point.y = point.y + cameraPos.y;
point.z = point.z - cameraPos.z;
but the x, y, z coordinates I get do not seem to be correct.
The camera has a pitch angle of pi/2, so that it points at the ground.
I am clearly missing something.

I assume you've seen the gazebo examples for the kinect (brief, full). You can get, as topics, the raw image, raw depth, and calculated pointcloud (by setting them in the config):
<imageTopicName>/camera/color/image_raw</imageTopicName>
<cameraInfoTopicName>/camera/color/camera_info</cameraInfoTopicName>
<depthImageTopicName>/camera/depth/image_raw</depthImageTopicName>
<depthImageCameraInfoTopicName>/camera/depth/camera_info</depthImageCameraInfoTopicName>
<pointCloudTopicName>/camera/depth/points</pointCloudTopicName>
Unless you need to do your own things with the image_raw for the rgb and depth frames (e.g. running ML over the rgb frame and finding the corresponding depth point via the camera_infos), the pointcloud topic should be sufficient - it's the same as the pcl pointcloud in c++, if you include the right headers.
Edit (in response):
There's a magical thing in ros called tf/tf2. Your pointcloud, if you look at the msg.header.frame_id, says something like "camera", indicating it's in the camera frame. tf, like the messaging system in ros, happens in the background, but it looks/listens for transformations from one frame of reference to another frame. You can then transform/query the data in another frame. For example, if the camera is mounted at a rotation to your robot, you can specify a static transformation in your launch file. It seems like you're trying to do the transformation yourself, but you can make tf do it for you; this allows you to easily figure out where points are in the world/map frame, vs in the robot/base_link frame, or in the actuator/camera/etc frame.
I would also look at these ros wiki questions which demo a few different ways to do this, depending on what you want: ex1, ex2, ex3

Related

Project point from point cloud to Image in OpenCV

I'm trying to project a point from 3D to 2D in OpenCV with C++. At the moment I'm using cv::projectPoints(), but it's just not working out.
But first things first. I'm trying to write a program that finds the intersection between a point cloud and a line in space. So I calibrated two cameras, did rectification and matching using SGBM. Finally I projected the disparity map to 3D using reprojectImageTo3D(). That all works very well, and I can visualize my point cloud in MeshLab.
After that I wrote an algorithm to find the intersection between the point cloud and a line which I coded manually. That works fine, too. I found a point in the point cloud about 1.5 mm away from the line, which is good enough for the beginning. So I took this point and tried to project it back to the image, so I could mark it. But here is the problem.
Now the point is not inside the image anymore. As I took an intersection in the middle of the image, this is not possible. I think the problem could be in the coordinate systems, as I don't know in which coordinate system the point cloud is written (left camera, right camera or something else).
My projectPoints function looks like:
projectPoints(intersectionPoint3D, R, T, cameraMatrixLeft, distortionCoeffsLeft, intersectionPoint2D, noArray(), 0);
R and T are the rotation and translation from one camera to the other (I got them from stereoCalibrate). Here might be my mistake, but how can I fix it? I also tried setting them to (0,0,0), but that doesn't work either. I also tried converting the R matrix to a vector using Rodrigues. Still the same problem.
I'm sorry if this has been asked before, but I'm not sure how to search for this problem. I hope my text is clear enough to help me... if you need more information, I will gladly provide it.
Many thanks in advance.
You have a 3D point and you want to get its corresponding 2D location, right? If you have the camera calibration matrix (a 3x3 matrix), you can project the point into the image:
cv::Point2d get2DFrom3D(cv::Point3d p, cv::Mat1d CameraMat)
{
    // Pinhole projection: u = fx * X / Z + cx, v = fy * Y / Z + cy
    cv::Point2d pix;
    pix.x = (p.x * CameraMat(0, 0)) / p.z + CameraMat(0, 2);
    pix.y = (p.y * CameraMat(1, 1)) / p.z + CameraMat(1, 2);
    return pix;
}

(Kinect v2) Alternative Kinect Fusion Pipeline (texture mapping)

I am trying a different pipeline, so far without success.
The idea is to use the classic pipeline (as in the Explorer example), but additionally to use the last ColorImage for the texture.
So the idea (after clicking SAVE MESH):
Save current Image as BMP
Get the current transformation [m_pVolume->GetCurrentWorldToCameraTransform(&m_worldToCameraTransform);] .. lets call it M
Transform all mesh vertices v into the last camera-space coordinate system (M * v)
Now the current m_pMapper refers to the latest Frame which we want to use [ m_pMapper->MapCameraPointToColorSpace(camPoint, &colorPoint); ]
In theory I should now have every point of the fusion mesh as a texture coordinate. I want to use them to export an OBJ file (with a texture, not only per-vertex color).
What am I doing wrong?
The 3D transformations seem to be correct: when I visualize the resulting OBJ file in MeshLab, I can see that the transformation is right; the world coordinate system equals the latest recorded position.
Only the texture is not set correctly.
I would be very happy if anyone could help me. I have been trying for a long time :/
Thank you very much :)

Kinect as Motion Sensor

I'm planning on creating an app that does something like this: http://www.zonetrigger.com/articles/Kinect-software/
That means I want to be able to set up "trigger zones" using the Kinect and its 3D image. Now, I know that Microsoft states that the Kinect can detect the skeletons of up to 6 people.
For me however, it would be enough to detect whether something is entering a trigger zone and where.
Does anyone know if the Kinect can be programmed to function as a simple Motion Sensor, so it can detect more than 6 entries?
It is well known that Kinect cannot detect more than 5 entries (just kidding). All you need to do is get a depth map (z-map) from Kinect, and then convert it into a 3D map using these formulas:
X = ((col - cap_width / 2) * Z) / focal_length_X;
Y = ((row - cap_height / 2) * Z) / focal_length_Y;
Z = Z;
where subtracting cap_width/2 and cap_height/2 shifts the origin from the upper-left corner to the image center, and focal_length is the focal length of the Kinect in pixels (~570). Now you can specify exact locations in 3D, and when pixels appear there, you can do whatever you want to do. Here are more pointers:
You can use openCV for the ease of visualization. To read a frame from Kinect after it was initialized you just need something like this:
Mat inputMat = Mat(h, w, CV_16U, (void*) depth_gen.GetData());
You can easily visualize depth maps using histogram equalization (it will optimally spread 10000 Kinect levels among your available 255 levels of grey)
It is sometimes desirable to do object segmentation, grouping spatially close pixels with similar depth together. I did this several years ago (see this), but I had to remove the floor or any common surface the objects stood on; otherwise all the objects were connected and extracted as a single large segment.

Cairo flips a drawing

I want to draw a triangle and text using C++ and Cairo like this:
|\
| \
|PP\
|___\
If I add the triangle and the text using Cairo I get:
___
| /
|PP/
| /
|/
So the y-axis runs from top to bottom, but I want it from bottom to top. So I tried to change the transformation matrix (cairo_transform(p, &mat);) or to scale the data (cairo_scale(p, 1.0, -1.0);). I get:
|\
| \
|bb\
|___\
Now the triangle is the way I want it, BUT the text is mirrored, which I do not want.
Any idea how to handle this problem?
I was in a similar situation as the OP, needing to work with a variety of coordinates in a Cartesian coordinate system with the origin at the bottom left. (I had to port an old video game that was developed with a coordinate system different from Cairo's, and because of time constraints and the risk of calculation mistakes I decided it was better not to rewrite the whole thing.) Luckily, I found a workable approach to change Cairo's coordinate system. It is based on Cairo's internal transformation matrix, which transforms Cairo's input into device coordinates. The solution is to change this matrix to a reflection matrix, one that mirrors its input through the x-axis, like so:
cairo_t *cr;
cairo_matrix_t x_reflection_matrix;
cairo_matrix_init_identity(&x_reflection_matrix); // could not find a oneliner
/* reflection through the x axis equals the identity matrix with the
   yy entry negated */
x_reflection_matrix.yy = -1.0;
cairo_set_matrix(cr, &x_reflection_matrix);
// At this point the drawing would land above the destination surface
// (negative device y). Cairo composes new transforms onto the current
// matrix, so the translation is expressed in the already-reflected user
// space and must therefore be negative:
cairo_translate(cr, 0, -SURFACE_HEIGHT); // replace SURFACE_HEIGHT
// ... do your drawing
There is one catch however: text will also get mirrored. To solve this, one could alter the font transformation matrix. The required code for this would be:
cairo_matrix_t font_reflection_matrix;
// We first set the size, and then change it to a reflection matrix
cairo_set_font_size(cr, YOUR_SIZE);
cairo_get_font_matrix(cr, &font_reflection_matrix);
// reverse mirror the font drawing matrix
font_reflection_matrix.yy = font_reflection_matrix.yy * -1;
cairo_set_font_matrix(cr, &font_reflection_matrix);
Answer:
Rethink your coordinates and pass them correctly to Cairo. If your coordinate source has an inverted axis, preprocess the data to flip the geometry. That would be called glue code, and it is often necessary.
Stuff:
It is very common in 2D computer graphics to have the origin (0,0) in the top-left corner and the y-axis heading downwards (see GIMP/Photoshop, positioning in HTML, the WebGL canvas). As always, there are exceptions too (PDF).
I'm not sure what the reason is, but I would assume the reading direction on paper (from top to bottom) and/or the process of rendering/drawing an image on a screen.
To me it seems the easiest way to procedurally draw an image at some position, from the first to the last pixel (you don't need to precalculate its size).
I don't think you are alone with your opinion. But I don't think there is a single standard math coordinate system. Even the very common Cartesian coordinate system is incomplete when the arrows that indicate axis direction are missing.
Summary: From the discussion I assume that there is only one coordinate system used by Cairo: x-axis to the right, y-axis down. If one needs a standard math coordinate system (x-axis to the right, y-axis up) one has to preprocess the data.

Reverse Fish-Eye Distortion

I am working with a fish-eye camera and need to reverse the distortion before any further calculation.
This is done in this question: Correcting fisheye distortion
src = cv.LoadImage(src)
dst = cv.CreateImage(cv.GetSize(src), src.depth, src.nChannels)
mapx = cv.CreateImage(cv.GetSize(src), cv.IPL_DEPTH_32F, 1)
mapy = cv.CreateImage(cv.GetSize(src), cv.IPL_DEPTH_32F, 1)
cv.InitUndistortMap(intrinsics, dist_coeffs, mapx, mapy)
cv.Remap(src, dst, mapx, mapy, cv.CV_INTER_LINEAR + cv.CV_WARP_FILL_OUTLIERS, cv.ScalarAll(0))
The problem with this is that the remap function goes through all the points and creates a new picture out of them; this is too time-consuming to do every frame.
The way I am looking for is to have a point-to-point translation from fish-eye picture coordinates to normal picture coordinates.
The approach we are taking is to do all the calculations on the input frame and just translate the resulting coordinates to world coordinates, so we don't want to go through all the points of the picture and create a new one out of it. (Time is really important for us.)
In the matrices mapx and mapy there are some point-to-point translations, but many points have no complete translation.
I tried to interpolate these matrices, but the result was not what I was looking for.
Any help would be much appreciated, even other approaches that are more time-efficient than cv.Remap.
Thanks
I think what you want is cv.UndistortPoints().
Assuming you have detected some point features in your distorted image, you should be able to do something like this:
cv.UndistortPoints(distorted, undistorted, intrinsics, dist_coeffs)
This will allow you to work with undistorted points without generating a new, undistorted image for each frame.