OpenCV get 3D coordinates from 2D pixel - c++

For my undergraduate paper I am working on an iPhone application that uses OpenCV to detect domino tiles. The detection works well for tiles close to the camera, but when the camera is angled, tiles far away are difficult to detect.
To solve this, I want to do some spatial calculations. For this I need to convert a 2D pixel value into world coordinates, calculate a new 3D position with a vector, convert these coordinates back to 2D, and then check the colour/shape at that position.
Additionally, I need to know the 3D positions for augmented reality additions.
I got the camera matrix through this link: create opencv camera matrix for iPhone 5 solvepnp
I get the camera's rotation matrix from Core Motion.
Using ArUco markers would be my last resort, as I wouldn't get the desired effect that I need for the paper.
Now my question is: can't I do the calculations if I know the locations and distances of the dots on, let's say, a tile with a 5 on it?
I don't need measurements in mm/inches; I can live with unitless vectors.
The camera needs to be able to rotate freely.
I tried to invert the equation s m' = A [R|t] M' to compute 3D coordinates from the 2D coordinates. But I am stuck on inverting [R|t] even on paper, and I don't know how I'd do it in Swift or C++ either.
I have read many different posts on forums, in books, etc., and I am completely stuck, so I'd appreciate any help or input you can give me. Otherwise I'm screwed.
Thank you so much for your help.
Update:
Using solvePnP, as suggested by Micka, I was able to get the rotation and translation vectors describing the camera's pose.
This means that if you can identify multiple 2D points in your image and know their respective 3D world coordinates (in mm, cm, inches, ...), you get the means to project points from known 3D world coordinates onto the corresponding 2D coordinates in your image (use the OpenCV projectPoints function).
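As a rough illustration of what projectPoints computes when lens distortion is ignored, here is a minimal pinhole-model sketch in plain C++. The names (projectPoint, Vec3, Mat3, Pixel) are hypothetical helpers, not OpenCV API; R and t are assumed to be the world-to-camera rotation and translation (for example R from Rodrigues(rvec) and t = tvec), and K is assumed to have zero skew.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Pixel { double u, v; };

// Pinhole projection with all distortion coefficients assumed zero:
//   x_cam = R * X + t,  u = fx * x/z + cx,  v = fy * y/z + cy.
Pixel projectPoint(const Mat3& K, const Mat3& R, const Vec3& t, const Vec3& X) {
    Vec3 c{};
    for (int i = 0; i < 3; ++i)
        c[i] = R[i][0] * X[0] + R[i][1] * X[1] + R[i][2] * X[2] + t[i];
    assert(c[2] > 0 && "point must lie in front of the camera");
    return {K[0][0] * c[0] / c[2] + K[0][2], K[1][1] * c[1] / c[2] + K[1][2]};
}
```

For example, with fx = fy = 1000, (cx, cy) = (640, 360), R = I and t = (0, 0, 10), the world point (1, 2, 0) lands at pixel (740, 560).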
What I need to solve next is the conversion from 2D into 3D coordinates, for which I need to follow ozlsn's approach using the inverse of the matrices obtained from solvePnP.
Update 2:
With a top-down view, I am doing quite well at detecting the tiles and their positions in the 3D world:
[Image: tile seen from top down]
However, if I now angle the view, my calculations no longer work. For example, I check the bottom edge of a 9-dot group and the center of the black division bar for 90° angles. If Corner1 -> Middle Edge -> Bar Center and Corner2 -> Middle Edge -> Bar Center are both 90° angles, then the bar in the middle is found and the position of the tile can be determined.
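A minimal sketch of the 90° test described above, assuming the points are already in a coordinate system where right angles are preserved (for example top-down or rectified tile coordinates); in the raw angled image, perspective distorts the angles. The function names and the tolerance parameter are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct P2 { double x, y; };

// Angle (degrees) at vertex b between the segments b->a and b->c.
double angleAtVertexDeg(P2 a, P2 b, P2 c) {
    const double kPi = std::acos(-1.0);
    double ux = a.x - b.x, uy = a.y - b.y;
    double vx = c.x - b.x, vy = c.y - b.y;
    // atan2(|cross|, dot) is numerically more robust than acos(dot / (|u| |v|)).
    double cross = ux * vy - uy * vx;
    double dot = ux * vx + uy * vy;
    return std::atan2(std::fabs(cross), dot) * 180.0 / kPi;
}

// True if both corner -> middle edge -> bar center angles are within tolDeg of 90°.
bool isBarPerpendicular(P2 corner1, P2 middleEdge, P2 corner2, P2 barCenter, double tolDeg) {
    return std::fabs(angleAtVertexDeg(corner1, middleEdge, barCenter) - 90.0) < tolDeg &&
           std::fabs(angleAtVertexDeg(corner2, middleEdge, barCenter) - 90.0) < tolDeg;
}
```

Checking both angles separately (rather than only their sum) is what avoids the false positives mentioned further down.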
When the view is angled, these angles are shifted by the perspective to, let's say, 130° and 50°. (I'll provide an image later.)
My idea now is to run solvePnP on 4 points (the bottom edge plus the middle) and then transform the needed dots and the center bar from their 2D positions to 3D positions (the height should be irrelevant?). Then I could check with the transformed points whether the angles are 90°, and also do the other distance calculations I need.
Here is an image of what I am trying to accomplish:
[Image: markings for the problem]
I first find the 9 dots and arrange them. For each edge I try to find the black bar. As said above, seen from the top, the angle blue corner -> green middle edge -> yellow bar center is 90°.
However, when the camera is angled, the angle is no longer 90°. I also cannot just check whether the two angles add up to 180°, as that would give me false positives.
So I wanted to do the following steps:
Detect Center
Detect Edges (3 dots)
SolvePnP with those 4 points
Transform the edge and center points (coordinates) to 3D positions
Measure the angles (check whether both are 90°)
Now I wonder how I can transform the 2D coordinates of those points to 3D. I don't care about the absolute distance, as I only use distances relative to others (like 1.4 times the distance Middle-Edge), etc. Being able to measure the distance in mm would be even better, though, as it would give me better results.
With solvePnP I get the rvec, which I could convert into the rotation matrix (with Rodrigues(), I believe). To measure the angles, my understanding is that I don't need to apply the translation (tvec) from solvePnP.
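For reference, this is the formula Rodrigues() applies to turn rvec (the rotation axis scaled by the rotation angle) into a 3x3 rotation matrix, written as a plain C++ sketch with a hypothetical rodrigues() helper. Once points are in 3D, a rigid transform changes neither lengths nor the angles between difference vectors, because t cancels in every difference; recovering the 3D points from pixels in the first place still involves t.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Rotation vector (axis * angle) -> rotation matrix:
//   R = I + sin(theta) [k]x + (1 - cos(theta)) [k]x^2, with k the unit axis.
Mat3 rodrigues(double rx, double ry, double rz) {
    double theta = std::sqrt(rx * rx + ry * ry + rz * rz);
    Mat3 R{{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    if (theta < 1e-12) return R;  // (near) zero rotation
    double kx = rx / theta, ky = ry / theta, kz = rz / theta;
    // Skew-symmetric cross-product matrix [k]x.
    Mat3 K{{{0, -kz, ky}, {kz, 0, -kx}, {-ky, kx, 0}}};
    double s = std::sin(theta), c = 1.0 - std::cos(theta);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double k2 = 0;  // ([k]x * [k]x)(i, j)
            for (int m = 0; m < 3; ++m) k2 += K[i][m] * K[m][j];
            R[i][j] += s * K[i][j] + c * k2;
        }
    return R;
}
```

In practice cv::Rodrigues does this (and the reverse direction); the sketch only makes the math visible.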
This leads to my last question: when using the iPhone, can't I use the angles from the motion sensors to build the rotation matrix beforehand and only use it to rotate the tile so it is shown from the top? I feel this would save me a lot of CPU time, since I wouldn't have to run solvePnP for each tile (there can be up to about 100 tiles).
Find Homography
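// Fragment: the statements below run inside a function.
#include <iostream>
#include <vector>
#include <opencv2/core.hpp>     // Mat, Point2f, perspectiveTransform
#include <opencv2/calib3d.hpp>  // findHomography, RANSAC
using namespace cv;
using namespace std;
// corner1, edgeMiddle, corner2, middle and possibleBar come from the dot detection (not shown).
// Image positions of the reference points:
vector<Point2f> tileDots;
tileDots.push_back(corner1);
tileDots.push_back(edgeMiddle);
tileDots.push_back(corner2);
tileDots.push_back(middle.Dot->ellipse.center);
// Corresponding positions on the physical tile, in the same order as tileDots.
// NOTE: the first three points all lie on y = 19.44, i.e. they are collinear. A homography
// needs at least 4 correspondences with no three collinear, so this set is degenerate.
vector<Point2f> realLivePos;
realLivePos.push_back(Point2f(5.5,19.44));
realLivePos.push_back(Point2f(12.53,19.44));
realLivePos.push_back(Point2f(19.56,19.44));
realLivePos.push_back(Point2f(12.53,12.19));
// CV_RANSAC is the old C-API constant; OpenCV 3+ uses cv::RANSAC. With only 4 points,
// RANSAC has nothing to reject, so method 0 would behave the same.
Mat M = findHomography(tileDots, realLivePos, RANSAC);
cout << "M = "<< endl << " " << M << endl << endl;
// Image points to map into tile coordinates (index 4 is the bar candidate).
vector<Point2f> barPerspective;
barPerspective.push_back(corner1);
barPerspective.push_back(edgeMiddle);
barPerspective.push_back(corner2);
barPerspective.push_back(middle.Dot->ellipse.center);
barPerspective.push_back(possibleBar.center);
vector<Point2f> barTransformed;
// findHomography returns an empty Mat if no homography could be estimated.
if (M.empty())
{
cout << "No Homography found" << endl;
} else {
perspectiveTransform(barPerspective, barTransformed, M);
}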
This, however, gives me wrong values, and I no longer know where to look (I can't see the forest for the trees anymore).
Image Coordinates https://i.stack.imgur.com/c67EH.png
World Coordinates https://i.stack.imgur.com/Im6M8.png
Points to Transform https://i.stack.imgur.com/hHjBM.png
Transformed Points https://i.stack.imgur.com/P6lLS.png
The 4th index item should be at x 2007 y 717.
I don't know what I am doing wrong here.
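One thing worth checking in the homography code, sketched here in plain C++ with hypothetical helpers: a homography from exactly four correspondences is only determined when no three of the points are collinear, and the first three world points (5.5, 19.44), (12.53, 19.44) and (19.56, 19.44) all lie on y = 19.44. The sketch also spells out what perspectiveTransform does per point (multiply by H, then divide by the third component).

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Pt { double x, y; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Per-point equivalent of perspectiveTransform: multiply by H, divide by w.
Pt applyHomography(const Mat3& H, Pt p) {
    double x = H[0][0] * p.x + H[0][1] * p.y + H[0][2];
    double y = H[1][0] * p.x + H[1][1] * p.y + H[1][2];
    double w = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
    return {x / w, y / w};
}

// Three points are (nearly) collinear if the triangle they span has ~zero area.
// eps is in squared coordinate units, so it depends on the scale of the data.
bool nearlyCollinear(Pt a, Pt b, Pt c, double eps = 1e-6) {
    double cross = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return std::fabs(cross) < eps;
}

// A 4-point homography is only well defined if no three of the points are collinear.
bool validForHomography(const std::array<Pt, 4>& q) {
    for (int i = 0; i < 4; ++i)
        for (int j = i + 1; j < 4; ++j)
            for (int k = j + 1; k < 4; ++k)
                if (nearlyCollinear(q[i], q[j], q[k])) return false;
    return true;
}
```

Running validForHomography on the four realLivePos values returns false, so adding a dot that is off the bottom-edge line (and using more than four points) is a plausible first fix.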
Update 3:
I found the following post, Computing x,y coordinate (3D) from image point, which does exactly what I need. There may be a faster way to do it, but I haven't been able to find one. At the moment I can do the checks, but I still need to test whether the algorithm is robust enough.
[Image: result with solvePnP to find the bar center]
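For reference, one common way to go from a pixel back to 3D when the tiles lie on a known plane (assumed here to be the world plane Z = 0, e.g. the table) is to intersect the pixel's viewing ray with that plane, using K and the solvePnP pose (R from Rodrigues(rvec), t = tvec). This is a minimal sketch with hypothetical names that ignores lens distortion; it is not necessarily the exact method of the linked post.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <optional>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Back-projects pixel (u, v) onto the world plane Z = 0, given intrinsics
// (fx, fy, cx, cy, zero skew) and the world->camera pose x_cam = R * X + t.
std::optional<Vec3> pixelToPlaneZ0(double u, double v, double fx, double fy,
                                   double cx, double cy, const Mat3& R, const Vec3& t) {
    // Viewing ray in camera coordinates: K^-1 * (u, v, 1).
    Vec3 rc{(u - cx) / fx, (v - cy) / fy, 1.0};
    Vec3 d{}, C{};
    for (int i = 0; i < 3; ++i) {
        // R^T maps camera directions to world; the camera center is C = -R^T t.
        d[i] = R[0][i] * rc[0] + R[1][i] * rc[1] + R[2][i] * rc[2];
        C[i] = -(R[0][i] * t[0] + R[1][i] * t[1] + R[2][i] * t[2]);
    }
    if (std::fabs(d[2]) < 1e-12) return std::nullopt;  // ray parallel to the plane
    double lambda = -C[2] / d[2];                       // solve C_z + lambda * d_z = 0
    if (lambda <= 0) return std::nullopt;               // plane is behind the camera
    return Vec3{C[0] + lambda * d[0], C[1] + lambda * d[1], 0.0};
}
```

With f = 1000, principal point (640, 360), R = I and t = (0, 0, 10), pixel (740, 560) maps back to the world point (1, 2, 0). Angles and distance ratios between such points can then be measured directly on the plane.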

The matrix $[R|t]$ is not square, so by definition you cannot invert it. However, this matrix lives in projective space, which is nothing but an extension of $\mathbb{R}^n$ (Euclidean space) with a '1' added as the $(n+1)$-st element. For compatibility, the matrices that multiply vectors of projective space are padded with a '1' at their lower-right corner. That is, $R$ becomes
$$\begin{bmatrix} R & 0 \\ 0^\top & 1 \end{bmatrix}$$
In your case $[R|t]$ becomes
$$\begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix}$$
and, because $R$ is a rotation ($R^{-1} = R^\top$), you can take its inverse, which reads as
$$\begin{bmatrix} R^\top & -R^\top t \\ 0^\top & 1 \end{bmatrix}$$
where $^\top$ denotes the transpose. The portion that you need is the top block row, $[R^\top \mid -R^\top t]$.
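The closed-form inverse of a rigid transform, [R^T | -R^T t; 0 1], can be sketched in plain C++ as follows. The 4x4 array type and helper names are hypothetical, and the code assumes the upper-left 3x3 block is a proper rotation (orthonormal), so that R^-1 = R^T; multiplying T by the computed inverse should give the identity.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Inverts T = [R t; 0 1] in closed form as [R^T  -R^T t; 0 1].
// Only valid when the upper-left 3x3 block of T is a rotation.
Mat4 invertRigid(const Mat4& T) {
    Mat4 inv{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) inv[i][j] = T[j][i];  // R^T
        // i-th entry of -R^T t
        inv[i][3] = -(T[0][i] * T[0][3] + T[1][i] * T[1][3] + T[2][i] * T[2][3]);
    }
    inv[3][3] = 1.0;  // bottom row stays (0, 0, 0, 1)
    return inv;
}

Mat4 multiply(const Mat4& A, const Mat4& B) {
    Mat4 C{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) C[i][j] += A[i][k] * B[k][j];
    return C;
}
```

The closed form is cheaper and numerically nicer than a general 4x4 inverse, but it silently gives wrong results if R is not orthonormal.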
Since the phone translates in 3D space, you need the depth (distance) of the pixel under consideration. This means that the answer to your question about whether you need distances in mm/inches is yes. The answer changes only if you can assume that the ratio of the camera translation to the depth is very small; this is called a weak-perspective camera. The problem you're trying to tackle is not an easy one; people are still researching it at the PhD level.

Related

Need help understanding the Perspective-Three-Point

I'm following this explanation on the P3P problem and have a few questions.
In the heading labeled Section 1 they project the image plane points onto a unit sphere. I'm not sure why they do this, is this to simulate a camera lens? I know in OpenCV, we first compute the intrinsics of the camera and factor it into solvePnP. Is this unit sphere serving a similar purpose?
Also in Section 1, where did $u^{'}_x$, $u^{'}_y$, and $u^{'}_z$ come from, and what are they? If we are projecting onto a 2D plane, why do we need the third component? I know the standard answer is "because homogeneous coordinates", but I can't seem to find an explanation of why we use them or what they really are.
Also in Section 1, what does "normalize using L2 norm" mean, and what does this step accomplish?
I'm hoping if I understand Section 1, I can understand the notation in the following sections.
Thanks!
Here are some hints
The projection onto the unit sphere has nothing to do with the camera lens. It is just a mathematical transformation intended to simplify the P3P equation system (whose solutions we are trying to compute).
$u'_x$ and $u'_y$ are the coordinates of $(u,v) - P$ (here $P=(c_x, c_y)$), normalized by the focal distances $f_x$ and $f_y$. Subtracting the camera's optical center $P$ translates the origin to this point. Introducing the $z$ coordinate $u'_z=1$ moves the 2D point $(u'_x, u'_y)$ onto the 3D plane defined by the equation $z=1$ (the plane parallel to the $xy$ plane). By moving points to the plane $z=1$, you can better visualize them as the intersections of that plane with 3D lines that pass through the camera center and the observed 3D points. In other words, each such point stands for every 3D point on its line, and the plane $z=1$ acts as an image plane at focal distance 1, since dividing by $f_x$ and $f_y$ has "normalized" the focal distance to 1. Again, all of these transformations are intended to make the equations easier to solve.
The so-called $L2$ norm is nothing but the usual length that comes from the Pythagorean theorem ($a^2 + b^2 = c^2$), extended to vectors in 3D space: $\|u'\| = \sqrt{u_x'^2 + u_y'^2 + u_z'^2}$. Normalizing with it means dividing $u'$ by this length, which slides the point along its line onto the unit sphere; that is the projection onto the unit sphere mentioned above.
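The steps of Section 1 (subtract the principal point, divide by the focal lengths, append z = 1, then divide by the L2 norm) fit in a few lines. This sketch uses a hypothetical helper name and assumes no lens distortion.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// (u, v) -> u' = ((u - cx)/fx, (v - cy)/fy, 1) -> u' / ||u'|| on the unit sphere.
Vec3 pixelToBearing(double u, double v, double fx, double fy, double cx, double cy) {
    Vec3 p{(u - cx) / fx, (v - cy) / fy, 1.0};
    double n = std::sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);  // L2 norm
    return {p[0] / n, p[1] / n, p[2] / n};
}
```

The principal point itself maps to (0, 0, 1), and a pixel one focal length to its right maps to about (0.7071, 0, 0.7071): both have length 1, i.e. they lie on the unit sphere.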

How to find the angle formed by blades of a wind turbine when the yaw is changed?

This is a continuation of the question from here: How to find angle formed by the blades of a wind turbine with respect to a horizontal imaginary axis?
I've decided to use the following methodology for this:
 Get a frame from the camera inside a loop.
 Perform Canny edge detection.
 Perform HoughLinesP to detect lines in the image.
Finding Blade Angle:
 Perform the probabilistic Hough line transform on the image, restricting the blade lines to the length of the blades, which is already known.
 The returned values are the start and end points of the detected lines. Since there is no background noise, these are the start and end points of the blade lines, and the image will contain the blade lines.
 Now, compute the vectors of the detected blade lines and take their dot product with the vector (1,0), or use atan2 to find the angle of each detected line relative to the horizontal.
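The atan2 variant of the last step might look like this minimal sketch (hypothetical helper name). It assumes the first endpoint is the hub end of the blade; HoughLinesP returns endpoints in no particular order, so they would need to be oriented (e.g. by distance to the hub) before the full 0-360° range is meaningful.

```cpp
#include <cassert>
#include <cmath>

// Angle of the segment (x1,y1) -> (x2,y2), counter-clockwise from the horizontal,
// in [0, 360) degrees. Image y grows downwards, so dy is negated.
double bladeAngleDeg(double x1, double y1, double x2, double y2) {
    const double kPi = std::acos(-1.0);
    double deg = std::atan2(-(y2 - y1), x2 - x1) * 180.0 / kPi;
    return deg < 0 ? deg + 360.0 : deg;
}
```

A blade pointing straight up in the image (hub at the bottom end of the segment) gives 90°, pointing left gives 180°.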
Problem:
When the yaw angle of the turbine is changed and it is not directly facing the camera, how do I calculate the blade angle formed?
The idea is basically to map the angles of the rotated turbine back to how they would look when viewed head-on. From what I've understood, I thought I'd find the homography matrix, decompose it to get the rotation, convert that to Euler angles to calculate the shift from the original axis, and then shift all the axes by that angle. However, it's just a vague idea with no concrete plan behind it.
Or do I begin by finding the projection matrix, and then get the camera matrix and rotation matrix? I am completely lost here and feel overwhelmed by the many functions...
Other things I came across were perspectiveTransform and solvePnP.
It would be great if anyone could suggest another way to deal with this. Any links or code snippets would be helpful. I'm not that familiar with OpenCV and would be grateful for any help.
Thanks!
Assume the tips of the blades plus the center (or the three "roots" of the blades) lie on a common plane.
Fit a homography between those points and the corresponding ones in a reference pose for the turbine (cv::findHomography in OpenCV).
Decompose the homography into rotation and translation using an estimated or assumed camera calibration (cv::decomposeHomographyMat).
Convert the rotation into Euler angles.
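For the last step, a sketch of converting a rotation matrix (for example one of the rotations returned by cv::decomposeHomographyMat, which can return several candidate solutions that have to be disambiguated separately) into Euler angles. Euler angles depend on a convention; this sketch assumes R = Rz(yaw) * Ry(pitch) * Rx(roll), and the helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct Euler { double yaw, pitch, roll; };  // radians

// Assumed convention: R = Rz(yaw) * Ry(pitch) * Rx(roll).
Euler toEulerZYX(const Mat3& R) {
    double sp = -R[2][0];
    if (sp > 1.0) sp = 1.0;
    if (sp < -1.0) sp = -1.0;  // clamp rounding noise before asin
    double pitch = std::asin(sp);
    if (std::fabs(sp) > 1.0 - 1e-9) {
        // Gimbal lock: yaw and roll are coupled; put all of it into yaw.
        return {std::atan2(-R[0][1], R[1][1]), pitch, 0.0};
    }
    return {std::atan2(R[1][0], R[0][0]), pitch, std::atan2(R[2][1], R[2][2])};
}

// Builds R = Rz(yaw) * Ry(pitch) * Rx(roll), used here to check the round trip.
Mat3 fromEulerZYX(double yaw, double pitch, double roll) {
    double ca = std::cos(yaw), sa = std::sin(yaw);
    double cb = std::cos(pitch), sb = std::sin(pitch);
    double cc = std::cos(roll), sc = std::sin(roll);
    return Mat3{{{ca * cb, ca * sb * sc - sa * cc, ca * sb * cc + sa * sc},
                 {sa * cb, sa * sb * sc + ca * cc, sa * sb * cc - ca * sc},
                 {-sb, cb * sc, cb * cc}}};
}
```

For a turbine whose yaw changes, the yaw component is the one to undo; which matrix axis corresponds to the turbine's yaw depends on how the reference pose and camera axes are set up.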

Mathematical Issue: Triangle, Pyramid, Rotation, Translation, Zoom

Another tricky question. What you can see here is my physical pyramid, built from 3 LEDs that form a triangle in one plane and another LED in the middle, about 18 mm above the other 3. The 4th one turns the triangle into a pyramid. (You may see it better in the right triangle, which is rotated about the horizontal axis; there you can clearly see a diode on a stick.)
The second picture shows my running program. The left box shows the raw picture of the LEDs (photo with an IR filter). The picture in the center shows that my program found the points and can also tell which point is which, based on some conditions (for example, C is always where the two lines with the maximal distance between diodes intersect, and the two longest lengths are always a and b). But don't worry about this; I know the points are found 100% correctly.
The right picture shows some calculated values, like the height between C and c and so on. I could calculate more, but I haven't bothered for now, because I am stuck.
I want to calculate the pyramid's rotation and translation in 3D space.
The yellow points are the LEDs after rotation around an axis through the center of the triangle in the camera's z-direction, so I no longer have to worry about that rotation when calculating the other two: the rotation around the horizontal axis and the rotation around the vertical axis. I could easily calculate these from the distance between the center of the triangle and the 4th diode (as you can see, the 4th diode moves on the image plane when the pyramid rotates), or from the lengths of the two axes.
But my problem is the unknown depth.
It affects all lengths (a, b, c, and also the lengths from the center to the 4th diode, if we call these d and e). I know the measurements of the real pyramid with a tolerance of about ±5%, but they are also affected by the zoom. So how do I deal with this?
I thought of an equation with a ratio between the length of the horizontal axis, the length of the vertical axis, the angles alpha, beta and gamma, and the lengths d and e.
Alpha, beta and gamma are only affected by rotation around the axes (which I want to know; I want to know the rotation and the zoom), where a rotation around one axis has the opposite effect of a rotation around the other. So if you rotate around both axes by the same angle, the ratio between the lengths of the axes stays the same as before.
The zoom (really: how close the pyramid is to the camera; what I want to know first: a multiplication factor such as 2x, 3x, 0.5, 0.4322344, ...) does not affect the angles, but it affects all lengths in the same way (I guess): a, b, c, d, e, hc (the length of the vertical axis) and hx (the length of the horizontal axis; I haven't calculated it yet, but it would be easy, and the name is arbitrary).
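Ignoring the rotation for a moment, the pinhole model relates the "zoom" to the distance: a segment of real length L, roughly parallel to the image plane at depth Z, appears with image length l = f * L / Z, with f the focal length in pixels. The sketch below (hypothetical helper names) estimates Z from the known pyramid lengths under that weak-perspective assumption. Rotation foreshortens the lengths, which is why a full pose solver such as solvePnP is needed once the pyramid is tilted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Z = f * L / l for each segment (assumed roughly parallel to the image plane),
// averaged over all segments to reduce measurement noise.
double estimateDepth(const std::vector<double>& realLengths,
                     const std::vector<double>& imageLengths, double focalPx) {
    assert(realLengths.size() == imageLengths.size() && !realLengths.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < realLengths.size(); ++i)
        sum += focalPx * realLengths[i] / imageLengths[i];
    return sum / realLengths.size();
}

// "Zoom" factor relative to a reference shot at depth zRef: image lengths scale with zRef / z.
double zoomFactor(double zRef, double z) { return zRef / z; }
```

For example, with f = 800 px, real sides of 30, 30 and 40 mm imaged at 240, 240 and 320 px, the estimated depth is 100 mm, and relative to a reference shot at 200 mm the zoom factor is 2.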
As you can see, I have thought about a lot, but I think I'm too dumb.
So, is there any math genius out there who can give me the right equations for the rotation and/or the zoom factor?
(I also thought about using POSIT, Downhill Simplex and so on, but this would be the nicest approach, since I already know so much, like all the points.)
Please, I really need your help! I am writing this in C++ with the help of OpenCV, if you need to know, but I think it's more of a mathematical problem.
Thanks in advance!
Ah, and alpha always seems to be the same as beta!
Edit: I had to delete the second picture.
Have a look at Boost Geometry, or here as well.
Have a look at SolvePnP() in OpenCV. Even if you don't use it directly, the documentation has citations for the methods used.

use homography to rotate around x/y axes

The Project
I am working on a texture-tracking project for mobile. It exclusively tracks planar surfaces, so I have been using OpenCV's cv::findHomography() to calculate the homography between two frames. However, that function runs very slowly and is the primary bottleneck in my pipeline. I decided that an algorithm that can take an initial estimate of the homography would run much faster, because the change in homography between frames is very small. Also, my outlier percentage is very small, so robust methods are optional. Unfortunately, to my knowledge OpenCV does not include a homography finder that takes an initial estimate. It does, however, include solvePnP(), which takes the original 3D world coordinates of the scene, the current 2D image coordinates, a camera matrix, distortion parameters and, most importantly, an initial estimate. I am trying to replace findHomography with solvePnP. Since I use only 2D coordinates throughout the pipeline and solvePnP asks for 3D coordinates, I am trying to go 2d->3d->3d_transform->2d_transform. Right now that process runs 6x faster than findHomography() if it is given a good initial guess, but it has issues.
The Problem
Something is wrong with how I am converting. My theory was that since a camera matrix is not required to find a homography, it should not be required for this process, because in the end I only want the information contained in a homography. I also assumed that, since I throw out all z information in the end, how I initialize z should not matter. My process is as follows.
First, I convert all my initial 2D coordinates to 3D by giving them a z position of 1. I can assume that my original coordinates lie flat in the x-y plane. Then:
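// original_cord: std::vector<cv::Point3f> with z = 1; current_cord: std::vector<cv::Point2f>
cv::Mat rot_mat; //3x3 rotation matrix
cv::Mat pnp_rot; //3x1 rotation vector (must already hold the initial guess, since useExtrinsicGuess = true)
cv::Mat pnp_tran; //3x1 translation vector (initial guess as well)
cv::Matx33f camera_matrix(1,0,0,
0,1,0,
0,0,1);
cv::Matx41f dist(0,0,0,0);
cv::solvePnP(original_cord, current_cord, camera_matrix, dist, pnp_rot, pnp_tran, true);
//Rodrigues converts from a rotation vector to a rotation matrix
cv::Rodrigues(pnp_rot, rot_mat);
// cv::Mat has no operator()(row, col); read elements with at<double>() (assuming CV_64F vectors).
cv::Matx33f homography(rot_mat.at<double>(0,0), rot_mat.at<double>(0,1), pnp_tran.at<double>(0),
rot_mat.at<double>(1,0), rot_mat.at<double>(1,1), pnp_tran.at<double>(1),
rot_mat.at<double>(2,0), rot_mat.at<double>(2,1), pnp_tran.at<double>(2)+1);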
The conversion to a homography here is simple. The first two columns of the homography come from the 3x3 rotation matrix, and the last column is the translation vector. The one trick is that homography(2,2) corresponds to scale, while pnp_tran(2) corresponds to movement along the z axis. Given that I initialize my z coordinates to 1, scale is z_translation + 1. This process works perfectly for 4 of the 6 degrees of freedom: translation in x, translation in y, scale, and rotation about z all work. Rotation about x and y, however, shows significant error. I believe this is due to initializing my points at z = 1, but I don't know how to fix it.
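The error pattern can be traced with a little algebra, sketched below in plain C++ under stated assumptions (hypothetical helper names, pinhole camera with intrinsics K). For points on the plane z = z0, K * (R * (x, y, z0) + t) = K * (x*r1 + y*r2 + (z0*r3 + t)), where ri is the i-th column of R, so the induced homography is H = K * [r1 | r2 | z0*r3 + t]. With z0 = 1, the third column (t_x, t_y, t_z + 1) used above equals r3 + t only when r3 = (0, 0, 1), i.e. when there is no rotation about x or y.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

// Homography induced by the plane z = z0 under pose (R, t) and intrinsics K:
// H = K * [r1 | r2 | z0*r3 + t], with ri the i-th column of R.
Mat3 planeHomography(const Mat3& K, const Mat3& R, const Vec3& t, double z0) {
    Mat3 B{};
    for (int i = 0; i < 3; ++i) {
        B[i][0] = R[i][0];
        B[i][1] = R[i][1];
        B[i][2] = z0 * R[i][2] + t[i];
    }
    Mat3 H{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) H[i][j] += K[i][k] * B[k][j];
    return H;
}

// Maps plane coordinates (x, y) through H, including the perspective divide.
std::array<double, 2> applyH(const Mat3& H, double x, double y) {
    double u = H[0][0] * x + H[0][1] * y + H[0][2];
    double v = H[1][0] * x + H[1][1] * y + H[1][2];
    double w = H[2][0] * x + H[2][1] * y + H[2][2];
    return {u / w, v / w};
}
```

Projecting a point (x, y, z0) directly through R, t and K gives the same pixel as applyH(planeHomography(K, R, t, z0), x, y), including for rotations about x and y.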
The Question
Was my assumption correct that I can get good results from solvePnP by using a fake camera matrix and an initial z coordinate? If so, how should I set up my camera matrix and z coordinates to make x and y rotation work? Also, if anyone knows where I could get a homography-finding algorithm that takes an initial guess and works only in 2D, or has information on techniques for writing my own, it would be very helpful. I will most likely be moving in that direction once I get this working.
Update
I built a test program that takes a homography, generates a set of coplanar points from it, and then runs the points through solvePnP to recover the specified homography. In doing this I realized that I am fundamentally misunderstanding some part of how homographies are constructed. I have been assuming that a homography is constructed as follows:
hom(0,2) = x translation
hom(1,2) = y translation
hom(2,2) = scale, I can divide the entire matrix by this to normalize
I assumed the first two columns were the first two columns of a 3x3 rotation matrix. This essentially amounts to taking a 3x4 transform and throwing away column 2. I have discovered, however, that this is not true. The test case that showed me the error of my ways was trying to make a homography that rotates points by a small angle around the y axis.
//rotate by .0175 rad (about 1 degree) about the y axis
cv::Matx33f rot_mat(1,0,.0174f,
0,1,0,
-.0174f,0,1);
//my conversion method to make this a homography gives
cv::Matx33f homography(1,0,0,
0,1,0,
-.0174f,0,1);
The above homography does not work at all. Take, for example, a point (x, y, 1) where x > 58. The result will be (x, y, some_negative_number). When I convert from homogeneous coordinates back to Cartesian coordinates, my x and y values both flip sign.
All that is to say, I now have a much simpler question that I think would let me solve everything. How do I construct a homography that rotates points by some angle around the x and y axis?
Homographies are not simple translation or rotation matrices. Their aim is to map straight lines to straight lines rather than single points to other points. They incorporate a perspective component to achieve this, and are explained here.
Hence, homography matrices cannot be easily decomposed, but there are (complicated) ways to do so, shown here. These may help you extract rotations and translations from it.
This should help you better understand a homography, but the rest I am unfamiliar with.
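For the narrower question of a homography that rotates around the x or y axis: when the camera only rotates about its center, image points are related by H = K * R * K^-1. The sketch below (hypothetical helper names; square pixels and zero skew assumed) builds it for a rotation about y. The third homogeneous coordinate of H * (x, y, 1) is cos(a) - sin(a) * (x - cx) / f. With the identity camera matrix used in the question, f = 1 and cx = 0, so for a 1° rotation it turns negative once x exceeds about 57, matching the sign flip observed for x > 58; with a realistic focal length in pixels it stays positive across the image.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

Mat3 mul(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Pure-rotation homography H = K * R * K^-1 with K = [[f, 0, cx], [0, f, cy], [0, 0, 1]].
Mat3 rotationHomography(const Mat3& R, double f, double cx, double cy) {
    Mat3 K{{{f, 0, cx}, {0, f, cy}, {0, 0, 1}}};
    Mat3 Kinv{{{1 / f, 0, -cx / f}, {0, 1 / f, -cy / f}, {0, 0, 1}}};
    return mul(mul(K, R), Kinv);
}

// Rotation by angle a (radians) about the y axis.
Mat3 rotY(double a) {
    return Mat3{{{std::cos(a), 0, std::sin(a)}, {0, 1, 0}, {-std::sin(a), 0, std::cos(a)}}};
}
```

With f = 800 and principal point (320, 240), a 0.0175 rad rotation about y moves the principal point to (320 + 800 * tan(0.0175), 240). Whether R here corresponds to rotating the camera or the scene is a sign convention to fix for the application.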

Coordinate Transformation C++

I have a webcam pointed at a table at a slant and with it I track markers.
I have a transformation matrix in OpenSceneGraph whose translation part contains the coordinates of the tracked object relative to the camera.
Because the camera is pointed at a slant, when I move the marker across the table both the Y and Z coordinates are updated, although I only want the Z coordinate to be updated, because the height of the marker doesn't change, only its distance to the camera.
As a result, when I project a model onto the marker in OpenSceneGraph, the model is slightly off, and when I move the marker around, the Y and Z values are updated incorrectly.
So my guess is that I need a transformation matrix by which I multiply each point, so that I get a new coordinate system that is aligned with the table surface.
Something like this: A * v1 = v2, with v1 being the camera coordinates and v2 being my "table coordinates".
So I chose 4 points to "calibrate" my system: I placed the marker at the top-left corner of the screen, defined v1 as the current camera coordinates and v2 as (0,0,0), and did the same for 4 different points.
Then, using the linear equations I get from an unknown matrix and two known vectors, I solved for the matrix.
I thought the values I got for the matrix would be the values I need to multiply the camera coordinates by, so that the model would be updated correctly on the marker.
But when I multiplied the camera coordinates I had gathered before by the matrix, I didn't get anything close to what my "table coordinates" were supposed to be.
Is my approach completely wrong, or did I just mess something up in the equations? (I solved them with the help of wolframalpha.com.) Is there an easier or better way of doing this?
Any help would be greatly appreciated, as I am kind of lost and under some time pressure :-/
Thanks,
David
when I move the marker across the table the Y and Z axis is updated, although all I want to be updated is the Z axis, because the height of the marker doesn't change only its distance to the camera.
This is only true when your camera's view direction is aligned with your Y axis (or Z axis). If the camera is not aligned with Y, the transform applies a rotation around the X axis, which modifies both the Y and Z coordinates of the marker.
So my guess is I need a Transformation Matrix with which I multiply each point so that I have a new coordinate System which lies orthogonal on the table surface.
Yes, it is. After that, you will have 2 transforms:
T_table to express the marker's coordinates in the table frame,
T_camera to express table coordinates in the camera frame.
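One way to build such a table frame, sketched here as a swapped-in technique rather than the linear solve from the question: because the table origin is not the camera origin, the mapping needs a translation as well as a rotation, which a 3x3 A alone cannot express (A * v1 = (0,0,0) for a nonzero v1 forces A to be singular). The sketch assumes three marker positions measured in camera coordinates: a table corner as origin, a point along one table edge, and any other point on the table. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
Vec3 normalize(const Vec3& a) {
    double n = std::sqrt(dot(a, a));
    return {a[0] / n, a[1] / n, a[2] / n};
}

// Orthonormal table frame from three calibration points given in camera coordinates.
struct TableFrame {
    Vec3 origin, ex, ez, ey;  // declared in initialization order; ez is the table normal
    TableFrame(const Vec3& o, const Vec3& onXEdge, const Vec3& onTable)
        : origin(o),
          ex(normalize(sub(onXEdge, o))),
          ez(normalize(cross(ex, sub(onTable, o)))),
          ey(cross(ez, ex)) {}
    // Camera coordinates -> table coordinates: [ex ey ez]^T * (v - origin).
    Vec3 toTable(const Vec3& v) const {
        Vec3 d = sub(v, origin);
        return {dot(ex, d), dot(ey, d), dot(ez, d)};
    }
};
```

For a table tilted toward the camera, moving the marker 2 units across the table changes both the camera Y and Z coordinates, but toTable reports (0, 2, 0): the height above the table (third component) stays 0, which is the behaviour the question asks for.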
Finding T_camera from a single 2d image is hard because there's no depth information.
This is known as the pose problem; it has been studied by, among others, Daniel DeMenthon. He developed a fast and robust algorithm to find the pose of an object:
articles available on his research homepage, section 4 "Model Based Object Pose" (particularly "Model-Based Object Pose in 25 Lines of Code", 1995);
code at the same place, section "POSIT (C and Matlab)".
Note that the OpenCV library offers an implementation of DeMenthon's algorithm. This library also offers a convenient, easy-to-use interface for grabbing images from a webcam. It's worth a try: OpenCV homepage
If you know the location in the physical world of your four markers and you've recorded the positions as they appear on the camera, you ought to be able to derive some sort of transform.
When you do the calibration, surely you'd want to put the marker at the four corners of the table, not the screen? If you're only using the corners of the screen, I imagine you're probably not taking the slant of the table into account.
Is the table literally just slanted relative to the camera or is it also rotated at all?