Need help understanding the Perspective-Three-Point (P3P) problem - computer-vision

I'm following this explanation on the P3P problem and have a few questions.
In the heading labeled Section 1 they project the image-plane points onto a unit sphere. I'm not sure why they do this: is it to simulate a camera lens? I know that in OpenCV we first compute the intrinsics of the camera and factor them into solvePnP. Is this unit sphere serving a similar purpose?
Also in Section 1, where did $u'_x$, $u'_y$, and $u'_z$ come from, and what are they? If we are projecting onto a 2D plane, why do we need the third component? I know the standard answer is "because homogeneous coordinates", but I can't seem to find an explanation of why we use them or what they really are.
Also in Section 1, what does "normalize using L2 norm" mean, and what did this step accomplish?
I'm hoping if I understand Section 1, I can understand the notation in the following sections.
Thanks!

Here are some hints
The projection onto the unit sphere has nothing to do with the camera lens. It is just a mathematical transformation intended to simplify the P3P equation system (whose solutions we are trying to compute).
$u'_x$ and $u'_y$ are the coordinates of $(u,v) - P$ (here $P=(c_x, c_y)$), normalized by the focal lengths $f_x$ and $f_y$. The subtraction of the camera optical center $P$ is a translation of the origin to that point. The introduction of the $z$ coordinate $u'_z=1$ moves the 2D point $(u'_x, u'_y)$ onto the 3D plane defined by the equation $z=1$ (the 3D plane parallel to the $xy$ plane). Note that by moving the points to the plane $z=1$, you can now better visualize them as the intersections of that plane with the 3D lines that pass through the camera center. In other words, these points become the projections onto a 2D plane of 3D points located somewhere on those lines (well, not merely "somewhere", but at the focal distance, which has now been "normalized" to 1 after dividing by $f_x$ and $f_y$). Again, all of these transformations are just intended to make the equations easier to solve.
The so-called $L2$ norm is nothing but the usual Euclidean length that comes from the Pythagorean theorem ($a^2 + b^2 = c^2$), only here it is applied to vectors in 3D space: $\|u'\| = \sqrt{u'^2_x + u'^2_y + u'^2_z}$. Dividing $(u'_x, u'_y, u'_z)$ by this norm rescales it to unit length, which is exactly what puts the point on the unit sphere.
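To make Section 1 concrete, here is a small standalone sketch (my own code, not from the write-up) of how a pixel $(u, v)$ is mapped onto the unit sphere: shift by the optical center, divide by the focal lengths, append $u'_z = 1$, then divide by the L2 norm.

#include <array>
#include <cmath>

// Map a pixel (u, v) to a unit-length viewing ray ("a point on the unit sphere"),
// given the intrinsics fx, fy (focal lengths) and cx, cy (optical center).
std::array<double, 3> pixelToUnitRay(double u, double v,
                                     double fx, double fy,
                                     double cx, double cy)
{
    // Translate the origin to the optical center and divide by the focal lengths:
    // this is the point (u'_x, u'_y, 1) on the plane z = 1.
    double ux = (u - cx) / fx;
    double uy = (v - cy) / fy;
    double uz = 1.0;

    // L2 norm = Euclidean length of the vector (u'_x, u'_y, u'_z).
    double norm = std::sqrt(ux * ux + uy * uy + uz * uz);

    // Dividing by the norm rescales the vector to length 1, i.e. onto the unit sphere.
    return { ux / norm, uy / norm, uz / norm };
}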

Related

Calibrate camera using cube

Is it possible to calibrate the camera using a cube with a side length of 1 cm? Obviously we need to find 6 point correspondences, taking into consideration that they shouldn't all lie on the same plane or on the same line. The first part can be handled easily, but my problem is: how can we ensure that none of the points lie on the same line?
Rather vague question. My answers:
Yes, it is possible to calibrate a pinhole model by matching the 3D location of the visible vertices of a cube in general position to their projections in one image. Four points on one face define a homography which can be decomposed using Zhang's method to recover the focal length. The extra visible points can be used to tighten the estimate.
Whether a 1 cm side length is adequate depends entirely on the actual lens used and the cube's distance from it. Accuracy will degrade as the portion of the image covered by the cube shrinks.
Calibrating the nonlinear lens distortion will be almost impossible. The only information the cube provides is the orthogonality and length of the sides, i.e. few data points. Unless you take lots of images you won't have enough effective constraints.
Your cube had better be machined very accurately, if you hope for any precision.
A dodecahedron makes for a much better rig, if you really want to use a 3D one.
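If you do try it, one pragmatic route is to treat a single face of the cube as a planar Zhang-style target and calibrate from several views of it with OpenCV. A rough sketch, assuming you have already detected the four face corners in each image (function and variable names here are mine, purely for illustration):

#include <opencv2/calib3d.hpp>
#include <vector>

// Calibrate from one face of the 1 cm cube, treated as a planar target with
// Z = 0 (units: cm). cornersPerImage holds the same four face corners detected
// in each of several images taken from different orientations.
cv::Mat calibrateFromCubeFace(const std::vector<std::vector<cv::Point2f>>& cornersPerImage,
                              cv::Size imageSize)
{
    const std::vector<cv::Point3f> faceCorners3d = {
        {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0}
    };
    std::vector<std::vector<cv::Point3f>> objectPoints(cornersPerImage.size(), faceCorners3d);

    cv::Mat K, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;

    // With only four coplanar points per view, don't try to estimate lens
    // distortion (see the caveat above) -- keep it fixed at zero.
    int flags = cv::CALIB_ZERO_TANGENT_DIST |
                cv::CALIB_FIX_K1 | cv::CALIB_FIX_K2 | cv::CALIB_FIX_K3;
    double rms = cv::calibrateCamera(objectPoints, cornersPerImage, imageSize,
                                     K, distCoeffs, rvecs, tvecs, flags);
    (void)rms;  // reprojection error, worth checking in practice
    return K;   // estimated intrinsic matrix
}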

Duplicate points along NURBS curve

In my current project I have implemented NURBS curves, and at the beginning of the curve I have some 3D points, which all lie in the normal plane of the point at u = 0.0. Now I want to copy these points to other locations on the curve (e.g. u = 0.5) to create a kind of extrude/sweep mechanism.

My theoretical approach is to create a local coordinate system at the point u = 0.0 and to express the coordinates of every point relative to this system. Then I can create local coordinate systems at the desired points and place the points there. My problem is that from the first derivative of the NURBS curve I can get the tangent, and therefore the normal plane, of the point (the local X direction), but I don't know how to orient the rest of the system. My first idea was to take the second derivative of the NURBS curve and use it to compute the local Y and Z axes, but the second derivatives do not seem to be suitable for this approach.
Is there a common approach to solve this problem?
As an additional question, I am wondering how to prescribe the tangent vector at a given control point, for example the tangent at the first control point. Currently I solve this by dictating the position of the second control point, which does not seem very elegant.
We solved the same problem using this approach:
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/Computation-of-rotation-minimizing-frames.pdf
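In case it helps others, here is a minimal sketch of the double-reflection frame propagation described in that paper (the small vector type and the function name are my own, not from the paper); you sample points and unit tangents along the curve, pick a starting reference direction perpendicular to the first tangent, and propagate:

#include <vector>

struct Vec3 { double x, y, z; };

static Vec3   sub  (Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3   scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
static double dot  (Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Rotation-minimizing frames by the double-reflection method.
// x[i]: sampled curve points, t[i]: unit tangents at those points,
// r0: initial reference direction (unit length, perpendicular to t[0]).
// Returns the reference directions r[i]; the local frame at sample i is
// (t[i], r[i], t[i] x r[i]).
std::vector<Vec3> rotationMinimizingFrames(const std::vector<Vec3>& x,
                                           const std::vector<Vec3>& t,
                                           Vec3 r0)
{
    std::vector<Vec3> r(x.size());
    r[0] = r0;
    for (size_t i = 0; i + 1 < x.size(); ++i) {
        Vec3 v1   = sub(x[i + 1], x[i]);           // reflection 1: plane normal v1
        double c1 = dot(v1, v1);
        Vec3 rL   = sub(r[i], scale(v1, 2.0 / c1 * dot(v1, r[i])));
        Vec3 tL   = sub(t[i], scale(v1, 2.0 / c1 * dot(v1, t[i])));
        Vec3 v2   = sub(t[i + 1], tL);             // reflection 2: plane normal v2
        double c2 = dot(v2, v2);
        r[i + 1]  = sub(rL, scale(v2, 2.0 / c2 * dot(v2, rL)));
    }
    return r;
}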
Looks like you would like to find a local coordinate system at any given point on the NURBS curve. If this is the case, the Frenet frame is the typical choice. See this link for more details.
As for the issue of the "tangent vector of a given control point": since control points in general do not lie on the NURBS curve, a control point does not have a tangent vector. If you really need one for some special reason, you can use the tangent vector at the point on the curve that is closest to the control point.

OpenCV get 3D coordinates from 2D pixel

For my undergraduate paper I am working on an iPhone application using OpenCV to detect domino tiles. The detection works well at close range, but when the camera is angled, the tiles far away are difficult to detect.
My approach to solving this is to do some spatial calculations: convert a 2D pixel value into world coordinates, calculate a new 3D position with a vector, convert those coordinates back to 2D, and then check the colour/shape at that position.
Additionally I would need to know the 3D positions for Augmented Reality additions.
The camera matrix I got through this link: create opencv camera matrix for iPhone 5 solvepnp.
The rotation matrix of the camera I get from Core Motion.
Using ArUco markers would be my last resort, as I wouldn't get the intended effect that I need for the paper.
Now my question is: can't I do these calculations when I know the locations and distances of the circles on, let's say, a tile with a 5 on it?
I wouldn't need to have a measurement in mm/inches, I can live with vectors without measurements.
The camera needs to be able to be rotated freely.
I tried to invert the calculation $s\,m' = A[R|t]M'$ to be able to turn the 2D coordinates into 3D ones. But I am stuck inverting $[R|t]$, even on paper, and I don't know how I'd do that in Swift or C++ either.
I have read so many different posts on forums, in books, etc., and I am completely stuck; I appreciate any help/input you can give me. Otherwise I'm screwed.
Thank you so much for your help.
Update:
By using the solvePnP that was suggested by Micka I was able to get the Rotation and Translation Vectors for the angle of the camera.
Meaning that if you are able to identify multiple 2D points in your image and know their respective 3D world coordinates (in mm, cm, inch, ...), then you have what you need to project points from known 3D world coordinates onto the respective 2D coordinates in your image (use the OpenCV projectPoints function).
What is up next for me to solve is the translation from 2D into 3D coordinates, where I need to follow ozlsn's approach with the inverse of the received matrices out of solvePnP.
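For anyone following along, here is a minimal sketch of that step (the 3D tile coordinates reuse the measurements from the homography attempt further down; everything else, including the function name, is just illustrative):

#include <opencv2/calib3d.hpp>
#include <vector>

// cameraMatrix / distCoeffs: the iPhone intrinsics from the link above.
// imagePts: the detected 2D dot/corner positions matching the 3D model below.
void projectTileModel(const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                      const std::vector<cv::Point2f>& imagePts)
{
    // Known tile geometry in real-world units (mm), with Z = 0 on the tile plane.
    std::vector<cv::Point3f> objectPts = {
        {5.5f, 19.44f, 0.0f}, {12.53f, 19.44f, 0.0f},
        {19.56f, 19.44f, 0.0f}, {12.53f, 12.19f, 0.0f}
    };

    // Recover the camera pose relative to the tile...
    cv::Mat rvec, tvec;
    cv::solvePnP(objectPts, imagePts, cameraMatrix, distCoeffs, rvec, tvec);

    // ...and re-project the 3D model into the image with that pose.
    std::vector<cv::Point2f> projected;
    cv::projectPoints(objectPts, rvec, tvec, cameraMatrix, distCoeffs, projected);
}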
Update 2:
With a top-down view I am getting along quite well and am able to detect the tiles and their positions in the 3D world:
Tile from top down
However, if I angle the view, my calculations no longer work. For example, I check the bottom edge of a 9-dot group and the center of the black division bar for 90° angles. If corner1 -> middle edge -> bar center and corner2 -> middle edge -> bar center are both 90° angles, then the bar in the middle is found and the position of the tile can be determined.
When the view is angled, these angles are shifted by the perspective to, let's say, 130° and 50°. (I'll provide an image later.)
The idea I have now is to run solvePnP on 4 points (bottom edge plus middle), and then rotate the needed dots and the center bar from their 2D positions to 3D positions (height should be irrelevant?). Then I could check with the transformed points whether the angles are 90° and also do the other needed distance calculations.
Here is an image of what I am trying to accomplish:
Markings for Problem
I first find the 9 dots and arrange them. For each edge I try to find the black bar. As said above, seen from the top, the angle blue corner -> green middle edge -> yellow bar center is 90°.
However, as the camera is angled, the angle is no longer 90°. I also cannot simply check whether the two angles add up to 180°, as that would give me false positives.
So I wanted to do the following steps:
Detect Center
Detect Edges (3 dots)
SolvePnP with those 4 points
Rotate the edge and the center points (coordinates) to 3D positions
Measure the angles (check if both 90°)
Now I wonder how I can transform the 2D coordinates of those points into 3D. I don't care about absolute distances, as I am only comparing them relative to each other (like 1.4 times the middle-edge distance); if I could measure the distances in mm, that would be even better though, as it would give me better results.
With solvePnP I get the rvec, which I could convert into the rotation matrix (with Rodrigues(), I believe). To measure the angles, my understanding is that I don't need to apply the translation (tvec) from solvePnP.
This leads to my last question: when using the iPhone, can't I use the angles from the motion detection to build the rotation matrix beforehand and use only that to rotate the tile so it is seen from the top? I feel this would save me a lot of CPU time, since I wouldn't have to run solvePnP for each tile (there can be up to about 100 tiles).
Find Homography
// Image points of the tile: two corners, the middle of that edge,
// and the center dot of the 9-dot group.
vector<Point2f> tileDots;
tileDots.push_back(corner1);
tileDots.push_back(edgeMiddle);
tileDots.push_back(corner2);
tileDots.push_back(middle.Dot->ellipse.center);

// The same four points in real-world tile coordinates (mm).
vector<Point2f> realLivePos;
realLivePos.push_back(Point2f(5.5,19.44));
realLivePos.push_back(Point2f(12.53,19.44));
realLivePos.push_back(Point2f(19.56,19.44));
realLivePos.push_back(Point2f(12.53,12.19));

// Homography mapping image coordinates to tile coordinates.
Mat M = findHomography(tileDots, realLivePos, CV_RANSAC);
cout << "M = "<< endl << " " << M << endl << endl;

// Points to map into tile coordinates, including the candidate bar center.
vector<Point2f> barPerspective;
barPerspective.push_back(corner1);
barPerspective.push_back(edgeMiddle);
barPerspective.push_back(corner2);
barPerspective.push_back(middle.Dot->ellipse.center);
barPerspective.push_back(possibleBar.center);

vector<Point2f> barTransformed;
if (countNonZero(M) < 1)
{
    cout << "No Homography found" << endl;
}
else
{
    perspectiveTransform(barPerspective, barTransformed, M);
}
This, however, gives me wrong values, and I no longer know where to look (I can't see the forest for the trees anymore).
Image Coordinates https://i.stack.imgur.com/c67EH.png
World Coordinates https://i.stack.imgur.com/Im6M8.png
Points to Transform https://i.stack.imgur.com/hHjBM.png
Transformed Points https://i.stack.imgur.com/P6lLS.png
You see I am even too stupid to post 4 images here??!!?
The 4th index item should be at x 2007 y 717.
I don't know what I am doing wrongly here.
Update 3:
I found the following post, Computing x,y coordinate (3D) from image point, which does exactly what I need. Maybe there is a faster way to do it, but I was not able to find one. At the moment I can do the checks, but I still need to test whether the algorithm is robust enough.
Result with SolvePnP to find bar Center
The matrix $[R|t]$ is not square, so by definition you cannot invert it. However, this matrix lives in projective space, which is nothing but an extension of $R^n$ (Euclidean space) in which a '1' is appended to each vector as its $(n+1)$-st element. For compatibility, a matrix that multiplies vectors of the projective space is padded with an extra bottom row whose last entry is 1. That is, $R$ becomes
[R|0]
[0|1]
In your case $[R|t]$ becomes
[R|t]
[0|1]
and you can take its inverse, which reads
[R'|-R't]
[0 |  1 ]
where $'$ denotes the transpose. The portion that you need is the top block row, $[R'\,|\,-R't]$.
Since the phone translates in 3D space, you also need the depth of the pixel in question. This means the answer to your question about whether you need distances in mm/inches is yes. The answer changes only if you can assume that the ratio of camera translation to depth is very small; this is called a weak-perspective camera. The question you are trying to tackle is not an easy one; people are still researching it at the PhD level.
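To make that inverse concrete for this use case: assuming the point you are after lies on the tile plane (world Z = 0), the back-projection the linked post describes boils down to something like the sketch below (variable and function names are mine; cameraMatrix, rvec, tvec are the CV_64F outputs of your calibration and solvePnP):

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>

// Back-project a pixel (u, v) onto the world plane Z = 0, given the intrinsics
// and the pose (rvec, tvec) returned by solvePnP.
cv::Point3d pixelToPlane(double u, double v,
                         const cv::Mat& cameraMatrix,
                         const cv::Mat& rvec, const cv::Mat& tvec)
{
    cv::Mat R;
    cv::Rodrigues(rvec, R);                 // rotation vector -> 3x3 matrix

    cv::Mat Kinv = cameraMatrix.inv();
    cv::Mat uv   = (cv::Mat_<double>(3, 1) << u, v, 1.0);
    cv::Mat lhs  = R.t() * Kinv * uv;       // ray direction in world coordinates
    cv::Mat rhs  = R.t() * tvec;

    // Choose the scale s so that the world Z coordinate comes out as 0.
    double s = rhs.at<double>(2) / lhs.at<double>(2);

    cv::Mat X = R.t() * (s * Kinv * uv - tvec);
    return { X.at<double>(0), X.at<double>(1), X.at<double>(2) };
}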

Determining homography from known planes?

I've got a question related to multiple view geometry.
I'm currently dealing with a problem where I have a number of images collected by a drone flying around an object of interest. This object is planar, and I am hoping to eventually stitch the images together.
Leaving aside the classical way of identifying corresponding feature pairs, computing a homography, and warping/blending, I want to see what information related to this task I can infer from prior known data.
Specifically, for each acquired image I know the following two things: I know the correspondence between the central point of my image and a point on the object of interest (on whose plane I would eventually want to warp my image). I also have a normal vector to the plane of each image.
So, knowing the centre point (in object-centric world coordinates) and the normal, I can derive the plane equation of each image.
My question is, knowing the plane equation of 2 images is it possible to compute a homography (or part of the transformation matrix, such as the rotation) between the 2?
I get the feeling that this may seem like a very straightforward/obvious answer to someone with deep knowledge of visual geometry but since it's not my strongest point I'd like to double check...
Thanks in advance!
Your "normal" is the direction of the focal axis of the camera.
So, IIUC, you have a 3D point that projects on the image center in both images, which is another way of saying that (absent other information) the motion of the camera consists of the focal axis orbiting about a point on the ground plane, plus an arbitrary rotation about the focal axis, plus an arbitrary translation along the focal axis.
The motion has a non-zero baseline, therefore the transformation between images is generally not a homography. However, the portion of the image occupied by the ground plane does, of course, transform as a homography.
Such a motion is defined by 5 parameters, e.g. the 3 components of the rotation vector for the orbit, plus the angle of rotation about the focal axis, plus the displacement along the focal axis. However, the one point correspondence you have gives you only two equations.
It follows that you don't have enough information to constrain the homography between the images of the ground plane.
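For reference, and not specific to this drone setup: the standard formula for the homography induced by a world plane (see e.g. Hartley & Zisserman) is

$H = K_2 \left( R - \dfrac{t\, n^\top}{d} \right) K_1^{-1}$

where $(R, t)$ is the relative pose between the two views, $K_1, K_2$ are the intrinsics, and $n^\top X + d = 0$ is the ground plane expressed in the first camera's frame. The point of the answer above is that your single center-point correspondence plus the two focal-axis directions does not pin down enough of $(R, t)$ for this $H$ to be determined.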

Quaternion rotation to latitude/longitude

TL;DR
I have a quaternion representing the orientation of a sphere (an Earth globe). From the quaternion I wish to derive a latitude/longitude. I can visualize in my mind the process, but am weak with the math (matrices/quaternions) and not much better with the code (still learning OpenGL/GLM). How can I achieve this? This is for use in OpenGL using c++ and the GLM library.
Long Version
I am making a mapping program based on a globe of the Earth - not unlike Google Earth, but for a customized purpose that GE cannot be adapted to.
I'm doing this in C++ using OpenGL with the GLM library.
I have successfully coded the sphere and am using a quaternion directly to represent its orientation. No Euler angles involved. I can rotate the globe using mouse motions, thus rotating the globe about arbitrary axes depending on the current viewpoint and orientation.
However, I would like to get a latitude and longitude of a point on the sphere, not only for the user, but for some internal program use as well.
I can visualize that this MUST be possible. Imagine a sphere in world space with no rotations applied. Assuming OpenGL's right hand rule, the north pole points up the Y axis with the equator parallel on the X/Z plane. The latitude/longitude up the Y axis is thus 90N and something else E/W (degenerate). The prime meridian would be on the +Z axis.
If the globe/sphere is rotated arbitrarily the globe's north pole is now somewhere else. This point can be mapped to a latitude/longitude of the original sphere before rotation. Imagine two overlaying spheres, one the globe which is rotated, and the other a fixed reference.
(Actually, it would be in reverse. The latitude/longitude I seek is the point on the rotated sphere that correlates to the north pole of the unrotated reference sphere)
In my mind it seems that somehow I should be able to get the vector of the Earth globe's orientation axis from its quaternion and compare it to that of the unrotated sphere. But I just can't seem to grok how to do that. (I guess I still don't fully understand mats and quats and have only blundered into my success so far.)
I'm hoping to achieve this without needing a crash course in the deep math. I'm looking for a solution/understanding/guidance from the point of view of being able to use the GLM library to achieve my goal. Ideally a code sample with some general explanation. I learn best from example.
FYI, in my code the rotation of the globe/sphere is totally independent of the camera (which does use Euler angles) so it can be moved independently. So I can't use anything from the camera to determine this.
Maybe you could try to follow the link (i.e. use boost ;) ) from this thread, Longitude / Latitude to quaternion, and then derive the inverse of that conversion.
Or you could also add a step by converting your quaternion into Euler angles?
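If it helps, here is a minimal GLM sketch under the conventions described in the question (+Y is the reference north pole, prime meridian on +Z); the function name is made up, and it assumes q is the globe's orientation quaternion:

#include <glm/glm.hpp>
#include <glm/gtc/quaternion.hpp>
#include <cmath>

// Latitude/longitude (degrees) of the globe point currently sitting under the
// reference north pole (+Y), given the globe's orientation quaternion q.
void globeLatLon(const glm::quat& q, float& latDeg, float& lonDeg)
{
    const glm::vec3 refNorthPole(0.0f, 1.0f, 0.0f);

    // Undo the globe rotation: which model-space point was rotated onto +Y?
    glm::vec3 p = glm::normalize(glm::inverse(q) * refNorthPole);

    latDeg = glm::degrees(std::asin(glm::clamp(p.y, -1.0f, 1.0f)));
    lonDeg = glm::degrees(std::atan2(p.x, p.z));  // 0 at +Z, increasing toward +X
}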