I'm trying to implement an intuitive pointing mechanism, where the user would use his hands to just point to an object on-screen. I have most of it ready, except I'm not sure how to write the final part.
Basically, I have a list of calibration points like the following:
typdef struct {
Point2D pointOnScreen, // gives an x/y pixel screen position
Point3D pointingFinger, // gives the position of the user's pointing finger, in space
Point3D usersEyes // gives the position of the user's eyes, in space
} CalibrationPoint;
std::vector<CalibrationPoint> calibrationPoints;
Now, the idea is that I could use these calibrationPoints to write a function that would look something like this:
Point2D whereIsTheUserPointing(Point3D pointingFinger, Point3D usersEyes) {
return the corresponding point on screen; // this would need to be calibrated
// somehow using the calibrationPoints
}
But I have trouble figuring out the math of how to do this. The basic idea is that when you're pointing, you're aligning your finger so that your eyes-finger-object you're pointing at are aligned in a straight line. However, since I don't have the position of the screen in 3D, I thought I could instead get the calibration points and deduce where the user is pointing from that. How would I go about writing the whereIsTheUserPointing() function and calibrating the system?
I'm idealizing, but maybe this will be a start:
I assume that you can obtain universal 3D coordinates for the eyes and the tip of the finger.
Three points in 3D space span a plane. If we could determine three points on your screen, we could locate the screen plane in 3D space. To be safe, let's locate all four corners, so we don't just know the plane, but also its boundaries.
Two straight lines in 3D which meet determine a unique point in 3D.
Thus, in order to find the four corners of the screen, produce four pairs of straight lines, two lines through each corner. This could be done by asking the user to point at the four corners, move, and then point at the four corners again.
Let the co-ordinates of the eyes be (a,b,c) and the coordinates of the end of the finger be (x,y,z). You could easily visualise the joining line in 3D. All you need to do now is to extend the line till it intersects the "plane" of your screen.
Parametric coordinates of the line in your case will be:
(a + T(x-a), b + T(y-b), c + T(z-c))
with:
eye at (a,b,c) and finger at (x,y,z).
With T = 0, you get the coordinate of the eye. With T=1 you get the coordinate of the end of the finger. You can "extend" the line with T>1.
Assuming you have the z-coordinate of the plane of the screen, you could easily get the value of T with the following formula:
T = (Z_VALUE_OF_PLANE-c)/(z-c)
Substitute this value of T to get the other two coordinates (x,y).
The final co-ordinates on the 2D plane will be:
X = a + ((Z_VALUE_OF_PLANE-c)/(z-c))*(x-a)
Y = b + ((Z_VALUE_OF_PLANE-c)/(z-c))*(y-b)
Related
For my undergraduate paper I am working on a iPhone Application using openCV to detect domino tiles. The detection works well in close areas, but when the camera is angled the tiles far away are difficult to detect.
My approach to solve this I would want to do some spacial calculations. For this I would need to convert a 2D Pixel value into world coordinates, calculate a new 3D position with a vector and convert these coordinates back to 2D and then check the colour/shape at that position.
Additionally I would need to know the 3D positions for Augmented Reality additions.
The Camera Matrix i got trough this link create opencv camera matrix for iPhone 5 solvepnp
The Rotationmatrix of the Camera I get from the Core Motion.
Using Aruco markers would be my last resort, as I woulnd't get the decided effect that I would need for the paper.
Now my question is, can i not make calculations when I know the locations and distances of the circles on a lets say Tile with a 5 on it?
I wouldn't need to have a measurement in mm/inches, I can live with vectors without measurements.
The camera needs to be able to be rotated freely.
I tried to invert the calculation sm'=A[R|t]M' to be able to calculate the 2D coordinates in 3D. But I am stuck with inverting the [R|t] even on paper, and I don't know either how I'd do that in swift or c++.
I have read so many different posts on forums, in books etc. and I am completely stuck and appreciate any help/input you can give me. Otherwise I'm screwed.
Thank you so much for your help.
Update:
By using the solvePnP that was suggested by Micka I was able to get the Rotation and Translation Vectors for the angle of the camera.
Meaning that if you are able to identify multiple 2D Points in your image and know their respective 3D World coordinates (in mm, cm, inch, ...), then you can get the mechanisms to project points from known 3D World coordinates onto the respective 2D coordinates in your image. (use the opencv projectPoints function).
What is up next for me to solve is the translation from 2D into 3D coordinates, where I need to follow ozlsn's approach with the inverse of the received matrices out of solvePnP.
Update 2:
With a top down view I am getting along quite well to being able to detect the tiles and their position in the 3D world:
tile from top Down
However if I am now angling the view, my calculations are not working anymore. For example I check the bottom Edge of a 9-dot group and the center of the black division bar for 90° angles. If Corner1 -> Middle Edge -> Bar Center and Corner2 -> Middle Edge -> Bar Center are both 90° angles, than the bar in the middle is found and the position of the tile can be found.
When the view is Angled, then these angles will be shifted due to the perspective to lets say 130° and 50°. (I'll provide an image later).
The Idea I had now is to make a solvePNP of 4 Points (Bottom Edge plus Middle), claculate solvePNP and then rotate the needed dots and the center bar from 2d position to 3d position (height should be irrelevant?). Then i could check with the translated points if the angles are 90° and do also other needed distance calculations.
Here is an image of what I am trying to accomplish:
Markings for Problem
I first find the 9 dots and arrange them. For each Edge I try to find the black bar. As said above, seen from Top, the angle blue corner, green middle edge to yellow bar center is 90°.
However, as the camera is angled, the angle is not 90° anymore. I also cannot check if both angles are 180° together, that would give me false positives.
So I wanted to do the following steps:
Detect Center
Detect Edges (3 dots)
SolvePnP with those 4 points
rotate the edge and the center points (coordinates) to 3D positions
Measure the angles (check if both 90°)
Now I wonder how I can transform the 2D Coordinates of those points to 3D. I don't care about the distance, as I am just calculating those with reference to others (like 1.4 times distance Middle-Edge) etc., if I could measure the distance in mm, that would even be better though. Would give me better results.
With solvePnP I get the rvec which I could change into the rotation Matrix (with Rodrigues() I believe). To measure the angles, my understanding is that I don't need to apply the translation (tvec) from solvePnP.
This leads to my last question, when using the iPhone, can't I use the angles from the motion detection to build the rotation matrix beforehand and only use this to rotate the tile to show it from the top? I feel that this would save me a lot of CPU Time, when I don't have to solvePnP for each tile (there can be up to about 100 tile).
Find Homography
vector<Point2f> tileDots;
tileDots.push_back(corner1);
tileDots.push_back(edgeMiddle);
tileDots.push_back(corner2);
tileDots.push_back(middle.Dot->ellipse.center);
vector<Point2f> realLivePos;
realLivePos.push_back(Point2f(5.5,19.44));
realLivePos.push_back(Point2f(12.53,19.44));
realLivePos.push_back(Point2f(19.56,19.44));
realLivePos.push_back(Point2f(12.53,12.19));
Mat M = findHomography(tileDots, realLivePos, CV_RANSAC);
cout << "M = "<< endl << " " << M << endl << endl;
vector<Point2f> barPerspective;
barPerspective.push_back(corner1);
barPerspective.push_back(edgeMiddle);
barPerspective.push_back(corner2);
barPerspective.push_back(middle.Dot->ellipse.center);
barPerspective.push_back(possibleBar.center);
vector<Point2f> barTransformed;
if (countNonZero(M) < 1)
{
cout << "No Homography found" << endl;
} else {
perspectiveTransform(barPerspective, barTransformed, M);
}
This however gives me wrong values, and I don't know anymore where to look (Sehe den Wald vor lauter Bäumen nicht mehr).
Image Coordinates https://i.stack.imgur.com/c67EH.png
World Coordinates https://i.stack.imgur.com/Im6M8.png
Points to Transform https://i.stack.imgur.com/hHjBM.png
Transformed Points https://i.stack.imgur.com/P6lLS.png
You see I am even too stupid to post 4 images here??!!?
The 4th index item should be at x 2007 y 717.
I don't know what I am doing wrongly here.
Update 3:
I found the following post Computing x,y coordinate (3D) from image point which is doing exactly what I need. I don't know maybe there is a faster way to do it, but I am not able to find it otherwise. At the moment I can do the checks, but still need to do tests if the algorithm is now robust enough.
Result with SolvePnP to find bar Center
The matrix [R|t] is not square, so by-definition, you cannot invert it. However, this matrix lives in the projective space, which is nothing but an extension of R^n (Euclidean space) with a '1' added as the (n+1)st element. For compatibility issues, the matrices that multiplies with vectors of the projective space are appended by a '1' at their lower-right corner. That is : R becomes
[R|0]
[0|1]
In your case [R|t] becomes
[R|t]
[0|1]
and you can take its inverse which reads as
[R'|-Rt]
[0 | 1 ]
where ' is a transpose. The portion that you need is the top row.
Since the phone translates in the 3D space, you need the distance of the pixel in consideration. This means that the answer to your question about whether you need distances in mm/inches is a yes. The answer changes only if you can assume that the ratio of camera translation to the depth is very small and this is called weak perspective camera. The question that you're trying to tackle is not an easy one. There is still people researching on this at PhD degree.
I am using OpenTK(OpenGL) and a general hint will be helpful.
I have a 3d terrain. I have one point on this terrain O(x,y,z) and two perpendicular lines passing through this point that will serve as my X and Y axes.
Now I have a set of 2d points with are in polar coordinates (range,theta). I need to find which points on the terrain correspond to these points. I am not sure what is the best way to do it. I can think of two ideas:
Lets say I am drawing A(x1,y1).
Find the intersection of plane passing through O and A which is perpendicular to the XY plane. This will give me a polyline (semantics may be off). Now on this line, I find a point that is visible from O and is at a distance of the range.
Create a circle which is perpendicular to the XY plane with radius "range", find intersection points on the terrain, find which ones are visible from O and drop rest.
I understand I can find several points which satisfy the conditions, so I will do further check based on topography, but for now I need to get a smaller set which satisfy this condition.
I am new to opengl, but I get geometry pretty well. I am wondering if something like this exists in opengl since it is a standard problem with ground measuring systems.
As you say, both of the options you present will give you more than the one point you need. As I understand your problem, you need only to perform a change of bases from polar coordinates (r, angle) to cartesian coordinates (x,y).
This is fairly straight forward to do. Assuming that the two coordinate spaces share the origin O and that the angle is measured from the x-axis, then point (r_i, angle_i) maps to x_i = r_i*cos(angle_i) and y_i = r_i*sin(angle_i). If those assumptions aren't correct (i.e. if the origins aren't coincident or the angle is not measured from a radii parallel to the x-axis), then the transformation is a bit more complicated but can still be done.
If your terrain is represented as a height map, or 2D array of heights (e.g. Terrain[x][y] = z), once you have the point in cartesian coordinates (x_i,y_i) you can find the height at that point. Of course (x_i, y_i) might not be exactly one of the [x] or [y] indices of the height map.
In that case, I think you have a few options:
Choose the closest (x,y) point and take that height; or
Interpolate the height at (x_i,y_i) based on the surrounding points in the height map.
Unfortunately I am also learning OpenGL and can not provide any specific insights there, but I hope this helps solve your problem.
Reading your description I see a bit of confusion... maybe.
You have defined point O(x,y,z). Fine, this is your pole for the 3D coordinate system. Then you want to find a point defined by polar coordinates. That's fine also - it gives you 2D location. Basically all you need to do is to pinpoint the location in 3D A'(x,y,0), because we are assuming you know the elevation of the A at (r,t), which you of course do from the terrain there.
Angle (t) can be measured only from one axis. Choose which axis will be your polar north and stick to. Then you measure r you have and - voila! - you have your location. What's the point of having 2D coordinate set if you don't use it? Instead, you're adding visibility to the mix - I assume it is important, but highest terrain point on azimuth (t) NOT NECESSARILY will be in the range (r).
You have specific coordinates. Just like RonL suggest, convert to (x,y), find (z) from actual terrain and be done with it.
Unless that's not what you need. But in that case a different question is in order: what do you look for?
In my program, I had a requirement to plot a rectangle that that is prependicular to a line coming from the centre.
To orient the rectangle in this way in 3D space, I used the gluLookAt giving it the lookAt point and plotted the rectangular figure. This worked correctly for me.
To draw the rectangle (in my framework, that uses openGL at the back), I now use a rectangle class and have extended it with a 3D Node (where node is something that has a lookAt point). Given width, height and the top vertex, the rectangle is drawn (node is at the top left vertex and orients the rectangle using lookAt).
Node also has a getPosition() function that gives me its 3D position (top left in rectangle - say 300,400,20). I am trying to get the position of the other three vertices in 3D space to use for my operation. Since the rectangle is oriented in 3D space, other three vertices can't just be fetched by addition of width and height. With the rectangle oriented in 3D, how do I get the position of the other three vertices?
The minimum amount of coordinates is just slightly less than 9: that's three vertices of a generic rectangle in 3d-space (Ax,Ay,Az, Bx,By,Bz, Cx,Cy,Cz).
The last one is e.g. D=A+(B-A)+(C-A)=B+C-A.
Why it's slightly less, is that any triplet of A,B,C coordinates do not necessarily form a 90 degree angle -- but it really doesn't make much sense to pursuit for the simplistic possible arrangement and be prepared to calculate cross-products or normalize vectors.
A----B
| |
C---(D)
EDIT: Vector arithmetic primary:
To Add / Subtract vectors, one sums the elements.
A=B+C means (ax = bx+cx; ay=by+cy; az=bz+cz).
Dot product in (any dimension) is the sum of product of terms:
dot(A,B) = ax*bx + ay*by + az*bz; // for 2,3,4, any number of elements/dimensions.
Cross product is a special operator that is well defined at least in 2 and 3 dimensions. One of the geometric interpretations of cross product is that it produces a vector that is perpendicular to both vectors of it's parameters.
If A is a vector (ax,ay,az), it also means a direction vector from origin O=(0,0,0) i.e. A = A-O = (ax-0,ay-0,az-0);
Likewise (B-A) is a [direction] vector from A to B (sometimes written as AB (with an arrow --> on top))
One can 'add' these directed vectors e.g. as:
o----->
\
\
<------o
/
/
x
And so, if one adds the vector A+(B-A) + (C-A), one ends to the point D.
You can retreive the position of the 3 other points using the normal of the rectangle. In order to orient a rectangle in space, you need 2 information:
its position, commonly represented as a 3 or 4 components vector, or a 4x4 matrix ;
its orientation, commonly represented as a quaternion.
If you have the orientation represented with a normal, and only have one point, you just can’t deduce the other points (because you need another information to solve the rotation equation around the normal). I think the best idea is to use quaternion to orient things in space (you can still retreive the normal from it), but you can also use a normal + one vector from the rectangle. You said you only have one point, and a tuple (width,height), so the common method based on the × operation won’t make it through.
I suggest you to:
make your Node class a class that correctly handles orientation; lookAt isn’t designed for that job ;
combine the translation matrix (position) with a cast matrix from the quaternion (orientation) to correctly handle both position and orientation ;
use that matrix to extract a rotated vector you’ll used like rotated × normal to get the 3 points.
I'm trying to export (3D) bezier curves from Blender to my C++ program. I asked a related question a while back, where I was successfully directed to use De Casteljau's Algorithm to evaluate points (and tangents to these points) along a bezier curve. This works well. In fact, perfectly. I can export the curves and evaluate points along the curve, as well as the tangent to these points, all within my program using De Casteljau's Algorithm.
However, in 3D space a point along a bezier curve and the tangent to this point is not enough to define a "frame" that a camera can lock into, if that makes sense. To put it another way, there is no "up vector" which is required for a camera's orientation to be properly specified at any point along the curve. Mathematically speaking, there are an infinite amount of normal vectors at any point along a 3D bezier curve.
I've noticed when constructing curves in Blender that they aren't merely infinitely thin lines, they actually appear to have a proper 3D orientation defined at any point along them (as shown by the offshooting "arrow lines" in the screenshot below). I'd like to replicate what blender does here as closely as possible in my program. That is, I'd like to be able to form a matrix that represents an orientation at any point along a 3D bezier curve (almost exactly as it would in Blender itself).
Can anyone lend further guidance here, perhaps someone with an intimate knowledge of Blender's source code? (But any advice is welcome, Blender background or not.) I know it's open source, but I'm having a lot of troubles isolating the code responsible for these curve calculations due to the vastness of the program.
Some weeks ago, I have found a solution to this problem. I post it here, in case someone else would need it :
1) For a given point P0, calculate the tangent vector T0.
One simple, easy way, is to take next point on the curve, subtract current point, then normalize result :
T0 = normalize(P1 - P0)
Another, more precise way, to get tangent is to calculate the derivative of your bezier curve function.
Then, pick an arbitrary vector V (for example, you can use (0, 0, 1))
Make N0 = crossproduct(T0, V) and B0 = crossproduct(T0, N0) (dont forget to normalize result vectors after each operation)
You now have a starting set of coordinates ( P0, B0, T0, N0)
This is the initial camera orientation.
2) Then, to calculate next points and their orientation :
Calculate T1 using same method as T0
Here is the trick, new reference frame is calculated from previous frame :
N1 = crossproduct(B0, T1)
B1 = crossproduct(T1, N1)
Proceed using same method for other points. It will results of having camera slightly rotating around tangent vector depending on how curve change its direction. Loopings will be handled correctly (camera wont twist like in my previous answer)
You can watch a live example here (not from me) : http://jabtunes.com/labs/3d/webgl_geometry_extrude_splines.html
Primarily, we know, that the normal vector you're searching for lies on the plane "locally perpendicular" to the curve on the specific point. So the real problem is to choose a single vector on this plane.
I've made an empty object to track the curve and noticed, that it behave similarly to the cart of a rollercoaster: its "up" vector was correlated to the centrifugal force while it was moving along the curve. This one can be uniquely evaluated from the local shape of the curve.
I'm not very good at physics, but I would try to estimate that vector by evaluating two planes: the first is previously mentioned perpendicular plane and the second is a plane made of three neighboring points of a curve segment (if the curve is not straight, these will form a triangle, which describes exactly one plane). Intersection of these two planes will give you an axis and you'll only have to choose a direction of such calculated normal vector.
If I understand you question correcly, what you want is to get 3 orientation vectors (left, front, up) for any point of the curve.
Here is a simple method ( there is a limitation, (*) see below ) :
1) Front vector :
Calculate a 3d point on bezier curve for a given position (t). This is the point for which we will calculate front, left, up vectors. We will call it current_point.
Calculate another 3d point on the curve, next to first one (t + 0.01), let's call it next_point.
Note : i don't write formula here, because i believe you already how to
do that.
Then, to calculate front vector, just substract the two points calculated previously :
vector front = next_point - current_point
Don't forget to normalize the result.
2) Left vector
Define a temporary "up" vector
vector up = vector(0.0f, 1.0f, 0.0f);
Now you can calculate left easily, using front and up :
vector left = CrossProduct(front, up);
3) Up vector
vector up = CrossProduct(left, front);
Using this method you can always calculate a front, left, up for any point along the curve.
(*) NOTE : this wont work in all cases. Imagine you have a loop in you curve, just like a rollercoaster loop. On the top of the loop your calculated up vector will be (0, 1, 0), while you maybe want it to be (0, -1, 0). Only way to solve that is to have two curves : one for points and one for up vectors (from which left and front can be calculated easily).
I am writing a program that will draw a solid along the curve of a spline. I am using visual studio 2005, and writing in C++ for OpenGL. I'm using FLTK to open my windows (fast and light toolkit).
I currently have an algorithm that will draw a Cardinal Cubic Spline, given a set of control points, by breaking the intervals between the points up into subintervals and drawing linesegments between these sub points. The number of subintervals is variable.
The line drawing code works wonderfully, and basically works as follows: I generate a set of points along the spline curve using the spline equation and store them in an array (as a special datastructure called Pnt3f, where the coordinates are 3 floats and there are some handy functions such as distance, length, dot and crossproduct). Then i have a single loop that iterates through the array of points and draws them as so:
glBegin(GL_LINE_STRIP);
for(pt = 0; pt<=numsubsegements ; ++pt) {
glVertex3fv(pt.v());
}
glEnd();
As stated, this code works great. Now what i want to do is, instead of drawing a line, I want to extrude a solid. My current exploration is using a 'cylinder' quadric to create a tube along the line. This is a bit trickier, as I have to orient openGL in the direction i want to draw the cylinder. My idea is to do this:
Psuedocode:
Push the current matrix,
translate to the first control point
rotate to face the next point
draw a cylinder (length = distance between the points)
Pop the matrix
repeat
My problem is getting the angles between the points. I only need yaw and pitch, roll isnt important. I know take the arc-cosine of the dot product of the two points divided by the magnitude of both points, will return the angle between them, but this is not something i can feed to OpenGL to rotate with. I've tried doing this in 2d, using the XZ plane to get x rotation, and making the points vectors from the origin, but it does not return the correct angle.
My current approach is much simpler. For each plane of rotation (X and Y), find the angle by:
arc-cosine( (difference in 'x' values)/distance between the points)
the 'x' value depends on how your set your plane up, though for my calculations I always use world x.
Barring a few issues of it making it draw in the correct quadrant that I havent worked out yet, I want to get advice to see if this was a good implementation, or to see if someone knew a better way.
You are correct in forming two vectors from the three points in two adjacent line segments and then using the arccosine of the dot product to get the angle between them. To make use of this angle you need to determine the axis around which the rotation should occur. Take the cross product of the same two vectors to get this axis. You can then build a transformation matrix using this axis-angle or pass it as parameters to glRotate.
A few notes:
first of all, this:
for(pt = 0; pt<=numsubsegements ; ++pt) {
glBegin(GL_LINE_STRIP);
glVertex3fv(pt.v());
}
glEnd();
is not a good way to draw anything. You MUST have one glEnd() for every single glBegin(). you probably want to get the glBegin() out of the loop. the fact that this works is pure luck.
second thing
My current exploration is using a
'cylinder' quadric to create a tube
along the line
This will not work as you expect. the 'cylinder' quadric has a flat top base and a flat bottom base. Even if you success in making the correct rotations according to the spline the edges of the flat tops are going to pop out of the volume of your intended tube and it will not be smooth. You can try it in 2D with just a pen and a paper. Try to draw a smooth tube using only shorter tubes with a flat bases. This is impossible.
Third, to your actual question, The definitive tool for such rotations are quaternions. Its a bit complex to explain in this scope but you can find plentyful information anywhere you look.
If you'd have used QT instead of FLTK you could have also used libQGLViewer. It has an integrated Quaternion class which would save you the implementation. If you still have a choice I strongly recommend moving to QT.
Have you considered gluLookAt? Put your control point as the eye point, the next point as the reference point, and make the up vector perpendicular to the difference between the two.