Point cloud alignment using principal component analysis with CGAL - c++

I have sets of randomly sampled points on the surfaces of 3D objects, and I want to compute the similarity between two different objects. To make that work, I first have to make sure that the sample points of both objects have the same orientation and scale. I thought I could do this by aligning the principal component axes with the x/y/z axes, and scaling such that the longest principal component has unit length.
I first compute the centroid of the point set and translate all points so that the centroid lies at the origin.
I do the principal component analysis with CGAL's linear_least_squares_fitting_3 function, which gives the best-fitting plane through the points. I compute the normal of this plane as the cross product of its two base vectors:
Plane plane;
linear_least_squares_fitting_3(points.begin(), points.end(),
plane, CGAL::Dimension_tag<0>());
auto dir1 = dir2vec(plane.base1().direction());
auto dir2 = dir2vec(plane.base2().direction());
auto normal = dir1 ^ dir2; // cross product
normal.normalize(); dir1.normalize(); dir2.normalize();
The dir2vec function converts a CGAL::Direction_3 object to an equivalent osg::Vec3d object (I am using the OpenSceneGraph graphics engine). Finally, I rotate everything onto the coordinate axes using the following code:
Matrixd r1, r2, r3;
r1.makeRotate(normal, Vec3d(1,0,0));
r2.makeRotate(dir1 * r1, Vec3d(0,1,0));
r3.makeRotate(dir2 * r1 * r2, Vec3d(0,0,1));
auto rotate = [&](Vec3d const &p) {
return p * r1 * r2 * r3;
};
transform(osgPoints.begin(), osgPoints.end(), osgPoints.begin(), rotate);
Here, osgPoints is a vector<osg::Vec3d>. For testing purposes, I translate the centroid of the rotated points back to its original location, so that the two point clouds don't overlap.
Vec3d center = point2vec(centroid);
auto tocentroid = [&](Vec3d const &v) {
return v + center;
};
transform(osgPoints.begin(), osgPoints.end(), osgPoints.begin(), tocentroid);
To test it, I use two copies of the same point set, one of which is transformed (rotated and translated). The above code should undo the rotations, but the results are not what I expected: see this image. The red lines indicate the base vectors of the best-fitting planes and their normals. It looks as if the two calls to linear_least_squares_fitting_3 give slightly different answers, as one of the planes is rotated a little with respect to the other.
Here is another image where both objects are positioned with their centroids at the origin. It is now clearly visible that the normals and base vectors coincide, but the points do not.
Does anybody know why this happens, and how I can prevent it?

Fitting a plane to a set of points leaves one degree of freedom unconstrained: the plane can spin around its normal without changing the fit, so the fit alone does not determine which directions inside the plane to use as base vectors. I don't know anything about CGAL, but I wouldn't be surprised to discover that it just picks a convenient pair of in-plane vectors when finding the fit (probably a nearest projection from the original axes of the space).
If you did real PCA on the point cloud, I don't think you'd have that problem. Alternatively, perhaps you could rescale (stretch) your data along the normal discovered by the fitting algorithm and then find another fit. If you stretch the data out sufficiently, then the first plane found shouldn't be as good a fit as some orthogonal plane.
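For reference, "real" PCA here means eigen-decomposing the 3x3 covariance matrix of the centred points: the eigenvectors are the principal axes, ordered by eigenvalue, and unlike the in-plane base vectors of a fitted plane they are fixed by the data. Below is a minimal, dependency-free sketch of that idea. The function names, the Jacobi eigen-solver, and the choice to scale by the standard deviation along the first axis are my own assumptions, not CGAL's or ALGLIB's API. Each eigenvector is only determined up to sign, so two aligned clouds can still differ by a flip of one or more axes; one heuristic is to fix each sign from the skewness of the projected coordinates.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Cyclic Jacobi eigen-decomposition of a symmetric 3x3 matrix.
// On return the diagonal of `a` holds the eigenvalues and column j of `v`
// is the unit eigenvector belonging to a[j][j].
void jacobiEigen(Mat3& a, Mat3& v) {
    v = {{{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}}};
    for (int sweep = 0; sweep < 50; ++sweep) {
        double off = a[0][1] * a[0][1] + a[0][2] * a[0][2] + a[1][2] * a[1][2];
        if (off < 1e-30) break;
        for (int p = 0; p < 2; ++p) {
            for (int q = p + 1; q < 3; ++q) {
                if (a[p][q] == 0.0) continue;
                double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                double t = (theta >= 0 ? 1.0 : -1.0) /
                           (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
                double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
                for (int k = 0; k < 3; ++k) {  // a <- a * J (columns p, q)
                    double akp = a[k][p], akq = a[k][q];
                    a[k][p] = c * akp - s * akq;
                    a[k][q] = s * akp + c * akq;
                }
                for (int k = 0; k < 3; ++k) {  // a <- J^T * a (rows p, q)
                    double apk = a[p][k], aqk = a[q][k];
                    a[p][k] = c * apk - s * aqk;
                    a[q][k] = s * apk + c * aqk;
                }
                for (int k = 0; k < 3; ++k) {  // accumulate eigenvectors: v <- v * J
                    double vkp = v[k][p], vkq = v[k][q];
                    v[k][p] = c * vkp - s * vkq;
                    v[k][q] = s * vkp + c * vkq;
                }
            }
        }
    }
}

struct PcaFrame {
    Vec3 centroid;   // mean of the points
    Mat3 axes;       // axes[i] is the i-th principal axis (unit length)
    Vec3 variances;  // variance along each axis, in decreasing order
};

PcaFrame principalAxes(const std::vector<Vec3>& pts) {
    assert(!pts.empty());
    const double n = static_cast<double>(pts.size());
    PcaFrame f{};
    for (const Vec3& p : pts)
        for (int i = 0; i < 3; ++i) f.centroid[i] += p[i] / n;
    Mat3 cov{};  // covariance matrix of the centred points
    for (const Vec3& p : pts)
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                cov[i][j] += (p[i] - f.centroid[i]) * (p[j] - f.centroid[j]) / n;
    Mat3 v;
    jacobiEigen(cov, v);
    std::array<int, 3> order{0, 1, 2};  // sort eigenpairs by decreasing eigenvalue
    std::sort(order.begin(), order.end(),
              [&](int x, int y) { return cov[x][x] > cov[y][y]; });
    for (int i = 0; i < 3; ++i) {
        f.variances[i] = cov[order[i]][order[i]];
        for (int k = 0; k < 3; ++k) f.axes[i][k] = v[k][order[i]];
    }
    return f;
}

// Coordinates of each point in the PCA frame, scaled so that the standard
// deviation along the first (longest) axis becomes 1.
std::vector<Vec3> alignToPrincipalAxes(const std::vector<Vec3>& pts, const PcaFrame& f) {
    assert(f.variances[0] > 0.0);
    const double scale = 1.0 / std::sqrt(f.variances[0]);
    std::vector<Vec3> out;
    out.reserve(pts.size());
    for (const Vec3& p : pts) {
        Vec3 q{};
        for (int i = 0; i < 3; ++i)
            for (int k = 0; k < 3; ++k) q[i] += (p[k] - f.centroid[k]) * f.axes[i][k] * scale;
        out.push_back(q);
    }
    return out;
}
```

Calling principalAxes on each cloud and then alignToPrincipalAxes puts both clouds in a common frame, up to the sign ambiguity mentioned above.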

It does indeed seem that, as JCooper suggested, CGAL does not compute all the principal components. I switched to the ALGLIB library to do the PCA, and now it works.

Related

Given a Fundamental Matrix and Image Points in one image plane, find exactly corresponding points in second Image Plane

As a relative beginner in this topic, I have read the literature, but I am not sure how to manipulate the equations for my purposes and would like advice on tackling this.
Preamble:
I have 2 cameras in a stereo rig which have been calibrated, yielding data structures such as each camera's camera matrix K1 and K2, as well as the fundamental, essential, rotation and translation matrices F, E, R and T respectively. After rectification I also have the projection matrices P1 and P2, as well as the disparity-to-depth mapping matrix Q.
However, my aim is to test the triangulation method of OpenCV, and to this end I would like to use synthetic images where the correspondence between the points in image1 and image2 is exact.
My idea was to take an image of a chessboard with one camera, and use findChessboardCorners() and cornerSubPix() to get image points in the left camera; let's call them imagePoints1.
To generate synthetic image points that correspond exactly to the points on the left camera's image plane, I intend to use the property
x2'Fx1 = 0, given the F matrix and x1 (which represents one homogeneous 2D point from imagePoints1)
to generate said set of Image Points.
This is where I am stuck, since the obvious solution would be the zero vector, which satisfies the equation trivially. Otherwise I get a parametric solution. How do I then get non-zero points that fulfill the property x2'Fx1 = 0 given x1 and F?
Thank you.
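The constraint x2'Fx1 = 0 only says that x2 lies on the epipolar line l2 = F x1 in the second image: every point on that line satisfies it, which is why the solution is parametric. A minimal sketch (the helper names are hypothetical) that computes the line, picks the point on it with a chosen u coordinate, and checks the residual:

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Epipolar line l2 = F * x1 in the second image, as (a, b, c) with a*u + b*v + c = 0.
Vec3 epipolarLine(const Mat3& F, const Vec3& x1) {
    Vec3 l{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) l[i] += F[i][j] * x1[j];
    return l;
}

// Hypothetical helper: the homogeneous point on line l whose u coordinate is u
// (requires the line not to be vertical, i.e. b != 0).
Vec3 pointOnLineAtU(const Vec3& l, double u) {
    assert(std::fabs(l[1]) > 1e-12);
    return {u, -(l[0] * u + l[2]) / l[1], 1.0};
}

// Residual of the epipolar constraint x2^T F x1 (zero for any x2 on the line).
double epipolarResidual(const Mat3& F, const Vec3& x1, const Vec3& x2) {
    const Vec3 l = epipolarLine(F, x1);
    return x2[0] * l[0] + x2[1] * l[1] + x2[2] * l[2];
}
```

Which point on the line is the true correspondence cannot be decided from F and x1 alone; you need extra information, for example the 3D position of the chessboard corner, which you could project with each camera's projection matrix.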

camera extrinsic calibration

I have a fisheye camera, which I have already calibrated. I need to calculate the camera pose with respect to a checkerboard using just a single image of that checkerboard, the intrinsic parameters, and the size of the checkerboard's squares. Unfortunately, many calibration libraries first calculate the extrinsic parameters from a set of images and then the intrinsic parameters, which is essentially the "inverse" of the procedure I want. Of course I could just put my checkerboard image into the set of images I used for the calibration and run the calibration procedure again, but that is very tedious, and moreover I can't use a checkerboard of a different size from the ones used for the intrinsic calibration. Can anybody point me in the right direction?
EDIT: After reading Francesco's answer, I realized that I didn't explain what I mean by calibrating the camera. My problem begins with the fact that I don't have the classic intrinsic parameter matrix (so I can't actually use the method Francesco described). In fact, I calibrated the fisheye camera with Scaramuzza's procedure (https://sites.google.com/site/scarabotix/ocamcalib-toolbox), which basically finds a polynomial that maps 3D world points to pixel coordinates (or, alternatively, the polynomial that back-projects pixels onto the unit sphere). I think this information is enough to find the camera pose with respect to a chessboard, but I'm not sure exactly how to proceed.
The solvePnP procedure calculates the extrinsic pose of the chessboard (CB) in camera coordinates. OpenCV added fisheye support (the cv::fisheye namespace) to its 3D reconstruction (calib3d) module to accommodate the significant distortion of cameras with a large field of view. Of course, if your intrinsic matrix or transformation is not a classical intrinsic matrix, you have to modify PnP:
Undo the projection your calibration model applies: back-project each detected corner to a ray and divide by its third component, so that it becomes [u, v, 1].
Now you have a so-called normalized camera, where the effect of the intrinsic matrix has been eliminated:
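$$k\,[u, v, 1]^T = [R \mid T]\,[x, y, z, 1]^T$$
The way to solve this is to write the expression for k first:
$$k = R_{20}x + R_{21}y + R_{22}z + T_z$$
then use the above expression in
$$k u = R_{00}x + R_{01}y + R_{02}z + T_x$$
$$k v = R_{10}x + R_{11}y + R_{12}z + T_y$$
Substituting k gives two equations per corner:
$$u\,(R_{20}x + R_{21}y + R_{22}z + T_z) - (R_{00}x + R_{01}y + R_{02}z + T_x) = 0$$
$$v\,(R_{20}x + R_{21}y + R_{22}z + T_z) - (R_{10}x + R_{11}y + R_{12}z + T_y) = 0$$
You can rearrange the terms to get $A\mathbf{x} = 0$, subject to $\|\mathbf{x}\| = 1$, where the unknown is
$$\mathbf{x} = [R_{00}, R_{01}, R_{02}, T_x, R_{10}, R_{11}, R_{12}, T_y, R_{20}, R_{21}, R_{22}, T_z]^T$$
and A is composed of the known u, v, x, y, z (normalized pixel and CB corner coordinates), two rows per corner;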
Then you solve for $\mathbf{x}$ = the last column of V, where $A = U L V^T$ is the SVD of A, and assemble the rotation and translation matrices from $\mathbf{x}$. Then there are a few 'messy' steps that are actually very typical for this kind of processing:
A. Ensure that you got a real rotation matrix: perform orthogonal Procrustes, $R_2 = U V^T$, where $R = U L V^T$ is the SVD of R;
B. Calculate the scale factor $\mathrm{scl} = \frac{1}{9}\sum_{i,j} R_2(i,j) / R(i,j)$;
C. Update the translation vector, $T_2 = \mathrm{scl} \cdot T$, and check that $T_z > 0$; if it is negative, negate both $T_2$ and $R_2$.
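One caveat with the 12-unknown system: for a flat chessboard every corner has z = 0, so the columns multiplying R02, R12 and R22 are all zero and the null space of A is no longer one-dimensional. Below is a minimal sketch of the planar variant, assuming the corners are already in normalized coordinates (u, v) and the board lies at z = 0. It drops those three unknowns, takes the eigenvector of A^T A with the smallest eigenvalue (equivalent to the last column of V in the SVD of A), fixes the scale and the sign of Tz, and rebuilds the third rotation column as a cross product. The function names and the Jacobi eigen-solver are illustrative choices; the Procrustes refinement and the Levenberg-Marquardt step are left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Cyclic Jacobi eigen-decomposition of a symmetric n x n matrix `a`.
// Afterwards a[j][j] is an eigenvalue and column j of `v` its unit eigenvector.
void jacobiEigen(Mat& a, Mat& v) {
    const std::size_t n = a.size();
    v.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) v[i][i] = 1.0;
    for (int sweep = 0; sweep < 100; ++sweep) {
        double off = 0.0;
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t q = p + 1; q < n; ++q) off += a[p][q] * a[p][q];
        if (off < 1e-30) break;
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t q = p + 1; q < n; ++q) {
                if (a[p][q] == 0.0) continue;
                double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                double t = (theta >= 0 ? 1.0 : -1.0) /
                           (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
                double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
                for (std::size_t k = 0; k < n; ++k) {  // columns p, q
                    double akp = a[k][p], akq = a[k][q];
                    a[k][p] = c * akp - s * akq;
                    a[k][q] = s * akp + c * akq;
                }
                for (std::size_t k = 0; k < n; ++k) {  // rows p, q
                    double apk = a[p][k], aqk = a[q][k];
                    a[p][k] = c * apk - s * aqk;
                    a[q][k] = s * apk + c * aqk;
                }
                for (std::size_t k = 0; k < n; ++k) {  // eigenvectors
                    double vkp = v[k][p], vkq = v[k][q];
                    v[k][p] = c * vkp - s * vkq;
                    v[k][q] = s * vkp + c * vkq;
                }
            }
    }
}

struct Pose {
    double R[3][3];  // rotation, row-major
    double T[3];     // translation
};

// Linear pose of a planar target (z = 0). board: corners (x, y) on the board plane;
// uv: the same corners in normalized camera coordinates (u, v).
Pose planarPoseLinear(const std::vector<std::array<double, 2>>& board,
                      const std::vector<std::array<double, 2>>& uv) {
    assert(board.size() == uv.size() && board.size() >= 4);
    // Unknowns h = [R00, R01, Tx, R10, R11, Ty, R20, R21, Tz]; accumulate A^T A.
    Mat ata(9, std::vector<double>(9, 0.0));
    for (std::size_t i = 0; i < board.size(); ++i) {
        const double x = board[i][0], y = board[i][1], u = uv[i][0], v = uv[i][1];
        const double ru[9] = {-x, -y, -1, 0, 0, 0, u * x, u * y, u};  // u equation
        const double rv[9] = {0, 0, 0, -x, -y, -1, v * x, v * y, v};  // v equation
        for (int r = 0; r < 9; ++r)
            for (int c = 0; c < 9; ++c) ata[r][c] += ru[r] * ru[c] + rv[r] * rv[c];
    }
    Mat vecs;
    jacobiEigen(ata, vecs);
    int best = 0;  // index of the smallest eigenvalue
    for (int k = 1; k < 9; ++k)
        if (ata[k][k] < ata[best][best]) best = k;
    double h[9];
    for (int k = 0; k < 9; ++k) h[k] = vecs[k][best];
    // Scale so the two rotation columns have unit length; sign so that Tz > 0.
    const double n1 = std::sqrt(h[0] * h[0] + h[3] * h[3] + h[6] * h[6]);
    const double n2 = std::sqrt(h[1] * h[1] + h[4] * h[4] + h[7] * h[7]);
    double s = 2.0 / (n1 + n2);
    if (h[8] < 0) s = -s;
    Pose p{};
    for (int r = 0; r < 3; ++r) {
        p.R[r][0] = s * h[3 * r];
        p.R[r][1] = s * h[3 * r + 1];
        p.T[r] = s * h[3 * r + 2];
    }
    // Third rotation column = first column x second column.
    p.R[0][2] = p.R[1][0] * p.R[2][1] - p.R[2][0] * p.R[1][1];
    p.R[1][2] = p.R[2][0] * p.R[0][1] - p.R[0][0] * p.R[2][1];
    p.R[2][2] = p.R[0][0] * p.R[1][1] - p.R[1][0] * p.R[0][1];
    return p;
}
```

With noisy corners the resulting R is only approximately orthonormal, which is where step A and the non-linear refinement come in.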
Now, R2 and T2 give you a good starting point for a non-linear optimization algorithm such as Levenberg-Marquardt. This is required because the previous linear step optimizes only an algebraic error of the parameters, while the non-linear one optimizes a correct metric, such as the squared error in pixel distances. However, if you don't want to follow all these steps, you can take advantage of OpenCV's fisheye library.
I assume that by "calibrated" you mean that you have a pinhole model for your camera.
Then the transformation between your chessboard plane and the image plane is a homography, which you can estimate from the image of the corners using the usual DLT algorithm. You can then express it as the product, up to scale, of the matrix of intrinsic parameters A and [x y t], where x and y columns are the x and y unit vectors of the world's (i.e. chessboard's) coordinate frame, and t is the vector from the camera centre to the origin of that same frame. That is:
H = scale * A * [x|y|t]
Therefore
[x|y|t] = 1/scale * inv(A) * H
The scale is chosen so that x and y have unit length. Once you have x and y, the third axis is just their cross product.
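The decomposition above fits in a few lines of C++. This sketch assumes H maps chessboard coordinates (X, Y, 1) to homogeneous pixel coordinates, and that the sign of the scale is chosen so the board is in front of the camera (t_z > 0). The function names are hypothetical, and the scale is taken as the mean of the two column norms because with noisy data x and y will not come out with exactly the same length.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Inverse of a 3x3 matrix via the adjugate (cofactor) formula.
Mat3 inverse3(const Mat3& m) {
    const double det = m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
                       m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
                       m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    assert(std::fabs(det) > 1e-12);
    Mat3 r;
    r[0][0] = (m[1][1] * m[2][2] - m[1][2] * m[2][1]) / det;
    r[0][1] = (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / det;
    r[0][2] = (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / det;
    r[1][0] = (m[1][2] * m[2][0] - m[1][0] * m[2][2]) / det;
    r[1][1] = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det;
    r[1][2] = (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / det;
    r[2][0] = (m[1][0] * m[2][1] - m[1][1] * m[2][0]) / det;
    r[2][1] = (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / det;
    r[2][2] = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det;
    return r;
}

// H = scale * A * [x | y | t]  =>  [x | y | t] = inv(A) * H / scale.
// Outputs R = [x | y | x cross y] and the translation t.
void poseFromHomography(const Mat3& A, const Mat3& H, Mat3& R, Vec3& t) {
    const Mat3 M = mul(inverse3(A), H);
    const double nx = std::sqrt(M[0][0] * M[0][0] + M[1][0] * M[1][0] + M[2][0] * M[2][0]);
    const double ny = std::sqrt(M[0][1] * M[0][1] + M[1][1] * M[1][1] + M[2][1] * M[2][1]);
    double scale = 0.5 * (nx + ny);   // x and y should both have unit length
    if (M[2][2] < 0) scale = -scale;  // choose the sign that gives t_z > 0
    for (int i = 0; i < 3; ++i) {
        R[i][0] = M[i][0] / scale;
        R[i][1] = M[i][1] / scale;
        t[i] = M[i][2] / scale;
    }
    // Third axis: cross product of the first two columns.
    R[0][2] = R[1][0] * R[2][1] - R[2][0] * R[1][1];
    R[1][2] = R[2][0] * R[0][1] - R[0][0] * R[2][1];
    R[2][2] = R[0][0] * R[1][1] - R[1][0] * R[0][1];
}
```

With real, noisy corners the resulting [x|y|z] is only approximately a rotation and is usually re-orthogonalized afterwards.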

Draping 2d point on a 3d terrain

I am using OpenTK (OpenGL), and a general hint would be helpful.
I have a 3d terrain. I have one point on this terrain O(x,y,z) and two perpendicular lines passing through this point that will serve as my X and Y axes.
Now I have a set of 2D points which are in polar coordinates (range, theta). I need to find which points on the terrain correspond to these points. I am not sure what the best way to do it is. I can think of two ideas:
Let's say I am drawing A(x1,y1).
Intersect the terrain with the plane that passes through O and A and is perpendicular to the XY plane. This will give me a polyline (semantics may be off). Then, on this line, I find a point that is visible from O and is at a distance equal to the range.
Create a circle of radius "range" perpendicular to the XY plane, find its intersection points with the terrain, find which ones are visible from O, and drop the rest.
I understand I can find several points which satisfy the conditions, so I will do further check based on topography, but for now I need to get a smaller set which satisfy this condition.
I am new to OpenGL, but I understand geometry pretty well. I am wondering if something like this exists in OpenGL, since it is a standard problem in ground measuring systems.
As you say, both of the options you present will give you more than the one point you need. As I understand your problem, you need only to perform a change of bases from polar coordinates (r, angle) to cartesian coordinates (x,y).
This is fairly straightforward to do. Assuming that the two coordinate spaces share the origin O and that the angle is measured from the x-axis, the point (r_i, angle_i) maps to x_i = r_i*cos(angle_i) and y_i = r_i*sin(angle_i). If those assumptions aren't correct (i.e. if the origins aren't coincident or the angle is not measured from a ray parallel to the x-axis), then the transformation is a bit more complicated but can still be done.
If your terrain is represented as a height map, or 2D array of heights (e.g. Terrain[x][y] = z), once you have the point in cartesian coordinates (x_i,y_i) you can find the height at that point. Of course (x_i, y_i) might not be exactly one of the [x] or [y] indices of the height map.
In that case, I think you have a few options:
Choose the closest (x,y) point and take that height; or
Interpolate the height at (x_i,y_i) based on the surrounding points in the height map.
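Putting the two steps together, here is a minimal sketch. It assumes the terrain is a regular height map with uniform spacing `cell`, its sample (0, 0) at world (0, 0), and theta measured counter-clockwise from the x-axis in radians; these are hypothetical conventions, so adapt them to your data. It uses the second option, bilinear interpolation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical terrain representation: a regular grid of heights.
struct HeightMap {
    int nx, ny;             // number of samples along x and y
    double cell;            // grid spacing in world units
    std::vector<double> z;  // heights, row-major: z[iy * nx + ix]

    double at(int ix, int iy) const { return z[static_cast<std::size_t>(iy * nx + ix)]; }

    // Bilinear interpolation of the height at world position (x, y).
    double sample(double x, double y) const {
        const double gx = x / cell, gy = y / cell;
        const int ix = static_cast<int>(std::floor(gx));
        const int iy = static_cast<int>(std::floor(gy));
        assert(ix >= 0 && iy >= 0 && ix + 1 < nx && iy + 1 < ny);  // inside the grid
        const double fx = gx - ix, fy = gy - iy;
        const double z0 = at(ix, iy) * (1 - fx) + at(ix + 1, iy) * fx;
        const double z1 = at(ix, iy + 1) * (1 - fx) + at(ix + 1, iy + 1) * fx;
        return z0 * (1 - fy) + z1 * fy;
    }
};

struct Point3 { double x, y, z; };

// Polar (range, theta) measured from origin (ox, oy), converted to Cartesian
// coordinates and given the interpolated terrain height.
Point3 drape(const HeightMap& hm, double ox, double oy, double range, double theta) {
    const double x = ox + range * std::cos(theta);
    const double y = oy + range * std::sin(theta);
    return {x, y, hm.sample(x, y)};
}
```

Points that land exactly on the last row or column of the grid trip the assertion; clamping the indices is a simple way to handle that edge if it matters for your data.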
Unfortunately I am also learning OpenGL and can not provide any specific insights there, but I hope this helps solve your problem.
Reading your description I see a bit of confusion... maybe.
You have defined point O(x,y,z). Fine, this is the pole of your 3D coordinate system. Then you want to find a point defined by polar coordinates. That's fine too: it gives you a 2D location. Basically, all you need to do is pinpoint the 2D location A'(x,y,0); the elevation of A at (r,t) then comes from the terrain at that location, which you of course know.
The angle (t) can be measured from only one axis. Choose which axis will be your polar north and stick to it. Then measure off the range r along that bearing and, voilà, you have your location. What's the point of having a 2D coordinate set if you don't use it? Instead, you're adding visibility to the mix. I assume it is important, but the highest terrain point along azimuth (t) will NOT NECESSARILY be at range (r).
You have specific coordinates. Just as RonL suggests, convert to (x,y), find (z) from the actual terrain, and be done with it.
Unless that's not what you need. But in that case a different question is in order: what do you look for?

Replicating Blender bezier curves in a C++ program

I'm trying to export (3D) bezier curves from Blender to my C++ program. I asked a related question a while back, where I was successfully directed to use De Casteljau's Algorithm to evaluate points (and tangents to these points) along a bezier curve. This works well. In fact, perfectly. I can export the curves and evaluate points along the curve, as well as the tangent to these points, all within my program using De Casteljau's Algorithm.
However, in 3D space a point along a bezier curve and the tangent at this point are not enough to define a "frame" that a camera can lock into, if that makes sense. To put it another way, there is no "up vector", which is required for a camera's orientation to be properly specified at any point along the curve. Mathematically speaking, there are infinitely many normal vectors at any point along a 3D bezier curve.
I've noticed when constructing curves in Blender that they aren't merely infinitely thin lines, they actually appear to have a proper 3D orientation defined at any point along them (as shown by the offshooting "arrow lines" in the screenshot below). I'd like to replicate what blender does here as closely as possible in my program. That is, I'd like to be able to form a matrix that represents an orientation at any point along a 3D bezier curve (almost exactly as it would in Blender itself).
Can anyone lend further guidance here, perhaps someone with an intimate knowledge of Blender's source code? (But any advice is welcome, Blender background or not.) I know it's open source, but I'm having a lot of trouble isolating the code responsible for these curve calculations due to the vastness of the program.
A few weeks ago I found a solution to this problem. I'm posting it here in case someone else needs it:
1) For a given point P0, calculate the tangent vector T0.
One simple way is to take the next point on the curve, subtract the current point, then normalize the result:
T0 = normalize(P1 - P0)
A more precise way to get the tangent is to evaluate the derivative of your bezier curve function.
Then, pick an arbitrary vector V that is not parallel to T0 (for example, (0, 0, 1)).
Set N0 = crossproduct(T0, V) and B0 = crossproduct(T0, N0) (don't forget to normalize the resulting vectors after each operation).
You now have a starting frame (P0, B0, T0, N0).
This is the initial camera orientation.
2) Then, to calculate the next points and their orientation:
Calculate T1 using the same method as T0.
Here is the trick: the new reference frame is calculated from the previous one:
N1 = crossproduct(B0, T1)
B1 = crossproduct(T1, N1)
Proceed the same way for the remaining points. As a result, the camera rotates slightly around the tangent vector depending on how the curve changes direction. Loops are handled correctly (the camera won't twist as in my previous answer).
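The procedure above can be sketched as follows for a single cubic bezier segment. The tangent comes from the derivative of the curve, V must not be parallel to the first tangent, and the names (propagateFrames, Frame, and so on) are illustrative. The step count matters: if the tangent turns too much between samples, crossproduct(B(i-1), Ti) can approach zero.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
using Bezier3 = std::array<Vec3, 4>;  // cubic bezier control points

Vec3 add(const Vec3& a, const Vec3& b) { return {a[0] + b[0], a[1] + b[1], a[2] + b[2]}; }
Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
Vec3 scaled(const Vec3& a, double s) { return {a[0] * s, a[1] * s, a[2] * s}; }
double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
Vec3 normalize(const Vec3& a) {
    const double n = std::sqrt(dot(a, a));
    assert(n > 1e-12);  // degenerate input, e.g. V parallel to the tangent
    return scaled(a, 1.0 / n);
}

// Point at parameter t, by De Casteljau's algorithm.
Vec3 bezierPoint(const Bezier3& P, double t) {
    Bezier3 q = P;
    for (int level = 3; level > 0; --level)
        for (int i = 0; i < level; ++i) q[i] = add(scaled(q[i], 1.0 - t), scaled(q[i + 1], t));
    return q[0];
}

// Derivative dP/dt, used as the (unnormalized) tangent.
Vec3 bezierDerivative(const Bezier3& P, double t) {
    const double u = 1.0 - t;
    Vec3 d = add(scaled(sub(P[1], P[0]), u * u), scaled(sub(P[2], P[1]), 2.0 * u * t));
    d = add(d, scaled(sub(P[3], P[2]), t * t));
    return scaled(d, 3.0);
}

struct Frame { Vec3 P, T, N, B; };

// Frames at steps + 1 evenly spaced parameters, each derived from the previous one.
std::vector<Frame> propagateFrames(const Bezier3& ctrl, int steps, const Vec3& V) {
    assert(steps > 0);
    std::vector<Frame> frames;
    for (int i = 0; i <= steps; ++i) {
        const double t = static_cast<double>(i) / steps;
        Frame f;
        f.P = bezierPoint(ctrl, t);
        f.T = normalize(bezierDerivative(ctrl, t));
        // First frame: N0 = T0 x V. Later frames: Ni = B(i-1) x Ti. Then Bi = Ti x Ni.
        f.N = normalize(i == 0 ? cross(f.T, V) : cross(frames.back().B, f.T));
        f.B = normalize(cross(f.T, f.N));
        frames.push_back(f);
    }
    return frames;
}
```

A camera matrix at sample i can then be built from (P, T, N, B); which of N or B plays the role of "up" is a convention you would pick to match Blender's display.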
You can watch a live example here (not from me) : http://jabtunes.com/labs/3d/webgl_geometry_extrude_splines.html
First, we know that the normal vector you're searching for lies in the plane "locally perpendicular" to the curve at the specific point. So the real problem is to choose a single vector in this plane.
I made an empty object follow the curve and noticed that it behaves similarly to the cart of a rollercoaster: its "up" vector was correlated with the centrifugal force while it was moving along the curve. That direction can be uniquely determined from the local shape of the curve.
I'm not very good at physics, but I would try to estimate that vector by evaluating two planes: the first is previously mentioned perpendicular plane and the second is a plane made of three neighboring points of a curve segment (if the curve is not straight, these will form a triangle, which describes exactly one plane). Intersection of these two planes will give you an axis and you'll only have to choose a direction of such calculated normal vector.
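A minimal sketch of that idea, assuming evenly spaced samples along the curve: the second difference prev - 2*cur + next lies in the plane of the three points, and removing its component along the tangent (approximated by the chord next - prev) leaves the direction where that plane meets the perpendicular plane. It points toward the centre of curvature, so its direction flips at inflection points, and it is undefined on straight segments. The function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Estimate of the curve's normal at `cur` from three neighbouring samples.
// Returns false on a (locally) straight segment, where the normal is undefined.
bool principalNormal(const Vec3& prev, const Vec3& cur, const Vec3& next, Vec3& n) {
    Vec3 t{}, k{};
    for (int i = 0; i < 3; ++i) {
        t[i] = next[i] - prev[i];               // chord approximates the tangent
        k[i] = prev[i] - 2.0 * cur[i] + next[i];  // second difference, in the points' plane
    }
    const double tt = t[0] * t[0] + t[1] * t[1] + t[2] * t[2];
    assert(tt > 0.0);
    const double kt = (k[0] * t[0] + k[1] * t[1] + k[2] * t[2]) / tt;
    for (int i = 0; i < 3; ++i) n[i] = k[i] - kt * t[i];  // keep the part perpendicular to t
    const double len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    if (len < 1e-12) return false;
    for (double& c : n) c /= len;
    return true;
}
```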
If I understand your question correctly, what you want is to get 3 orientation vectors (left, front, up) for any point of the curve.
Here is a simple method (there is a limitation, see (*) below):
1) Front vector :
Calculate a 3d point on bezier curve for a given position (t). This is the point for which we will calculate front, left, up vectors. We will call it current_point.
Calculate another 3d point on the curve, next to first one (t + 0.01), let's call it next_point.
Note: I don't write the formula here, because I believe you already know how to do that.
Then, to calculate the front vector, just subtract the two points calculated previously:
vector front = next_point - current_point
Don't forget to normalize the result.
2) Left vector
Define a temporary "up" vector
vector up = vector(0.0f, 1.0f, 0.0f);
Now you can calculate left easily from front and up (normalize it too, since front and the temporary up are generally not perpendicular):
vector left = CrossProduct(front, up);
3) Up vector
vector up = CrossProduct(left, front);
Using this method you can always calculate a front, left, up for any point along the curve.
(*) NOTE: this won't work in all cases. Imagine you have a loop in your curve, just like a rollercoaster loop. At the top of the loop your calculated up vector will be (0, 1, 0), while you probably want it to be (0, -1, 0). The only way to solve that is to have two curves: one for points and one for up vectors (from which left and front can be calculated easily).

How can I determine distance from an object in a video?

I have a video file recorded from the front of a moving vehicle. I am going to use OpenCV for object detection and recognition, but I'm stuck on one aspect: how can I determine the distance to a recognized object?
I know my current speed and real-world GPS position, but that is all. I can't make any assumptions about the object I'm tracking. I am planning to use this to track and follow objects without colliding with them. Ideally I would like to use this data to derive the object's real-world position, which I could do if I could determine the distance from the camera to the object.
Your problem's quite standard in the field.
Firstly,
you need to calibrate your camera. This can be done offline (makes life much simpler) or online through self-calibration.
Calibrate it offline - please.
Secondly,
Once you have the calibration matrix of the camera K, determine the projection matrix of the camera in a successive scene (you need to use parallax as mentioned by others). This is described well in this OpenCV tutorial.
You'll have to use the GPS information to find the relative orientation between the cameras in the successive scenes (that might be problematic due to noise inherent in most GPS units), i.e. the R and t mentioned in the tutorial or the rotation and translation between the two cameras.
Once you've resolved all that, you'll have two projection matrices: representations of the cameras at those successive scenes. Using one of these so-called camera matrices, you can "project" a 3D point M in the scene onto pixel coordinate m in the camera's 2D image (as in the tutorial).
We will use this to triangulate the real 3D point from 2D points found in your video.
Thirdly,
Use an interest point detector to track the same point in your video that lies on the object of interest. There are several detectors available; I recommend SURF, and since you have OpenCV you also have several other detectors such as Shi-Tomasi corners, Harris, etc.
Fourthly,
Once you've tracked points of your object across the sequence and obtained the corresponding 2D pixel coordinates, you must triangulate the best-fitting 3D point given your projection matrices and 2D points.
The above image nicely captures the uncertainty and how a best fitting 3D point is computed. Of course in your case, the cameras are probably in front of each other!
Finally,
Once you've obtained the 3D points on the object, you can easily compute the Euclidean distance between the camera center (which is the origin in most cases) and the point.
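For the triangulation and distance steps, here is a minimal linear least-squares sketch (not OpenCV's cv::triangulatePoints; the function names are hypothetical). Each view contributes two equations, u·(p3·X) - p1·X = 0 and v·(p3·X) - p2·X = 0, where p1, p2, p3 are the rows of the 3x4 projection matrix; with X = (x, y, z, 1) that gives four equations in three unknowns. It assumes a point at finite distance and ignores noise weighting, so treat it as a starting point for a non-linear refinement.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;
using Proj = std::array<std::array<double, 4>, 3>;  // 3x4 projection matrix

double det3(const Mat3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
           m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
           m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Linear least-squares triangulation of one point seen at pixel m1 by camera P1
// and at pixel m2 by camera P2.
Vec3 triangulate(const Proj& P1, const Vec2& m1, const Proj& P2, const Vec2& m2) {
    double A[4][3], b[4];
    const Proj* cams[2] = {&P1, &P2};
    const Vec2* pix[2] = {&m1, &m2};
    for (int c = 0; c < 2; ++c) {
        const Proj& P = *cams[c];
        for (int r = 0; r < 2; ++r) {  // r = 0: u equation, r = 1: v equation
            const double w = (*pix[c])[r];
            for (int j = 0; j < 3; ++j) A[2 * c + r][j] = w * P[2][j] - P[r][j];
            b[2 * c + r] = P[r][3] - w * P[2][3];  // constant term moved to the right
        }
    }
    // Normal equations (A^T A) X = A^T b, solved with Cramer's rule.
    Mat3 M{};
    Vec3 y{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 4; ++k) {
            y[i] += A[k][i] * b[k];
            for (int j = 0; j < 3; ++j) M[i][j] += A[k][i] * A[k][j];
        }
    const double d = det3(M);
    assert(std::fabs(d) > 1e-12);
    Vec3 X{};
    for (int col = 0; col < 3; ++col) {
        Mat3 Mc = M;
        for (int i = 0; i < 3; ++i) Mc[i][col] = y[i];
        X[col] = det3(Mc) / d;
    }
    return X;
}

// Euclidean distance, e.g. from the camera centre to the triangulated point.
double distanceBetween(const Vec3& a, const Vec3& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```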
Note
This is obviously not easy stuff, but it's not that hard either. I recommend Hartley and Zisserman's excellent book Multiple View Geometry, which describes everything above in explicit detail, with MATLAB code to boot.
Have fun and keep asking questions!
When you have moving video, you can use temporal parallax to determine the relative distance of objects. Parallax: (definition).
The effect is the same one our eyes use to gain depth perception, by looking at the same object from slightly different angles. Since you are moving, you can use two successive video frames to get your slightly different angle.
Using parallax calculations, you can determine the relative size and distance of objects (relative to one another). But, if you want the absolute size and distance, you will need a known point of reference.
You will also need to know the speed and direction being traveled (as well as the video frame rate) in order to do the calculations. You might be able to derive the speed of the vehicle using the visual data but that adds another dimension of complexity.
The technology already exists. Satellites determine topographic prominence (height) by comparing multiple images taken over a short period of time. We use parallax to determine the distance of stars by taking photos of the night sky at different points in Earth's orbit around the sun. I was able to create 3-D images out of an airplane window by taking two photographs in short succession.
The exact technology and calculations (even if I knew them off the top of my head) are way outside the scope of discussing here. If I can find a decent reference, I will post it here.
You need to identify the same points on the same object in two different frames taken a known distance apart. Since you know the location of the camera in each frame, you have a baseline (the vector between the two camera positions). Construct a triangle from the known baseline and the angles to the identified points. Trigonometry gives you the lengths of the unknown sides of the triangle from the known length of the baseline and the known angles between the baseline and the unknown sides.
You can use two cameras, or one camera taking successive shots. So, if your vehicle is moving at 1 m/s and you take frames every second, then successive frames will give you a 1 m baseline, which should be good for measuring the distance of objects up to, say, 5 m away. If you need to range objects further away, then the frames used need to be further apart; however, more distant objects will be in view for longer.
Observer at F1 sees target T at angle a1 to the velocity vector. Observer moves distance b to F2 and sees target T at angle a2.
Required: r1, the range to the target from F1.
From the definition of cosine in the right triangles:
cos(90° - a1) = x / r1 = c1
cos(90° - a2) = x / r2 = c2
cos(a1) = (b + z) / r1 = c3
cos(a2) = z / r2 = c4
where x is the distance to the target orthogonal to the observer's velocity vector,
and z is the distance from F2 to the intersection with x.
Solving for r1:
r1 = b / (c3 - c1 · c4 / c2)
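The formula is easy to check numerically. A minimal sketch (angles in radians; the function name is hypothetical). The denominator simplifies to sin(a2 - a1) / sin(a2), so the result degenerates when a1 ≈ a2, i.e. for targets that are very far away or almost straight ahead:

```cpp
#include <cassert>
#include <cmath>

// Range r1 from the first observation point F1 to the target.
// b: distance travelled from F1 to F2 along the velocity vector.
// a1, a2: angles (radians) between the velocity vector and the line of sight at F1, F2.
double rangeFromBearings(double b, double a1, double a2) {
    const double c1 = std::sin(a1);  // cos(pi/2 - a1) = x / r1
    const double c2 = std::sin(a2);  // cos(pi/2 - a2) = x / r2
    const double c3 = std::cos(a1);  // (b + z) / r1
    const double c4 = std::cos(a2);  // z / r2
    assert(std::fabs(c2) > 1e-12);
    const double denom = c3 - c1 * c4 / c2;  // equals sin(a2 - a1) / sin(a2)
    assert(std::fabs(denom) > 1e-12);        // a1 == a2: no parallax, range undefined
    return b / denom;
}
```

For example, with F1 at (0, 0), F2 at (3, 0) and a target at (10, 4), the angles are atan2(4, 10) and atan2(4, 7), and the function returns sqrt(116), the straight-line distance from F1.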
Two cameras so you can detect parallax. It's what humans do.
edit
Please see ravenspoint's answer for more detail. Also, keep in mind that a single camera with a splitter would probably suffice.
Use stereo disparity maps. Lots of implementations are available; here are some links:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT11/node4.html
http://www.ece.ucsb.edu/~manj/ece181bS04/L14(morestereo).pdf
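For a rectified stereo pair, the depth follows directly from the disparity: Z = f·B/d, with f the focal length in pixels, B the baseline, and d the disparity in pixels. A minimal sketch (the function name and the example numbers are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Depth of a point seen by a rectified stereo pair: Z = f * B / d.
// focalPx: focal length in pixels; baseline: distance between the camera centres
// (Z comes out in the same unit); disparityPx: horizontal shift in pixels.
double depthFromDisparity(double focalPx, double baseline, double disparityPx) {
    assert(disparityPx > 0.0);  // zero disparity corresponds to a point at infinity
    return focalPx * baseline / disparityPx;
}
```

For instance, f = 700 px, B = 0.12 and d = 8.4 px give Z = 10 in the baseline's unit. With a single moving camera, B would be the distance travelled between frames, but the formula only applies once the frames are rectified, and standard rectification behaves poorly when the motion is straight along the viewing direction, as it is for a forward-facing vehicle camera.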
In your case you don't have a stereo camera, but depth can be estimated from video:
http://www.springerlink.com/content/g0n11713444148l2/
I think the above is what will help you the most.
Research has progressed to the point that depth can be estimated (though not yet to a satisfactory extent) from a single monocular image:
http://www.cs.cornell.edu/~asaxena/learningdepth/
Someone please correct me if I'm wrong, but it seems to me that if you're going to use only a single camera and rely solely on a software solution, any processing you might do would be prone to false positives. I highly doubt that any processing could tell the difference between objects that really are at the perceived distance and those that only appear to be at that distance (like "forced perspective" in movies).
Any chance you could add an ultrasonic sensor?
First, you should calibrate your camera so you can get the relation between the objects' positions in the camera plane and their positions in the real world. If you are using a single camera, you can use the optical flow technique.
If you are using two cameras, you can use the triangulation method to find the real position (from which the distance of the objects is easy to find), but the problem with the second method is matching: how do you find the position of an object 'x' in camera 2 if you already know its position in camera 1? Here you can use the SIFT algorithm.
I just gave you some keywords; I hope they help.
Put an object of known size in the camera's field of view. That way you have a more objective metric for measuring angular distances. Without a second viewpoint/camera you'll be limited to estimating size/distance, but at least it won't be a complete guess.