How to calibrate camera focal length, translation and rotation given four points? - c++

I'm trying to find the focal length, position and orientation of a camera in world space.
Because I need this to be resolution-independent, I normalized my image coordinates to be in the range [-1, 1] for x, and a somewhat smaller range for y (depending on aspect ratio). So (0, 0) is the center of the image. I've already corrected for lens distortion (using k1 and k2 coefficients), so this does not enter the picture, except sometimes throwing x or y slightly out of the [-1, 1] range.
As a given, I have a planar, fixed rectangle in world space of known dimensions (in millimeters). The four corners of the rectangle are guaranteed to be visible, and are manually marked in the image. For example:
std::vector<cv::Point3f> worldPoints = {
    cv::Point3f(0, 0, 0),
    cv::Point3f(2000, 0, 0),
    cv::Point3f(0, 3000, 0),
    cv::Point3f(2000, 3000, 0),
};
std::vector<cv::Point2f> imagePoints = {
    cv::Point2f(-0.958707, -0.219624),
    cv::Point2f(-1.22234, 0.577061),
    cv::Point2f(0.0837469, -0.1783),
    cv::Point2f(0.205473, 0.428184),
};
Effectively, the equation I think I'm trying to solve is (see the equivalent in the OpenCV documentation):
    / xi \   / fx  0  0 \                  / Xi \
s * | yi | = |  0  fy 0 | * [ Rxyz | t ] * | Yi |
    \ 1  /   \  0  0  1 /                  | Zi |
                                           \ 1  /
where:
i is 1, 2, 3, 4
xi, yi is the location of point i in the image (between -1 and 1)
fx, fy are the focal lengths of the camera in x and y direction
Rxyz is the 3x3 rotation matrix of the camera (has only 3 degrees of freedom)
tx, ty, tz is the translation of the camera
Xi, Yi, Zi is the location of point i in world space (millimeters)
So I have 8 equations (4 points of 2 coordinates each), and I have 8 unknowns (fx, fy, Rxyz, tx, ty, tz). Therefore, I conclude (barring pathological cases) that a unique solution must exist.
However, I can't seem to figure out how to compute this solution using OpenCV.
I have looked at the imgproc module:
getPerspectiveTransform works, but gives me a 3x3 matrix only (from 2D points to 2D points). If I could somehow extract the needed parameters from this matrix, that would be great.
I have also looked at the calib3d module, which contains a few promising functions that do almost, but not quite, what I need:
initCameraMatrix2D sounds almost perfect, but when I pass it my four points like this:
cv::Mat cameraMatrix = cv::initCameraMatrix2D(
    std::vector<std::vector<cv::Point3f>>({worldPoints}),
    std::vector<std::vector<cv::Point2f>>({imagePoints}),
    cv::Size2f(2, 2), -1);
it returns me a camera matrix that has fx, fy set to -inf, inf.
calibrateCamera seems to use a complicated solver to deal with overdetermined systems and outliers. I tried it anyway, but all I can get from it are assertion failures like this:
OpenCV(3.4.1) Error: Assertion failed (0 <= i && i < (int)vv.size()) in getMat_, file /build/opencv/src/opencv-3.4.1/modules/core/src/matrix_wrap.cpp, line 79
Is there a way to entice OpenCV to do what I need? And if not, how could I do it by hand?

3x3 rotation matrices have 9 elements but, as you said, only 3 degrees of freedom. One subtlety is that exploiting this property makes the equation non-linear in the angles you want to estimate, and non-linear equations are harder to solve than linear ones.
This kind of equation is usually solved by:
considering that the P=K.[R | t] matrix has 12 degrees of freedom and solving the resulting linear equation using the SVD decomposition (see Section 7.1 of 'Multiple View Geometry' by Hartley & Zisserman for more details)
decomposing this intermediate result into an initial approximate solution to your non-linear equation (see for example cv::decomposeProjectionMatrix)
refining the approximate solution using an iterative solver which is able to deal with non-linear equations and with the reduced degrees of freedom of the rotation matrix (e.g. the Levenberg-Marquardt algorithm). I am not sure whether there is a generic implementation of this in OpenCV, however it is not too complicated to implement one yourself using the Ceres Solver library. (A sketch of how steps 1 and 2 might look with OpenCV follows this list.)
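For illustration, here is a rough sketch (my own, under the assumption that at least 6 world/image correspondences are available so the 12-parameter linear system is well constrained). cv::SVD::solveZ and cv::decomposeProjectionMatrix are existing OpenCV functions; the helper names are made up.

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Step 1: direct linear transform (DLT) for the 3x4 projection matrix P.
cv::Mat estimateProjectionDLT(const std::vector<cv::Point3f>& world,
                              const std::vector<cv::Point2f>& image)
{
    CV_Assert(world.size() == image.size() && world.size() >= 6);
    cv::Mat A(2 * (int)world.size(), 12, CV_64F, cv::Scalar(0));
    for (int i = 0; i < (int)world.size(); ++i) {
        const double X = world[i].x, Y = world[i].y, Z = world[i].z;
        const double x = image[i].x, y = image[i].y;
        double* r1 = A.ptr<double>(2 * i);       // (P1 . Xw) - x * (P3 . Xw) = 0
        double* r2 = A.ptr<double>(2 * i + 1);   // (P2 . Xw) - y * (P3 . Xw) = 0
        r1[0] = X; r1[1] = Y; r1[2] = Z; r1[3] = 1;
        r1[8] = -x * X; r1[9] = -x * Y; r1[10] = -x * Z; r1[11] = -x;
        r2[4] = X; r2[5] = Y; r2[6] = Z; r2[7] = 1;
        r2[8] = -y * X; r2[9] = -y * Y; r2[10] = -y * Z; r2[11] = -y;
    }
    cv::Mat p;
    cv::SVD::solveZ(A, p);            // null vector of A = the 12 entries of P
    return p.reshape(1, 3).clone();   // reshape to a 3x4 projection matrix
}

// Step 2: split P into an intrinsic matrix, a rotation and the camera centre.
void decomposeP(const cv::Mat& P)
{
    cv::Mat K, R, C;                  // C comes back as a 4x1 homogeneous camera centre
    cv::decomposeProjectionMatrix(P, K, R, C);
}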
However, your case is a bit particular because you do not have enough point matches to solve the linear formulation (i.e. step 1) reliably. This means that, as you stated it, you have no way to initialize an iterative refining algorithm to get an accurate solution to your problem.
Here are a few work-arounds that you can try:
somehow get 2 additional point matches, leading to a total of 6 matches hence 12 constraints on your linear equation, allowing you to solve the problem using the steps 1, 2, 3 above.
somehow guess manually an initial estimate for your 8 parameters (2 focal lengths, 3 angles & 3 translations), and directly refine them using an iterative solver. Be aware that the iterative process might converge to a wrong solution if your initial estimate is too far off.
reduce the number of unknowns in your model. For instance, if you manage to fix two of the three angles (e.g. roll & pitch) the equations might simplify a lot. Also, the two focal lengths are probably related via the aspect ratio, so if you know it and if your pixels are square, then you actually have a single unknown there.
if all else fails, there might be a way to extract approximated values from the rectifying homography estimated by cv::getPerspectiveTransform.
Regarding the last bullet point, the opposite of what you want is clearly possible. Indeed, the rectifying homography can be expressed analytically knowing the parameters you want to estimate. See for instance this post and this post. There is also a full chapter on this in the Hartley & Zisserman book (chapter 13).
In your case, you want to go the other way around, i.e. to extract the intrinsic & extrinsic parameters from the homography. There is a somewhat related function in OpenCV (cv::decomposeHomographyMat), but it assumes the K matrix is known and it outputs 4 candidate solutions.
In the general case, this would be tricky. But maybe in your case you can guess a reasonable estimate for the focal length, hence for K, and use the point correspondences to select the good solution to your problem. You might also implement a custom optimization algorithm, testing many focal length values and keeping the solution leading to the lowest reprojection error.
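As a rough illustration of that last idea, here is a hedged sketch that sweeps candidate focal lengths, solves the pose with cv::solvePnP and keeps the candidate with the lowest reprojection error. The focal-length range, the step size, and the assumption of square pixels (fx = fy) with the principal point at (0, 0) in your normalized coordinates are mine.

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <limits>
#include <vector>

void sweepFocalLength(const std::vector<cv::Point3f>& worldPoints,
                      const std::vector<cv::Point2f>& imagePoints)
{
    double bestErr = std::numeric_limits<double>::max();
    double bestF = 0;
    cv::Mat bestRvec, bestTvec;

    for (double f = 0.5; f <= 5.0; f += 0.01) {      // candidate focal lengths (arbitrary range)
        cv::Mat K = (cv::Mat_<double>(3, 3) << f, 0, 0,
                                               0, f, 0,
                                               0, 0, 1);
        cv::Mat rvec, tvec;
        if (!cv::solvePnP(worldPoints, imagePoints, K, cv::noArray(), rvec, tvec))
            continue;

        std::vector<cv::Point2f> reprojected;
        cv::projectPoints(worldPoints, rvec, tvec, K, cv::noArray(), reprojected);
        const double err = cv::norm(imagePoints, reprojected, cv::NORM_L2);

        if (err < bestErr) {
            bestErr = err; bestF = f;
            bestRvec = rvec; bestTvec = tvec;
        }
    }
    // bestF, bestRvec, bestTvec now hold the configuration with the lowest
    // reprojection error; cv::Rodrigues(bestRvec, R) converts to a 3x3 rotation.
}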

Related

Get known position in one image to another using 8-point algorithm

I have two images and know the position of a point in the first image. Now I want to get the corresponding position in the second image.
This is my idea:
I can use algorithms such as SIFT to match keypoints (as seen in the image)
I know the camera matrix using calibration with e.g. chessboards
Using the 8 point algorithm I calculate the fundamental matrix F
Can I now use F to calculate the corresponding point?
Using the fundamental matrix F alone is not enough. If you have a point in one image, you can't find its position in the second image, because it depends not only on the configuration of the cameras, but also on the distance from the camera to that point.
This can also be seen from the equation x2^T * F * x1 = 0. If you know x1 and F, then for x2 you get the equation x2^T * b = 0, where b = F * x1. This is the equation of a point x2 lying on the line b (points x1, x2 and line b are in homogeneous coordinates). Although you can't find the exact position of the point in the second image, you know that it must lie somewhere on that line.
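For illustration only, a small C++/OpenCV sketch of that relation (the helper names are mine; OpenCV also provides cv::computeCorrespondEpilines to compute such lines directly):

#include <opencv2/core.hpp>
#include <cmath>

// Epipolar line b = F * x1 in the second image, in homogeneous form a*x + b*y + c = 0.
cv::Vec3d epipolarLine(const cv::Matx33d& F, const cv::Point2d& x1)
{
    const cv::Vec3d p1(x1.x, x1.y, 1.0);   // x1 in homogeneous coordinates
    return F * p1;
}

// Any valid correspondence x2 must (approximately) satisfy x2^T * F * x1 = 0.
bool liesOnEpipolarLine(const cv::Vec3d& line, const cv::Point2d& x2, double tol = 1e-6)
{
    const cv::Vec3d p2(x2.x, x2.y, 1.0);
    return std::abs(p2.dot(line)) < tol;
}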
Hartley and Zisserman have a great explanation of these concepts in their book Multiple View Geometry in Computer Vision. Be sure to check it out for more details.

Wall detection using PCL and RANSAC

I have been working on wall detection in a PCD (Point Cloud Data) file using PCL (Point Cloud Library). The PCD file has been generated with a depth camera. I found that in many similar applications, e.g. floor detection, RANSAC has been used. So I thought of applying RANSAC here as well; I tried my best to understand RANSAC in general, but I still have certain questions pertaining to my application:
In brief, RANSAC tries to remove the outliers in the given data and generalize the inliers through a model, iteratively. So, in the case of floor detection, would it consider the rest of the point cloud, corresponding to other objects, as just outliers and the floor as inliers? Is the same true for the walls?
As per Plane model segmentation tutorial by PCL, RANSAC is giving the coefficients of the model plane i.e. a, b, c, and d in the equation of plane: a*x + b*y + c*z + d = 0 through coefficients->values vector. So, I assume that in the case of wall detection, it would try to give the equation of the plane corresponding to the wall. However, what if the depth camera is at the corner of a room and the top view of walls look like this:
             wall 1
       ______________
       |
       |
       | wall 2
       |
       |
So, in this case, what would the resulting model plane look like? Would it be kind of a hypotenuse (making a triangle)?
             wall 1
       ---------------------
                                |
                                | wall 2
                                |
                                 ----------------------
                                
         wall 3
Even in this case, what would the model plane look like?
As per the Extracting indices from a PointCloud tutorial by PCL, the ExtractIndices <pcl::ExtractIndices> filter is used to extract a subset of points from a point cloud based on the indices output by a segmentation algorithm. But what exactly is this filter doing? In fact, in the case of floor detection or wall detection (assuming there is only one straight wall), RANSAC already gives the equation of one plane. So is there any need to use that filter? If yes, why and how?
How can I detect multiple walls in the following case? Can the ExtractIndices <pcl::ExtractIndices> filter do this? If yes, how?
             wall 1
       ---------------------
                                |
                                | wall 2
                                |
                                 ----------------------
                                
         wall 3
If you think that there are better ways than using RANSAC then also please let me know.
Answers to a few of the questions you asked:
As far as I know, when using a plane model, RANSAC chooses 3 points from the cloud at random and considers them as a plane (a provisional statement that is refined below). All the points whose perpendicular distance to this plane is smaller than the given threshold are counted as inliers.
The algorithm gives back the plane that contains the most points (the result also depends on the iteration count you choose; if it is too low, it may miss the largest plane).
In the case of walls the story is the same. You can search for planes, but you should choose the search direction well. Walls are usually perpendicular to the x-y plane, and the parameters should be set with this in mind.
Example:
#include <pcl/point_types.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/ModelCoefficients.h>
#include <cstdlib>

class PlaneSegment {
public:
    pcl::SACSegmentation<pcl::PointXYZI> seg;
    pcl::ExtractIndices<pcl::PointXYZI> extract;
    pcl::ModelCoefficients::Ptr coefficients{new pcl::ModelCoefficients};
    Eigen::Vector3f axis;
    // HELPER VARIABLES
    float angle = 12.0f;

    void set_segmentation(float threshold, int max_iteration, float probability) {
        seg.setModelType(pcl::SACMODEL_PERPENDICULAR_PLANE);
        seg.setMethodType(pcl::SAC_RANSAC);
        // set threshold and other parameters
        seg.setDistanceThreshold(threshold); // needs to be adjusted to the object scale
        seg.setMaxIterations(max_iteration);
        seg.setProbability(probability);
    }

    PlaneSegment(float x, float y, float z, float set_angle, float threshold,
                 int max_iteration, float probability) {
        axis = Eigen::Vector3f(x, y, z);
        angle = set_angle;
        seg.setAxis(axis);
        seg.setEpsAngle(angle * (3.1415f / 180.0f));
        // SET SEGMENTATION
        set_segmentation(threshold, max_iteration, probability);
    }

    pcl::PointIndices::Ptr segment_plane(pcl::PointCloud<pcl::PointXYZI>::Ptr cloud,
                                         pcl::PointIndices::Ptr inliers) {
        seg.setInputCloud(cloud);
        seg.segment(*inliers, *coefficients);
        if (inliers->indices.size() == 0) {
            PCL_ERROR("COULD NOT ESTIMATE PLANAR MODEL.\n");
            exit(-1);
        }
        return inliers;
    }

    pcl::PointCloud<pcl::PointXYZI>::Ptr
    extraction(pcl::PointCloud<pcl::PointXYZI>::Ptr cloud, pcl::PointIndices::Ptr inliers) {
        extract.setInputCloud(cloud);
        extract.setIndices(inliers);
        extract.setNegative(true);   // keep everything except the plane inliers
        extract.filter(*cloud);
        return cloud;
    }
};
You can set an acceptance angle (in this case 12 degrees) and also the search direction via the axis.
For your second point:
In the case of multiple walls, it will give back the one that contains the most points. But you should be able to extract the other planes as well if needed (advice: save all planes that contain more points than a threshold you choose).
I looked into your problem, and this is also a solution for it: pcl::RANSAC segmentation, get all planes in cloud?. The first comment gives a very good answer.
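For illustration, a hedged sketch of such a loop (my own helper, not code from that link): segment the largest remaining plane, copy its points into their own cloud, remove them from a working copy and repeat. The point type and the minimum-size threshold are assumptions.

#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/ModelCoefficients.h>
#include <vector>

std::vector<pcl::PointCloud<pcl::PointXYZI>::Ptr>
extractAllPlanes(pcl::PointCloud<pcl::PointXYZI>::Ptr cloud,
                 pcl::SACSegmentation<pcl::PointXYZI>& seg,   // already configured
                 std::size_t minPlanePoints = 500)
{
    std::vector<pcl::PointCloud<pcl::PointXYZI>::Ptr> planes;
    pcl::ExtractIndices<pcl::PointXYZI> extract;

    while (cloud->size() > minPlanePoints) {
        pcl::PointIndices::Ptr inliers(new pcl::PointIndices);
        pcl::ModelCoefficients::Ptr coefficients(new pcl::ModelCoefficients);
        seg.setInputCloud(cloud);
        seg.segment(*inliers, *coefficients);
        if (inliers->indices.size() < minPlanePoints)
            break;                                   // no sufficiently large plane left

        extract.setInputCloud(cloud);
        extract.setIndices(inliers);

        // Keep the plane points as their own cloud...
        pcl::PointCloud<pcl::PointXYZI>::Ptr plane(new pcl::PointCloud<pcl::PointXYZI>);
        extract.setNegative(false);
        extract.filter(*plane);
        planes.push_back(plane);

        // ...and continue with everything that is not on this plane.
        pcl::PointCloud<pcl::PointXYZI>::Ptr rest(new pcl::PointCloud<pcl::PointXYZI>);
        extract.setNegative(true);
        extract.filter(*rest);
        cloud = rest;
    }
    return planes;
}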
Third point:
Check the example code. Note that it is a class, which is why there is a constructor. The segment_plane function returns the inliers. Based on that you can call the extraction function, which removes the inliers from the cloud. This is a very simple and fast solution, and you avoid wrestling with the coefficient values. Also, if you don't want to remove the points, you can just colour them by iterating through the inliers and setting their intensity to a chosen value.
The RANSAC algorithm can be robust, but sometimes it simply does not work. It can also be slow because of the number of iterations.
There are other ways to solve this problem.
Just an example: consider a grid below the cloud, made of many equal-sized square cells. In each cell you check the minimum and maximum point heights. Based on this you can identify the ground (in a ground cell the difference between the maximum and minimum height is very low) or the walls (in a wall cell you can assume the points are spread out vertically, so the maximum-minimum difference is high).
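A very rough sketch of that grid idea (entirely my own, with made-up cell size and thresholds), assuming z is the height axis:

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <algorithm>
#include <cmath>
#include <map>
#include <utility>

// Bin the points into (x, y) cells and track the min/max height per cell.
void classifyCells(const pcl::PointCloud<pcl::PointXYZI>& cloud, float cellSize = 0.25f)
{
    std::map<std::pair<int, int>, std::pair<float, float>> cells;  // cell -> (minZ, maxZ)
    for (const auto& p : cloud.points) {
        std::pair<int, int> key((int)std::floor(p.x / cellSize),
                                (int)std::floor(p.y / cellSize));
        auto it = cells.find(key);
        if (it == cells.end())
            cells[key] = { p.z, p.z };
        else {
            it->second.first  = std::min(it->second.first,  p.z);
            it->second.second = std::max(it->second.second, p.z);
        }
    }
    for (const auto& cell : cells) {
        const float heightSpan = cell.second.second - cell.second.first;
        if (heightSpan < 0.05f) {
            // Small span: candidate ground cell (or another horizontal surface).
        } else if (heightSpan > 1.0f) {
            // Large span: points spread out vertically, candidate wall cell.
        }
    }
}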
Best regards.

Apply an OpenCV homography in OpenGL

I modified an algorithm for rectification. It returns two OpenCV homographies (3x3 matrices). I can use cv::warpPerspective and get rectified images, so the algorithm works correctly. But I need to apply these homographies to textures in OpenGL. So I create a 4x4 matrix (HomoGl) and use
glMultMatrixf(HomoGl);
to apply the transform. To fill HomoGl I use
for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) {
        HomoGl[i + j*4] = HomoCV.at<double>(i, j);
    }
}
This method gives the best result so far... but it is still wrong. I tested some other methods [1], but they don't work either.
My question: how can I convert the OpenCV homography so that I can use glMultMatrixf to get correctly transformed images?
[1]http://www.aiqus.com/questions/24699/from-2d-homography-of-2-planes-to-3d-rotation-of-opengl-camera
So an H matrix is the transformation of a point on plane 1 to the corresponding point on plane 2:
X1 = H*X2
When you use cv::warpPerspective in OpenCV you are putting the points into the perspective of plane 2.
The matrix (or image Mat) that you get out of that warping is the texture you should use when applying it to the surface.
Your extension of the 3x3 homography to 4x4 is wrong. The most naive approach which will somewhat work would be an extension of the form
        h11 h12 h13            h11 h12  0  h13
    H = h21 h22 h23   ->  H' = h21 h22  0  h23
        h31 h32 h33             0   0   1   0
                               h31 h32  0  h33
The problem with this approach is that while it gives the correct result for x and y, it will distort z, since the modified w component affects all coordinates. If the z coordinate matters, you need a different approach.
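As a concrete illustration of that naive extension (a sketch only, assuming HomoCV is a CV_64F 3x3 cv::Mat with h33 != 0 and that the fixed-function pipeline is used; glMultMatrixf expects a column-major 4x4 array):

#include <opencv2/core.hpp>
#include <GL/gl.h>

void applyHomographyGL(const cv::Mat& HomoCV)
{
    CV_Assert(HomoCV.rows == 3 && HomoCV.cols == 3 && HomoCV.type() == CV_64F);
    const cv::Mat H = HomoCV / HomoCV.at<double>(2, 2);   // normalize so h33 = 1

    // Column-major layout: HomoGl[col * 4 + row], matching H' above.
    const GLfloat HomoGl[16] = {
        (GLfloat)H.at<double>(0,0), (GLfloat)H.at<double>(1,0), 0.f, (GLfloat)H.at<double>(2,0), // column 0
        (GLfloat)H.at<double>(0,1), (GLfloat)H.at<double>(1,1), 0.f, (GLfloat)H.at<double>(2,1), // column 1
        0.f,                        0.f,                        1.f, 0.f,                        // column 2
        (GLfloat)H.at<double>(0,2), (GLfloat)H.at<double>(1,2), 0.f, (GLfloat)H.at<double>(2,2)  // column 3
    };
    glMultMatrixf(HomoGl);
}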
In this paper, an approximation is proposed which will minimize the effects on the depth (see equation 5; you will also need to normalize your homography so that h33 = 1). However, this approximation will only work well enough for small distortions. If you have some extreme trapezoid distortion, that approach will also fail. In that case, a 2-pass approach of rendering into a texture and then applying the 2D distortion is possible.
With the modern programmable pipeline, one could also deal with this in one pass by undistorting the z coordinate in the fragment shader (but that can have some negative impact on performance on its own).

Robust atan(y,x) on GLSL for converting XY coordinate to angle

In GLSL (specifically 3.00, which I'm using), there are two versions of atan(): atan(y_over_x) can only return angles between -PI/2 and PI/2, while atan(y, x) can take all 4 quadrants into account, so the angle range covers everything from -PI to PI, much like atan2() in C++.
I would like to use the second atan to convert XY coordinates to angle.
However, atan() in GLSL, besides not being able to handle x = 0, is not very stable. Especially where x is close to zero, the division can overflow, resulting in an opposite angle (you get something close to -PI/2 where you are supposed to get approximately PI/2).
What is a good, simple implementation that we can build on top of GLSL atan(y,x) to make it more robust?
I'm going to answer my own question to share my knowledge. We first notice that the instability happens when x is near zero. However, we can also translate that as abs(x) << abs(y). So first we divide the plane (assuming we are on a unit circle) into two regions: one where |x| <= |y| and another where |x| > |y|, as shown below:
We know that atan(x,y) is much more stable in the green region -- when x is close to zero we simply have something close to atan(0.0) which is very stable numerically, while the usual atan(y,x) is more stable in the orange region. You can also convince yourself that this relationship:
atan(x,y) = PI/2 - atan(y,x)
holds for all (x, y) except the origin, where it is undefined. Here we are talking about the atan(y, x) that can return an angle in the entire range of -PI to PI, not atan(y_over_x), which only returns angles between -PI/2 and PI/2. Therefore, our robust atan2() routine for GLSL is quite simple:
const float PI = 3.14159265358979;   // PI is not built into GLSL, so define it

float atan2(in float y, in float x)
{
    bool s = (abs(x) > abs(y));
    return mix(PI/2.0 - atan(x,y), atan(y,x), s);
}
As a side note, the identity for mathematical function atan(x) is actually:
atan(x) + atan(1/x) = sgn(x) * PI/2
which is true because its range is (-PI/2, PI/2).
Depending on your targeted platform, this might be a solved problem. The OpenGL spec for atan(y, x) specifies that it should work in all quadrants, leaving behavior undefined only when x and y are both 0.
So one would expect any decent implementation to be stable near all axes, as this is the whole purpose behind 2-argument atan (or atan2).
The questioner/answerer is correct in that some implementations do take shortcuts. However, the accepted solution makes the assumption that a bad implementation will always be unstable when x is near zero: on some hardware (my Galaxy S4 for example) the value is stable when x is near zero, but unstable when y is near zero.
To test your GLSL renderer's implementation of atan(y,x), here's a WebGL test pattern. Follow the link below and as long as your OpenGL implementation is decent, you should see something like this:
Test pattern using native atan(y,x): http://glslsandbox.com/e#26563.2
If all is well, you should see 8 distinct colors (ignoring the center).
The linked demo samples atan(y,x) for several values of x and y, including 0, very large, and very small values. The central box is atan(0.,0.)--undefined mathematically, and implementations vary. I've seen 0 (red), PI/2 (green), and NaN (black) on hardware I've tested.
Here's a test page for the accepted solution. Note: the host's WebGL version lacks mix(float,float,bool), so I added an implementation that matches the spec.
Test pattern using atan2(y,x) from accepted answer: http://glslsandbox.com/e#26666.0
Your proposed solution still fails in the case x=y=0. Here both of the atan() functions return NaN.
Further I would not rely on mix to switch between the two cases. I am not sure how this is implemented/compiled, but IEEE float rules for x*NaN and x+NaN result again in NaN. So if your compiler really used mix/interpolation the result should be NaN for x=0 or y=0.
Here is another fix which solved the problem for me:
float atan2(in float y, in float x)
{
    return x == 0.0 ? sign(y)*PI/2 : atan(y, x);
}
When x=0 the angle can be ±π/2. Which of the two depends on y only. If y=0 too, the angle can be arbitrary (vector has length 0). sign(y) returns 0 in that case which is just ok.
Sometimes the best way to improve the performance of a piece of code is to avoid calling it in the first place. For example, one of the reasons you might want to determine the angle of a vector is so that you can use this angle to construct a rotation matrix using combinations of the angle's sine and cosine. However, the sine and cosine of a vector (relative to the origin) are already hidden in plain sight inside the vector itself. All you need to do is to create a normalized version of the vector by dividing each vector coordinate by the total length of the vector. Here's the two-dimensional example to calculate the sine and cosine of the angle of vector [ x y ]:
double length = sqrt(x*x + y*y);
double cos = x / length;
double sin = y / length;
Once you have the sine and cosine values, you can now directly populate a rotation matrix with these values to perform a clockwise or counterclockwise rotation of arbitrary vectors by the same angle, or you can concatenate a second rotation matrix to rotate to an angle other than zero. In this case, you can think of the rotation matrix as "normalizing" the angle to zero for an arbitrary vector. This approach is extensible to the three-dimensional (or N-dimensional) case as well, although for example you will have three angles and six sin/cos pairs to calculate (one angle per plane) for 3D rotation.
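For example, a minimal C++ sketch of that idea (names are mine), building the 2D rotation that "normalizes" the angle of (x, y) to zero without ever calling atan:

#include <cmath>

struct Vec2 { double x, y; };

// Rotate v by the rotation that maps the direction of (x, y) onto the +x axis,
// i.e. R = [ cos  sin; -sin  cos ] with cos/sin taken straight from the vector.
// Assumes (x, y) is not the zero vector.
Vec2 rotateToZero(double x, double y, const Vec2& v)
{
    const double length = std::sqrt(x * x + y * y);
    const double c = x / length;
    const double s = y / length;
    return { c * v.x + s * v.y, -s * v.x + c * v.y };
}

Applying it to the vector itself, rotateToZero(x, y, {x, y}) gives (length, 0), which is exactly the "angle normalized to zero" case described above.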
In situations where you can use this approach, you get a big win by bypassing the atan calculation completely, which is possible since the only reason you wanted to determine the angle was to calculate the sine and cosine values. By skipping the conversion to angle space and back, you not only avoid worrying about division by zero, but you also improve precision for angles which are near the poles and would otherwise suffer from being multiplied/divided by large numbers. I've successfully used this approach in a GLSL program which rotates a scene to zero degrees to simplify a computation.
It can be easy to get so caught up in an immediate problem that you can lose sight of why you need this information in the first place. Not that this works in every case, but sometimes it helps to think out of the box...
A formula that gives an angle in all four quadrants for any values of the coordinates x and y (for x = y = 0 the result is undefined):
f(x,y) = pi() - pi()/2*(1+sign(x))*(1-sign(y^2)) - pi()/4*(2+sign(x))*sign(y) - sign(x*y)*atan((abs(x)-abs(y))/(abs(x)+abs(y)))

Physically-based fracture simulation with opengl/c++

I am trying to implement the ideas in this paper for modeling fracture:
http://graphics.berkeley.edu/papers/Obrien-GMA-1999-08/index.html
I am stuck at a point (essentially page 4...) and would really appreciate any help. The part I am stuck on involves the deformation of tetrahedron (using FEM).
I have a single tetrahedron defined by four nodes (each node has a x, y, z position) in which I calculate the following matrices from:
u: each column is a vector containing the material coordinates (x, y, z, 1) of one node (so 4 columns in total), a 4x4 matrix
B: inverse(u), which he calls the basis matrix, a 4x4 matrix
P: each column is a vector containing the real-world coordinates (x, y, z) of one node; I set P initially equal to u since the object is not deformed in the rest state, a 3x4 matrix
V: some initial velocities for (x, y, z) in each node, a 3x4 matrix
delta: basically an identity matrix, {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}, {0, 0, 0}}
I get x(u) = P*B*u and v(u) = V*B*u, but not sure where to use these...
Also, I get dx = P*B*delta and dv = V*B*delta
I then get strain by Green's strain tensor, epsilon = 1/2(dx+transpose(dx)) - Identity_3x3
And then stress, sigma = lambda*trace(epsilon)*Identity_3x3 + 2*mu*epsilon
I get the elastic force by equation (24) on page 4 of the paper. It's just a big summation.
I then using explicit integration to update real world coordinates P. The idea is that the velocity update involves the force on the node of the tetrahedron and therefore affects the real-world coordinate position, making the object deform.
The problem, however, is that the force is incredibly small...something x 10^-19, etc. So, c++ usually rounds to 0. I've stepped through the calculations and can't figure out why.
I know I'm missing something here, just can't figure out what. What update am I not doing correctly?
A common reason why the force is small is that your Young's modulus (lambda) is too small. If you are using a scale of meters, a macro-scale object might have a Young's modulus of around 10^5 and a Poisson's ratio of 0.3 to 0.4.
It sounds like what might be happening is that your tet is still in the rest configuration. In the absence of deformation, the strain will be zero and so, in turn, the stress and force will also be about zero. You can perturb the vertices in various ways and make sure your strain (epsilon) is being computed correctly. One simple test is to scale by 2 about the centroid, which should give you a positive strain. If you scale by 0.5 about the centroid you will get a negative strain. If you translate the vertices uniformly you will get no change in strain (a common FEM invariant). If you rotate them you probably will get a change, but a co-rotational constitutive model wouldn't.
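A tiny sketch of that sanity check (the types and layout are my own, not from the paper): scale the four tetrahedron vertices about their centroid, then re-run your strain computation on the perturbed positions.

#include <array>

struct Vec3 { double x, y, z; };

// Scale the 4 tet vertices about their centroid by the given factor.
void scaleAboutCentroid(std::array<Vec3, 4>& verts, double factor)
{
    Vec3 c{0.0, 0.0, 0.0};
    for (const Vec3& v : verts) { c.x += v.x; c.y += v.y; c.z += v.z; }
    c.x /= 4.0; c.y /= 4.0; c.z /= 4.0;

    for (Vec3& v : verts) {
        v.x = c.x + factor * (v.x - c.x);
        v.y = c.y + factor * (v.y - c.y);
        v.z = c.z + factor * (v.z - c.z);
    }
}
// factor = 2.0 should yield a positive strain, factor = 0.5 a negative one,
// and factor = 1.0 (or a pure translation) should leave the strain at zero.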
Note you might think that gravity would cause deformation, but unless one of the vertices is constrained, the uniform force on all vertices will cause a uniform translation which will not change the strain from being zero.
You definitely should not need to use arbitrary precision arithmetic for the examples in the paper. In fact, floats typically are sufficient for these types of simulation.
I might be mistaken, but C++ doubles only give you about 15 significant decimal digits (at least that's what my std::numeric_limits says), so you're way out of precision.
You might end up needing a library for arbitrary-precision arithmetic, e.g. http://gmplib.org/