Finding extrinsics between cameras - c++

I need to find the relative camera poses between two or more cameras based on image correspondences (so the cameras are not at the same point). To solve this I tried the same approach as described here (code below).
cv::Mat calibration_1 = ...;
cv::Mat calibration_2 = ...;
cv::Mat calibration_target = calibration_1;
calibration_target.at<float>(0, 2) = 0.5f * frame_width; // principal point
calibration_target.at<float>(1, 2) = 0.5f * frame_height; // principal point
auto fundamental_matrix = cv::findFundamentalMat(left_matches, right_matches, CV_RANSAC);
fundamental_matrix.convertTo(fundamental_matrix, CV_32F);
cv::Mat essential_matrix = calibration_2.t() * fundamental_matrix * calibration_1;
cv::SVD svd(essential_matrix);
cv::Matx33f w(0,-1,0,
1,0,0,
0,0,1);
cv::Matx33f w_inv(0,1,0,
-1,0,0,
0,0,1);
cv::Mat rotation_between_cameras = svd.u * cv::Mat(w) * svd.vt; //HZ 9.19
But in most of my cases I get extremely weird results. So my next thought was to use a full-fledged bundle adjuster (which should do what I am looking for?!). Currently my only big dependency is OpenCV, and it only has an undocumented bundle adjustment implementation.
So the questions are:
Is there a bundle adjuster which has no dependencies and uses a licence which allows commercial use?
Is there another easy way to find the extrinsics?
Are objects with very different distances to the cameras a problem? (heavy parallax)
Thanks in advance

I'm also working on the same problem and facing similar issues.
Here are some suggestions:
Modify the essential matrix before decomposition:
Compute [U, W, Vt] = SVD(E) and build a new E' = U * diag(s, s, 0) * Vt, where s = (W(0,0) + W(1,1)) / 2.
Two-stage fundamental matrix estimation:
Recalculate the fundamental matrix using only the RANSAC inliers.
These steps should make the rotation estimation less susceptible to noise.
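Written out, the first suggestion replaces the estimated matrix with one that has two equal singular values and one zero singular value, which is the structure a valid essential matrix must have. A sketch in that notation, where \sigma_1, \sigma_2, \sigma_3 stand for W(0,0), W(1,1), W(2,2) (these symbols are shorthand introduced here, not taken from the post):

```latex
E = U \,\operatorname{diag}(\sigma_1, \sigma_2, \sigma_3)\, V^{\top},
\qquad
s = \frac{\sigma_1 + \sigma_2}{2},
\qquad
E' = U \,\operatorname{diag}(s, s, 0)\, V^{\top}
```

The rotation candidates are then taken from the SVD of E' rather than from the noisy E.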

You have to compute four different solutions and select the one with the largest number of points having positive Z coordinates (i.e. lying in front of the cameras). The solutions are generated by inverting the sign of the fundamental matrix and by substituting w with w_inv, which you did not do even though you defined w_inv. Are you reusing somebody else's code?
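The selection step can be sketched without OpenCV. This is only an illustration under stated assumptions: the points are already in normalized homogeneous coordinates (x, y, 1), the pose convention is X2 = R * X1 + t, and the names Pose, countInFront and selectPose are hypothetical helpers, not OpenCV API. For each candidate it solves for the two depths along the viewing rays and counts the correspondences where both depths are positive. Newer OpenCV versions also provide cv::recoverPose, which performs this kind of check internally.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;   // row-major 3x3 matrix

struct Pose { Mat3 R; Vec3 t; };    // assumed convention: X2 = R * X1 + t

static Vec3 mul(const Mat3& m, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        r[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
    return r;
}

static double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Counts the correspondences whose depth is positive in BOTH cameras.
// x1, x2 are normalized homogeneous image points (x, y, 1).
int countInFront(const Pose& pose,
                 const std::vector<Vec3>& x1, const std::vector<Vec3>& x2) {
    assert(x1.size() == x2.size());
    int count = 0;
    for (std::size_t i = 0; i < x1.size(); ++i) {
        // Depths z1, z2 satisfy z1 * (R x1) - z2 * x2 = -t.
        // Solve this 3x2 system in the least-squares sense (normal equations).
        const Vec3 a = mul(pose.R, x1[i]);
        const Vec3& b = x2[i];
        const double aa = dot(a, a), ab = dot(a, b), bb = dot(b, b);
        const double det = aa * bb - ab * ab;   // near zero for parallel rays
        if (det < 1e-12) continue;              // no usable parallax, skip
        const double at = dot(a, pose.t), bt = dot(b, pose.t);
        const double z1 = (ab * bt - bb * at) / det;
        const double z2 = (aa * bt - ab * at) / det;
        if (z1 > 0.0 && z2 > 0.0) ++count;
    }
    return count;
}

// Returns the index of the candidate with the most points in front of both cameras.
std::size_t selectPose(const std::vector<Pose>& candidates,
                       const std::vector<Vec3>& x1, const std::vector<Vec3>& x2) {
    assert(!candidates.empty());
    std::size_t best = 0;
    int bestCount = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const int c = countInFront(candidates[i], x1, x2);
        if (c > bestCount) { bestCount = c; best = i; }
    }
    return best;
}
```

In practice the candidate list would hold the four combinations of the two rotations (from w and w_inv) and the two translation signs.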

Related

Determine minimum parallax for correct triangulation of 3D points in OpenCV

I am triangulating 3D points with OpenCV's triangulation function for a monocular sequence. It sometimes works fine, but I have noticed that when two camera poses are close to each other, the triangulated points end up far away. I understand why: when the camera poses are close, the rays from the two cameras intersect far from the cameras, which places the 3D points far away. I have also noticed that the distance required between two cameras for correct triangulation varies from case to case. Currently I compute the parallax between the two poses and triangulate only if it is above a certain threshold (I have chosen 27), but this does not look correct in all cases.
My code for calculating the parallax is as follows:
float checkAvgParallex(SE3& prevPose, SE3& currPose, std::vector<Point2f>& prevPoints, std::vector<Point2f>& currPoints, Mat& K) {
Eigen::Matrix3d relRot = Eigen::Matrix3d::Identity();
Eigen::Matrix3d prevRot = prevPose.rotationMatrix();
Eigen::Matrix3d currRot = currPose.rotationMatrix();
relRot = prevRot * currRot;
float avg_parallax = 0.;
int nbparallax = 0;
std::set<float> set_parallax;
bearingVectors_t prevBVs;
bearingVectors_t currBVs;
points2bearings(prevPoints, K, prevBVs);
points2bearings(currPoints, K, currBVs);
for (int i = 0; i < prevPoints.size(); i++) {
Point2f unpx = projectCamToImage(relRot * currBVs[i], K);
float parallax = cv::norm(unpx - prevPoints[i]);
avg_parallax += parallax;
nbparallax++;
set_parallax.insert(parallax);
}
if (nbparallax == 0)
return 0.0;
avg_parallax /= nbparallax;
auto it = set_parallax.begin();
std::advance(it, set_parallax.size() / 2);
avg_parallax = *it;
return avg_parallax;
}
Sometimes the parallax between the cameras does not exceed 27, so no triangulation is performed, and the subsequent pose calculation in my SLAM system stops for lack of 3D points.
Can anyone suggest an alternative strategy with which I can estimate correct 3D points, so that my SLAM system does not suffer from a lack of 3D points?

unrolled label of a Cap using chessboard pattern (OpenCV c++)

I'm trying to use a chessboard pattern to get the information of the cylinder map and rectify the "distortion" so that the image shows the cap surface unrolled. I made a first test with a one-shot calibration and cv::fisheye::undistortImage to get an undistorted image (two images attached).
// runCalibrationFishEye
void runCalibrationFishEye(cv::Mat& image, cv::Matx33d&, cv::Vec4d&);
cv::Mat removeFisheyeLensDist(cv::Mat&, cv::Matx33d&, cv::Vec4d&);
Note that I am not interested in calibrating the camera to get metric values. I just want to use the chessboard information to unroll the image on the cylinder surface.
The final aim is to use the rectified images of 4 cameras and to stitch the rectified images to one unrolled image.
Do I need to perform a full calibration of the camera? Or is there another way to get a remap of the cylinder surface?
I will try to implement this interesting unwarp method: https://dsp.stackexchange.com/questions/2406/how-to-flatten-the-image-of-a-label-on-a-food-jar/2409#2409
[Image: cap with chessboard]
[Image: rectification]
I have found a similar approach to another problem with similar mathematics, and it was solved without a calibration pattern. Link here. It's an approximation, but the result is good enough.
The user Hammer gave an answer that helped me reach a solution. I changed the way he does the mapping and use OpenCV's remap instead. The formula to recalculate the coordinates is just as he gave it, with different values and a preprocessing step to adjust the image (rotation, zoom, and other adjustments). [Image: unrolled image] I am now reducing the distortion so that it is not so pronounced at the edges. But the main question is solved.
cv::Point2f convert_pt(cv::Point2f point, int w, int h)
{
cv::Point2f pc(point.x - w / 2, point.y - h / 2);
float f = w;
float r = w;
float omega = w / 2;
float z0 = f - sqrt(r*r - omega*omega);
// Formula to remap the cylinder
float zc = (2 * z0 + sqrt(4 * z0*z0 - 4 * (pc.x*pc.x / (f*f) + 1)*(z0*z0 - r*r))) / (2 * (pc.x*pc.x / (f*f) + 1));
cv::Point2f final_point(pc.x*zc / f, pc.y*zc / f);
final_point.x += w / 2;
final_point.y += h / 2;
return final_point;
}
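One way to combine convert_pt with a remap step is to precompute two lookup tables, one for x and one for y, holding the source location for every output pixel. The sketch below is an assumption about how that could look, not the poster's actual code: it avoids OpenCV so that it stays self-contained, Pt stands in for cv::Point2f, and RemapTables and buildMaps are hypothetical names. It assumes convert_pt maps an output pixel to the source position to sample, which is the direction cv::remap expects.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { float x, y; };   // stand-in for cv::Point2f

// Same formula as convert_pt above: output pixel -> source location.
Pt convertPt(Pt point, int w, int h) {
    Pt pc{point.x - w / 2, point.y - h / 2};   // move origin to image centre
    float f = w;                               // assumed focal length
    float r = w;                               // assumed cylinder radius
    float omega = w / 2;
    float z0 = f - std::sqrt(r * r - omega * omega);
    float a = pc.x * pc.x / (f * f) + 1;
    float zc = (2 * z0 + std::sqrt(4 * z0 * z0 - 4 * a * (z0 * z0 - r * r))) / (2 * a);
    return Pt{pc.x * zc / f + w / 2, pc.y * zc / f + h / 2};
}

struct RemapTables {
    int width, height;
    std::vector<float> mapX, mapY;   // row-major, one entry per output pixel
};

// Fills the lookup tables once; they can be reused for every frame of that size.
RemapTables buildMaps(int w, int h) {
    assert(w > 0 && h > 0);
    const std::size_t n = static_cast<std::size_t>(w) * static_cast<std::size_t>(h);
    RemapTables t{w, h, std::vector<float>(n), std::vector<float>(n)};
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const Pt src = convertPt(Pt{static_cast<float>(x), static_cast<float>(y)}, w, h);
            const std::size_t i = static_cast<std::size_t>(y) * w + x;
            t.mapX[i] = src.x;
            t.mapY[i] = src.y;
        }
    }
    return t;
}
```

With OpenCV available, the two vectors could be wrapped as cv::Mat(h, w, CV_32FC1, t.mapX.data()) and cv::Mat(h, w, CV_32FC1, t.mapY.data()) and passed to cv::remap together with an interpolation flag such as cv::INTER_LINEAR.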

Creating a steepest descent algorithm to determine a 3D affine warp

I want to implement 2.5D inverse compositional image alignment. For that I need to create a steepest descent image. I followed the implementation from Code Project for 2D image alignment, but I am looking for a 3D warp and therefore also for a 3D steepest descent image.
In my project, I have a 3D model interpretation, and I create an RGB-D image from it with raycasting. Now I want to find a 3D warp which aligns this template image with a given live image in order to estimate the camera position.
Currently I only have the gradients in the X and Y directions:
cv::Sobel(grayImg_T, Grad_TX, CV_32F, 1, 0, 3);
cv::Sobel(grayImg_T, Grad_TY, CV_32F, 0, 1, 3);
And I am estimating the steepest descent as follows:
float* p_sd_pixel = &p_sd[cols*j * 3 + i * 3];
p_sd_pixel[0] = (float) (-cols*Tx + rows*Ty);
p_sd_pixel[1] = (float) Tx;
p_sd_pixel[2] = (float) Ty;
for(int l = 0; l < 3; l++){
for(int m = 0; m < 3; m++){
float* p_h = (float*)(H.data);
p_h[3*l+m] += p_sd_pixel[l]*p_sd_pixel[m];
}
}
Both snippets are from the 2D inverse compositional image alignment code on the website I linked above. I think I also need a gradient in the Z direction, but I have no idea how to create the steepest descent image for 2.5D alignment or how to determine the affine warp. How can I tackle the math, or is there a better way to implement this?

How to correctly use cv::triangulatePoints()

I am trying to triangulate some points with OpenCV and I found this cv::triangulatePoints() function. The problem is that there is almost no documentation or examples of it.
I have some doubts about it.
What method does it use?
I've done some research on triangulation and there are several methods (Linear, Linear LS, eigen, iterative LS, iterative eigen, ...), but I can't find which one OpenCV uses.
How should I use it? It seems that as an input it needs a projection matrix and 3xN homogeneous 2D points. I have them defined as std::vector<cv::Point3d> pnts, but as an output it needs 4xN arrays and obviously I can't create a std::vector<cv::Point4d> because it doesn't exist, so how should I define the output vector?
For the second question I tried: cv::Mat pnts3D(4,N,CV_64F); and cv::Mat pnts3d;, neither seems to work (it throws an exception).
1.- The method used is least squares. There are more complex algorithms than this one. Still, it is the most common one, as the other methods may fail in some cases (e.g. some fail if the points lie on a plane or at infinity).
The method can be found in Multiple View Geometry in Computer Vision by Richard Hartley and Andrew Zisserman (p312)
2.-The usage:
cv::Mat pnts3D(1,N,CV_64FC4);
cv::Mat cam0pnts(1,N,CV_64FC2);
cv::Mat cam1pnts(1,N,CV_64FC2);
Fill the two-channel point matrices with the image points.
cam0 and cam1 are Mat3x4 camera matrices (intrinsic and extrinsic parameters). You can construct them by multiplying A*RT, where A is the intrinsic parameter matrix and RT the rotation translation 3x4 pose matrix.
cv::triangulatePoints(cam0,cam1,cam0pnts,cam1pnts,pnts3D);
NOTE: pnts3D NEEDS to be a 4-channel 1xN cv::Mat when defined (an exception is thrown otherwise), but the result is a cv::Mat(4,N,CV_64FC1) matrix. Really confusing, but it is the only way I did not get an exception.
UPDATE: As of version 3.0 or possibly earlier, this is no longer true, and pnts3D can also be of type Mat(4,N,CV_64FC1) or may be left completely empty (as usual, it is created inside the function).
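A small addition to @Ander Biguri's answer. If your image points come from a raw (not yet undistorted) image, you should invoke undistortPoints() on cam0pnts and cam1pnts first. Called without a new projection matrix, it returns the points in normalized coordinates (independent of the camera), and in that case cam0 and cam1 should be only the [R|t] matrices; you do not need to multiply them by A. The points and the projection matrices just have to follow the same convention: pixel coordinates go with A*[R|t], normalized coordinates go with [R|t].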
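The relation between the two conventions (pixel coordinates with A*[R|t] versus normalized coordinates with [R|t]) can be checked with a few lines of plain C++. This is a minimal sketch assuming an ideal pinhole camera with no skew and no lens distortion; Intrinsics, toNormalized, withIntrinsics and project are hypothetical helper names, not OpenCV functions.

```cpp
#include <array>
#include <cassert>

struct Intrinsics { double fx, fy, cx, cy; };          // entries of A, no skew
struct Point2 { double x, y; };
using Mat34 = std::array<std::array<double, 4>, 3>;    // row-major 3x4 matrix

// Pixel -> normalized coordinates, i.e. A^-1 * (u, v, 1) without distortion.
Point2 toNormalized(const Intrinsics& k, const Point2& px) {
    assert(k.fx != 0.0 && k.fy != 0.0);
    return Point2{(px.x - k.cx) / k.fx, (px.y - k.cy) / k.fy};
}

// Computes A * [R|t], the projection matrix that yields pixel coordinates.
Mat34 withIntrinsics(const Intrinsics& k, const Mat34& rt) {
    Mat34 p{};
    for (int c = 0; c < 4; ++c) {
        p[0][c] = k.fx * rt[0][c] + k.cx * rt[2][c];
        p[1][c] = k.fy * rt[1][c] + k.cy * rt[2][c];
        p[2][c] = rt[2][c];
    }
    return p;
}

// Projects the 3D point (X, Y, Z) with a 3x4 matrix and divides by the last row.
Point2 project(const Mat34& p, double X, double Y, double Z) {
    const double u = p[0][0] * X + p[0][1] * Y + p[0][2] * Z + p[0][3];
    const double v = p[1][0] * X + p[1][1] * Y + p[1][2] * Z + p[1][3];
    const double w = p[2][0] * X + p[2][1] * Y + p[2][2] * Z + p[2][3];
    assert(w != 0.0);
    return Point2{u / w, v / w};
}
```

Projecting a point with A*[R|t] and passing the resulting pixel through toNormalized gives the same value as projecting with [R|t] directly, which is why either pairing is consistent as long as the two are not mixed.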
Thanks to Ander Biguri! His answer helped me a lot. But since I always prefer the alternative with std::vector, I edited his solution to this:
std::vector<cv::Point2d> cam0pnts;
std::vector<cv::Point2d> cam1pnts;
// You fill them, both with the same size...
// You can pick any of the following 2 (your choice)
// cv::Mat pnts3D(1,cam0pnts.size(),CV_64FC4);
cv::Mat pnts3D(4,cam0pnts.size(),CV_64F);
cv::triangulatePoints(cam0,cam1,cam0pnts,cam1pnts,pnts3D);
So you just need to do emplace_back in the points. Main advantage: you do not need to know the size N before start filling them. Unfortunately, there is no cv::Point4f, so pnts3D must be a cv::Mat...
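The 4xN output holds homogeneous coordinates, so each column still has to be divided by its fourth entry to obtain a Euclidean 3D point. A minimal sketch of that step without OpenCV types follows; Point3 and fromHomogeneous are hypothetical names, and the four row vectors stand in for the rows of pnts3D. OpenCV's cv::convertPointsFromHomogeneous performs the same division, though it expects one point per row, so the 4xN matrix would likely need to be transposed first.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3 {
    double x, y, z;
    bool finite;   // false when the homogeneous point lies at infinity (w == 0)
};

// rows[r][i] is row r, column i of a 4xN matrix of homogeneous points.
// Returns one entry per column so that indices stay aligned with the inputs.
std::vector<Point3> fromHomogeneous(const std::array<std::vector<double>, 4>& rows) {
    const std::size_t n = rows[0].size();
    assert(rows[1].size() == n && rows[2].size() == n && rows[3].size() == n);
    std::vector<Point3> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double w = rows[3][i];
        if (w == 0.0) {
            out.push_back(Point3{0.0, 0.0, 0.0, false});
        } else {
            out.push_back(Point3{rows[0][i] / w, rows[1][i] / w, rows[2][i] / w, true});
        }
    }
    return out;
}
```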
I tried cv::triangulatePoints, but somehow it calculates garbage. I was forced to implement a linear triangulation method manually, which returns a 4x1 matrix for the triangulated 3D point:
Mat triangulate_Linear_LS(Mat mat_P_l, Mat mat_P_r, Mat warped_back_l, Mat warped_back_r)
{
Mat A(4,3,CV_64FC1), b(4,1,CV_64FC1), X(3,1,CV_64FC1), X_homogeneous(4,1,CV_64FC1), W(1,1,CV_64FC1);
W.at<double>(0,0) = 1.0;
A.at<double>(0,0) = (warped_back_l.at<double>(0,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,0) - mat_P_l.at<double>(0,0);
A.at<double>(0,1) = (warped_back_l.at<double>(0,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,1) - mat_P_l.at<double>(0,1);
A.at<double>(0,2) = (warped_back_l.at<double>(0,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,2) - mat_P_l.at<double>(0,2);
A.at<double>(1,0) = (warped_back_l.at<double>(1,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,0) - mat_P_l.at<double>(1,0);
A.at<double>(1,1) = (warped_back_l.at<double>(1,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,1) - mat_P_l.at<double>(1,1);
A.at<double>(1,2) = (warped_back_l.at<double>(1,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,2) - mat_P_l.at<double>(1,2);
A.at<double>(2,0) = (warped_back_r.at<double>(0,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,0) - mat_P_r.at<double>(0,0);
A.at<double>(2,1) = (warped_back_r.at<double>(0,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,1) - mat_P_r.at<double>(0,1);
A.at<double>(2,2) = (warped_back_r.at<double>(0,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,2) - mat_P_r.at<double>(0,2);
A.at<double>(3,0) = (warped_back_r.at<double>(1,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,0) - mat_P_r.at<double>(1,0);
A.at<double>(3,1) = (warped_back_r.at<double>(1,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,1) - mat_P_r.at<double>(1,1);
A.at<double>(3,2) = (warped_back_r.at<double>(1,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,2) - mat_P_r.at<double>(1,2);
b.at<double>(0,0) = -((warped_back_l.at<double>(0,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,3) - mat_P_l.at<double>(0,3));
b.at<double>(1,0) = -((warped_back_l.at<double>(1,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,3) - mat_P_l.at<double>(1,3));
b.at<double>(2,0) = -((warped_back_r.at<double>(0,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,3) - mat_P_r.at<double>(0,3));
b.at<double>(3,0) = -((warped_back_r.at<double>(1,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,3) - mat_P_r.at<double>(1,3));
solve(A,b,X,DECOMP_SVD);
vconcat(X,W,X_homogeneous);
return X_homogeneous;
}
the input parameters are two 3x4 camera projection matrices and a corresponding left/right pixel pair (x,y,w).
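For reference, the system that the function above assembles can be written compactly. With u = x/w and v = y/w for each image point, and P_{ij} denoting the entries of the 3x4 projection matrices with 1-based indices (this notation is shorthand introduced here), the unknown point (X, Y, Z) is the least-squares solution of:

```latex
\begin{bmatrix}
u_l P^{l}_{31} - P^{l}_{11} & u_l P^{l}_{32} - P^{l}_{12} & u_l P^{l}_{33} - P^{l}_{13} \\
v_l P^{l}_{31} - P^{l}_{21} & v_l P^{l}_{32} - P^{l}_{22} & v_l P^{l}_{33} - P^{l}_{23} \\
u_r P^{r}_{31} - P^{r}_{11} & u_r P^{r}_{32} - P^{r}_{12} & u_r P^{r}_{33} - P^{r}_{13} \\
v_r P^{r}_{31} - P^{r}_{21} & v_r P^{r}_{32} - P^{r}_{22} & v_r P^{r}_{33} - P^{r}_{23}
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= -
\begin{bmatrix}
u_l P^{l}_{34} - P^{l}_{14} \\
v_l P^{l}_{34} - P^{l}_{24} \\
u_r P^{r}_{34} - P^{r}_{14} \\
v_r P^{r}_{34} - P^{r}_{24}
\end{bmatrix}
```

Each image point contributes two rows, and the fourth homogeneous coordinate of the result is fixed to 1, which is what the final vconcat with W does.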
In addition to Ginés Hidalgo's comments:
if you performed a stereo calibration (based on a checkerboard) and could estimate an accurate fundamental matrix from it,
use the correctMatches function to refine the detected keypoints:
std::vector<cv::Point2f> pt_set1_pt_c, pt_set2_pt_c;
cv::correctMatches(F,pt_set1_pt,pt_set2_pt,pt_set1_pt_c,pt_set2_pt_c);

Two 3D point cloud transformation matrix

I'm trying to find the rigid transformation matrix between two 3D point clouds.
The two point clouds are:
keypoints from the kinect (kinect_keypoints).
keypoints from a 3D object (box) (object_keypoints).
I have tried two options:
[1]. Implementation of the algorithm to find rigid transformation.
**1.Calculate the centroid of each point cloud.**
**2.Center the points according to the centroid.**
**3. Calculate the covariance matrix**
cvSVD( &_H, _W, _U, _V, CV_SVD_U_T );
cvMatMul( _V,_U, &_R );
**4. Calculate the rotation matrix using the SVD decomposition of the covariance matrix**
float _Tsrc[16] = { 1.f,0.f,0.f,0.f,
0.f,1.f,0.f,0.f,
0.f,0.f,1.f,0.f,
-_gc_src.x,-_gc_src.y,-_gc_src.z,1.f }; // 1: src points to the origin
float _S[16] = { _scale,0.f,0.f,0.f,
0.f,_scale,0.f,0.f,
0.f,0.f,_scale,0.f,
0.f,0.f,0.f,1.f }; // 2: scale the src points
float _R_src_to_dst[16] = { _Rdata[0],_Rdata[3],_Rdata[6],0.f,
_Rdata[1],_Rdata[4],_Rdata[7],0.f,
_Rdata[2],_Rdata[5],_Rdata[8],0.f,
0.f,0.f,0.f,1.f }; // 3: rotate the src points
float _Tdst[16] = { 1.f,0.f,0.f,0.f,
0.f,1.f,0.f,0.f,
0.f,0.f,1.f,0.f,
_gc_dst.x,_gc_dst.y,_gc_dst.z,1.f }; // 4: from src to dst
// _Tdst * _R_src_to_dst * _S * _Tsrc
mul_transform_mat( _S, _Tsrc, Rt );
mul_transform_mat( _R_src_to_dst, Rt, Rt );
mul_transform_mat( _Tdst, Rt, Rt );
[2]. Use estimateAffine3D from opencv.
float _poseTrans[12];
// first --> kinect_keypoints, second --> object_keypoints
std::vector<cv::Point3f> first, second;
cv::Mat aff(3,4,CV_64F, _poseTrans);
cv::estimateAffine3D( first, second, aff, inliers );
float _poseTrans2[16];
for (int i=0; i<12; ++i)
{
_poseTrans2[i] = _poseTrans[i];
}
_poseTrans2[12] = 0.f;
_poseTrans2[13] = 0.f;
_poseTrans2[14] = 0.f;
_poseTrans2[15] = 1.f;
The problem with the first option is that the resulting transformation is not correct. With the second one, if I multiply the Kinect point cloud by the resulting matrix, some values are infinite.
Is there any solution from any of these options? Or an alternative one, apart from the PCL?
Thank you in advance.
EDIT: This is an old post, but an answer might be useful to someone ...
Your first approach can work in very specific cases (ellipsoid point clouds or very elongated shapes), but is not appropriate for point clouds acquired by the kinect. And about your second approach, I am not familiar with OpenCV function estimateAffine3D but I suspect it assumes the two input point clouds correspond to the same physical points, which is not the case if you used a kinect point cloud (which contain noisy measurements) and points from an ideal 3D model (which are perfect).
You mentioned that you are aware of the Point Cloud Library (PCL) and do not want to use it. If possible, I think you might want to reconsider this, because PCL is much more appropriate than OpenCV for what you want to do (check the tutorial list, one of them covers exactly what you want to do: Aligning object templates to a point cloud).
However, here are some alternative solutions to your problem:
If your two point clouds correspond exactly to the same physical points, your second approach should work, but you can also check out Absolute Orientation (e.g. Matlab implementation)
If your two point clouds do not correspond to the same physical points, you actually want to register (or align) them and you can use either:
one of the many variants of the Iterative Closest Point (ICP) algorithm, if you know approximately the position of your object. Wikipedia Entry
3D feature points such as 3D SIFT, 3D SURF or NARF feature points, if you have no clue about your object's position.
Again, all these approaches are already implemented in PCL.
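For the exact-correspondence case mentioned in the list above, absolute orientation has a well-known closed-form solution via SVD (often called the Kabsch method). A sketch for a rigid transform without scale, with p_i the source points and q_i the matching destination points (this notation is assumed here, not taken from the post):

```latex
\bar{p} = \frac{1}{N}\sum_{i=1}^{N} p_i, \qquad
\bar{q} = \frac{1}{N}\sum_{i=1}^{N} q_i, \qquad
H = \sum_{i=1}^{N} (p_i - \bar{p})(q_i - \bar{q})^{\top}

H = U \Sigma V^{\top}, \qquad
R = V \,\operatorname{diag}\!\bigl(1,\, 1,\, \det(V U^{\top})\bigr)\, U^{\top}, \qquad
t = \bar{q} - R\,\bar{p}
```

The determinant term guards against obtaining a reflection instead of a proper rotation, which can happen with noisy or nearly planar point sets.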