OpenCV undistortPoints and triangulatePoints give odd results (stereo) - c++

I'm trying to get 3D coordinates of several points in space, but I'm getting odd results from both undistortPoints() and triangulatePoints().
Since both cameras have different resolutions, I've calibrated them separately and got RMS errors of 0.34 and 0.43, then used stereoCalibrate() to get further matrices (RMS of 0.708), and then used stereoRectify() to get the remaining matrices. With those in hand I started working on the gathered coordinates, but I get weird results.
For example, the input is (935, 262) and the undistortPoints() output is (1228.709125, 342.79841) for one point, while for another it's (934, 176) and (1227.9016, 292.4686) respectively. Which is weird, because both of these points are very close to the middle of the frame, where distortion is smallest. I didn't expect them to be moved by 300 pixels.
When passed to triangulatePoints(), the results get even stranger. I measured the distance between three points in real life (with a ruler) and calculated the distance between the corresponding pixels in each picture. Because this time the points lay on a fairly flat plane, the two ratios (pixel and real) matched: |AB|/|BC| was around 4/9 in both cases. However, triangulatePoints() gives me results that are way off, with |AB|/|BC| being 3/2 or 4/2.
This is my code:
double pointsBok[2] = { bokList[j].toFloat() + xBok/2, bokList[j+1].toFloat() + yBok/2 };
cv::Mat imgPointsBokProper = cv::Mat(1, 1, CV_64FC2, pointsBok);
double pointsTyl[2] = { tylList[j].toFloat() + xTyl/2, tylList[j+1].toFloat() + yTyl/2 };
cv::Mat imgPointsTylProper = cv::Mat(1, 1, CV_64FC2, pointsTyl);

cv::undistortPoints(imgPointsBokProper, imgPointsBokProper,
                    intrinsicOne, distCoeffsOne, R1, P1);
cv::undistortPoints(imgPointsTylProper, imgPointsTylProper,
                    intrinsicTwo, distCoeffsTwo, R2, P2);

// triangulate the undistorted, rectified points
cv::triangulatePoints(P1, P2, imgPointsBokProper, imgPointsTylProper, point4D);

double wResult = point4D.at<double>(3, 0);
double realX = point4D.at<double>(0, 0) / wResult;
double realY = point4D.at<double>(1, 0) / wResult;
double realZ = point4D.at<double>(2, 0) / wResult;
The angles between the points are sometimes roughly right, but usually not (in degrees):
`7.16816 168.389 4.44275` vs `5.85232 170.422 3.72561`
`8.44743 166.835 4.71715` vs `12.4064 158.132 9.46158`
`9.34182 165.388 5.26994` vs `19.0785 150.883 10.0389`
I've tried using undistort() on the entire frame, but the results were just as odd. The distance between points B and C should stay pretty much unchanged at all times, and yet this is what I get:
7502,42
4876,46
3230,13
2740,67
2239,95
Frame by frame.
(Plot omitted: pixel distance (bottom) vs real distance (top) - these should be very similar.)
(Plot omitted: angle between the points.)
Also, shouldn't both undistortPoints() and undistort() give the same results (another set of videos here)?

The function cv::undistortPoints() does undistortion and reprojection in one go. It performs the following list of operations:
undo camera projection (multiplication with the inverse of the camera matrix)
apply the distortion model to undo the distortion
rotate by the provided Rotation matrix R1/R2
project points to image using the provided Projection matrix P1/P2
If you pass the matrices R1, P1 (resp. R2, P2) from cv::stereoRectify(), the input points will be undistorted and rectified. Rectification means that the images are transformed in such a way that corresponding points have the same y-coordinate. There is no unique solution for image rectification, as you can apply any translation or scaling to both images without changing the alignment of corresponding points.
That being said, cv::stereoRectify() can shift the center of projection quite a bit (e.g. by 300 pixels). If you want pure undistortion, you can pass an identity matrix (instead of R1) and the original camera matrix K (instead of P1). This should lead to pixel coordinates similar to the original ones.
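For example, a minimal sketch of the pure-undistortion call, reusing the variable names from the question (the name of the identity matrix is arbitrary):
// Pure undistortion: identity rotation and the original camera matrix,
// so the output stays in (roughly) the original pixel coordinates.
cv::Mat noRectification = cv::Mat::eye(3, 3, CV_64F);
cv::undistortPoints(imgPointsBokProper, imgPointsBokProper,
                    intrinsicOne, distCoeffsOne,
                    noRectification,   // identity instead of R1
                    intrinsicOne);     // original camera matrix instead of P1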

Related

Given camera matrices, how to find point correspondences using OpenCV?

I'm following this tutorial, which uses Features2D + Homography. If I have a known camera matrix for each image, how can I optimize the result? I tried some images, but it didn't work well.
//Edit
After reading some materials, I think I should rectify the two images first. But the rectification is not perfect, so a vertical line in image 1 generally corresponds to a vertical band in image 2. Are there any good algorithms?
I'm not sure I understand your problem. Do you want to find corresponding points between the images, or do you want to improve the correctness of your matches by using the camera intrinsics?
In principle, in order to use camera geometry for finding matches, you would need the fundamental or essential matrix, depending on whether you know the camera intrinsics (i.e. have a calibrated camera). That means you would need an estimate of the relative rotation and translation of the cameras. Then, by computing the epipolar lines corresponding to the features found in one image, you would search along those lines in the second image for the best match. However, I think it would be better to simply rely on automatic feature matching. Given the fundamental/essential matrix, you could try your luck with correctMatches, which will move the correspondences such that the reprojection error is minimised.
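A minimal sketch of that call, assuming F and the matched point lists already exist; the 1xN double-precision layout is the one described in the documentation:
std::vector<cv::Point2f> pts1, pts2;                  // matched points, same ordering
// ... fill pts1 / pts2 from your matches ...
cv::Mat m1, m2;
cv::Mat(pts1).reshape(2, 1).convertTo(m1, CV_64FC2);  // 1xN, 2-channel
cv::Mat(pts2).reshape(2, 1).convertTo(m2, CV_64FC2);
cv::Mat newPts1, newPts2;
cv::correctMatches(F, m1, m2, newPts1, newPts2);      // each pair is moved onto its epipolar line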
Tips for better matches
To increase the stability and saliency of automatic matches, it usually pays to
Adjust the parameters of the feature detector
Try different detection algorithms
Perform a ratio test to filter out those keypoints which have a very similar second-best match and are therefore unstable. This is done like this:
Mat descriptors_1, descriptors_2;  // obtained from the feature detector
BFMatcher matcher(NORM_L2, false); // norm depends on the feature descriptor
vector<vector<DMatch>> match_candidates;
vector<DMatch> matches;
const float ratio = 0.8f; // or something
matcher.knnMatch(descriptors_1, descriptors_2, match_candidates, 2);
for (size_t i = 0; i < match_candidates.size(); i++)
{
    if (match_candidates[i][0].distance < ratio * match_candidates[i][1].distance)
        matches.push_back(match_candidates[i][0]); // keep only sufficiently distinctive matches
}
A more involved way of filtering would be to compute the reprojection error for each keypoint in the first frame. This means computing the corresponding epipolar line in the second image and then checking how far its supposed matching point is away from that line. Throwing away those points whose distance exceeds some threshold removes the matches which are incompatible with the epipolar geometry (which I assume would be known). Computing the error can be done like this (I honestly do not remember where I took this code from, and I may have modified it a bit):
double computeReprojectionError(vector<Point2f>& imgpts1, vector<Point2f>& imgpts2,
                                Mat& inlier_mask, const Mat& F)
{
    double err = 0;
    vector<Vec3f> lines[2];
    int npt = sum(inlier_mask)[0];

    // strip outliers so validation is constrained to the correspondences
    // which were used to estimate F
    vector<Point2f> imgpts1_copy(npt),
                    imgpts2_copy(npt);
    int c = 0;
    for (int k = 0; k < inlier_mask.size().height; k++)
    {
        if (inlier_mask.at<uchar>(k) == 1)
        {
            imgpts1_copy[c] = imgpts1[k];
            imgpts2_copy[c] = imgpts2[k];
            c++;
        }
    }

    Mat imgpt[2] = { Mat(imgpts1_copy), Mat(imgpts2_copy) };
    computeCorrespondEpilines(imgpt[0], 1, F, lines[0]);
    computeCorrespondEpilines(imgpt[1], 2, F, lines[1]);

    for (int j = 0; j < npt; j++)
    {
        // The error is the distance between a point u_l = (x,y) and the epipolar line of its
        // corresponding point u_r in the other image, plus the reverse:
        // err_ij = d(u_l, F^T * u_r) + d(u_r, F * u_l)
        Point2f u_l = imgpts1_copy[j], // for the purpose of this function, imgpts1 is the "left"
                u_r = imgpts2_copy[j]; // image and imgpts2 the "right" one; it makes no difference

        float a2 = lines[1][j][0],     // epipolar line in the left image for u_r
              b2 = lines[1][j][1],
              c2 = lines[1][j][2];
        float norm_factor2 = sqrt(pow(a2, 2) + pow(b2, 2));

        float a1 = lines[0][j][0],     // epipolar line in the right image for u_l
              b1 = lines[0][j][1],
              c1 = lines[0][j][2];
        float norm_factor1 = sqrt(pow(a1, 2) + pow(b1, 2));

        // distance of a point (x,y) to the line (a,b,c) = |ax + by + c| / sqrt(a^2 + b^2)
        double errij =
            fabs(u_l.x * a2 + u_l.y * b2 + c2) / norm_factor2 +
            fabs(u_r.x * a1 + u_r.y * b1 + c1) / norm_factor1;

        err += errij; // at this point, apply a threshold and mark bad matches
    }
    return err / npt;
}
The point is: grab the fundamental matrix, use it to compute the epipolar lines for all the points and then compute the distances (the lines are given in a parametric form, so you need to do some algebra to get the distance). This is somewhat similar in outcome to what findFundamentalMat with the RANSAC method does. It returns a mask wherein each match is marked either with a 1, meaning that it was used to estimate the matrix, or with a 0 if it was thrown out. But estimating the fundamental matrix like this will probably be less accurate than using chessboards.
EDIT: Looks like oarfish beat me to it, but I'll leave this here.
The fundamental matrix (F) defines a mapping from a point in the left image to a line in the right image on which the corresponding point must lie, assuming perfect calibration. This is the epipolar line, i.e. the line through the point in the left image and the two epipoles of the stereo camera pair. For references, see these lecture notes and this chapter of the HZ book.
Given a set of point correspondences in the left and right images, (p_L, p_R), from SURF (or any other feature matcher), and given F, the constraint from the epipolar geometry of the stereo pair says that p_R should lie on the epipolar line projected by p_L onto the right image, i.e. p_R^T * F * p_L = 0.
In practice, calibration errors from noise as well as erroneous feature matches lead to a non-zero value.
However, using this idea, you can then perform outlier removal by rejecting those feature matches for which this expression exceeds a certain threshold value, i.e. reject (p_L, p_R) if and only if |p_R^T * F * p_L| > threshold.
When selecting this threshold, keep in mind that it is the distance in image space of a point from an epipolar line that you are willing to tolerate, which in some sense is your epipolar error tolerance.
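A rough sketch of that test for a single candidate match (the point names and the threshold are placeholders; F is assumed to be a CV_64F matrix, e.g. from findFundamentalMat):
// (xL, yL) and (xR, yR): matched pixel coordinates; thresh: your epipolar error tolerance
cv::Mat pL = (cv::Mat_<double>(3, 1) << xL, yL, 1.0);
cv::Mat pR = (cv::Mat_<double>(3, 1) << xR, yR, 1.0);
double epipolarError = std::abs(cv::Mat(pR.t() * F * pL).at<double>(0, 0));
bool reject = epipolarError > thresh;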
Degenerate case: To visually imagine what this means, let us assume that the stereo pair differ only in a pure X-translation. Then the epipolar lines are horizontal. This means that you can connect the feature matched point pairs by a line and reject those pairs whose line slope is not close to zero. The equation above is a generalization of this idea to arbitrary stereo rotation and translation, which is accounted for by the matrix F.
Your specific images: It looks like your feature matches are sparse. I suggest instead to use a dense feature matching approach so that after outlier removal, you are still left with a sufficient number of good-quality matches. I'm not sure which dense feature matcher is already implemented in OpenCV, but I suggest starting here.
Given your pictures, you are trying to do stereo matching.
This page will be helpful. The rectification you want can be done using stereoCalibrate followed by stereoRectify.
The result (image from the documentation omitted here).
In order to find the fundamental matrix you need correct correspondences, but in order to get good correspondences you need a good estimate of the fundamental matrix. This might sound like an impossible chicken-and-egg problem, but there is a well-established method to deal with it: RANSAC.
It randomly selects a small set of correspondences, uses those to calculate a fundamental matrix (using the 7- or 8-point algorithm) and then tests how many of the other correspondences comply with this matrix (using the method described by scribbleink for measuring the distance between a point and an epipolar line). It keeps testing new combinations of correspondences for a certain number of iterations and selects the one with the most inliers.
This is already implemented in OpenCV as cv::findFundamentalMat (http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#findfundamentalmat). Select the method CV_FM_RANSAC to use RANSAC to remove bad correspondences. It will output a list of all the inlier correspondences.
The requirement for this is that the points do not all lie on the same plane.
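A minimal sketch of that call (variable names are placeholders; the third numeric parameter of the RANSAC variant is the allowed distance to the epipolar line, in pixels):
std::vector<cv::Point2f> pts1, pts2;   // matched keypoint locations, same ordering
cv::Mat inlier_mask;                   // output: 1 = inlier, 0 = rejected correspondence
cv::Mat F = cv::findFundamentalMat(pts1, pts2, CV_FM_RANSAC,
                                   3.0,   // max distance to the epipolar line (pixels)
                                   0.99,  // confidence level
                                   inlier_mask);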

Compare intensity pixel value Vec3b in OpenCV

I have a 3 channel Mat image, type is CV_8UC3.
I want to compare, in a loop, the intensity value of a pixel with its neighbours and then set 0 or 1 if the neighbour is greater or not.
I can get the intensity by calling Img.at<Vec3b>(x,y).
But my question is: how can I compare two Vec3b?
Should I compare pixels value for every channel (BGR or Vec3b[0], Vec3b[1] and Vec3b[2]), and then merge the three channels results into a single Mat object?
Me again :)
If you want to compare (greater or less) two RGB values you need to project the 3-dimensional RGB space onto a plane or axis.
Of course, there are many possibilities to do this, but an easy way would be to use the HSV color space. The hue (H), however, is not appropriate as a linear order function because it is circular (i.e. the value 1.0 is identical with 0.0, so you cannot decide if 0.5 > 0.0 or 0.5 < 0.0). However, the saturation (S) or the value (V) are appropriate projection functions for your purpose:
If you want to have colored pixels "larger" than monochrome pixels, you will prefer S.
If you want to have lighter pixels larger than darker pixels, you will probably prefer V.
Also any combination of S and V would be a valid projection function, e.g. S+V.
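A minimal sketch of such a comparison using the V channel (the image and pixel names are placeholders; the two pixels go through a small 1x2 Mat because cvtColor operates on Mats; use index 1 instead of 2 to compare by saturation):
// Compare two neighbouring pixels by their HSV value (V) channel.
cv::Mat pixelPair(1, 2, CV_8UC3);
pixelPair.at<cv::Vec3b>(0, 0) = img.at<cv::Vec3b>(row, col);      // current pixel
pixelPair.at<cv::Vec3b>(0, 1) = img.at<cv::Vec3b>(row, col + 1);  // its neighbour
cv::Mat hsv;
cv::cvtColor(pixelPair, hsv, cv::COLOR_BGR2HSV);
bool neighbourGreater = hsv.at<cv::Vec3b>(0, 1)[2] > hsv.at<cv::Vec3b>(0, 0)[2];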
As far as I understand, you want a measure to calculate the distance/similarity between two Vec3b pixels. This can be reduced to the general problem of finding the distance between two vectors in an n-dimensional space.
One of the famous measures (and I think this is what you're asking for), is the Euclidean distance.
If you are using OpenCV then you can simply use:
cv::Vec3b a(1, 1, 1);
cv::Vec3b b(5, 5, 5);
double dist = cv::norm(a, b, CV_L2);
You can refer to this for reading about cv::norm and its options.
Edit: If you are doing this to measure color similarity, it's recommended to use the LAB color space as it's proved that Euclidean distance in LAB space is a good approximation for human perception of colors.
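For instance, a small sketch of that idea, reusing a and b from the snippet above (the two pixels are packed into a 1x2 Mat because cvtColor works on Mats rather than on single Vec3b values):
cv::Mat pixelPair(1, 2, CV_8UC3);
pixelPair.at<cv::Vec3b>(0, 0) = a;
pixelPair.at<cv::Vec3b>(0, 1) = b;
cv::Mat lab;
cv::cvtColor(pixelPair, lab, cv::COLOR_BGR2Lab);
double labDist = cv::norm(lab.at<cv::Vec3b>(0, 0), lab.at<cv::Vec3b>(0, 1), cv::NORM_L2);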
Edit 2: I see what you mean, for this you can get the magnitude of each vector and then compare them, something like this:
double a_magnitude = cv::norm(a, CV_L2);
double b_magnitude = cv::norm(b, CV_L2);
if (a_magnitude > b_magnitude)
{
    // do something
}
else
{
    // do something else
}

Ray tracing texture implementation for spheres

I'm trying to implement textures for spheres in my ray tracer. I managed to get something working, but I am unsure about its correctness. Below is the code for getting the texture coordinates. For now, the texture is random and is generated at runtime.
virtual void GetTextureCoord(Vect hitPoint, int hres, int vres, int& x, int& y) {
    float theta = acos(hitPoint.getVectY());
    float phi = atan2(hitPoint.getVectX(), hitPoint.getVectZ());
    if (phi < 0.0) {
        phi += TWO_PI;
    }

    float u = phi * INV_TWO_PI;
    float v = 1 - theta * INV_PI;

    y = (int) ((hres - 1) * u);
    x = (int) ((vres - 1) * v);
}
This is how the spheres look now:
I had to normalize the coordinates of the hit point to get the spheres to look like that. Otherwise they would look like:
Was normalising the hit point coordinates the right approach, or is something else broken in my code? Thank you!
Instead of normalising the hit point, I tried translating it to the world origin (as if the sphere center was there) and obtained the following result:
I'm using a 256x256 resolution texture by the way.
It's unclear what you mean by "normalizing" the hit point since there's nothing that normalizes it in the code you posted, but you mentioned that your hit point is in world space.
Also, you didn't say what texture mapping you're trying to implement, but I assume you want your U and V texture coordinates to represent latitude and longitude on the sphere's surface.
Your first problem is that converting Cartesian to spherical coordinates requires that the sphere is centered at the origin in the Cartesian space, which isn't true in world space. If the hit point is in world space, you have to subtract the sphere's world-space center point to get the effective hit point in local coordinates. (You figured this part out already and updated the question with a new image.)
Your second problem is that the way you're calculating theta requires the sphere to have a radius of 1, which isn't true even after you move the sphere's center to the origin. Remember your trigonometry: the argument to acos is the ratio of a triangle's side to its hypotenuse, and is always in the range [-1, +1]. In this case your Y-coordinate is the side, and the sphere's radius is the hypotenuse. So you have to divide by the sphere's radius when calling acos. It's also a good idea to clamp the value to the [-1, +1] range in case floating-point rounding error puts it slightly outside.
(In principle you'd also have to divide the X and Z coordinates by the radius, but you're only using those for an inverse tangent, and dividing them both by the radius won't change their quotient and thus won't change phi.)
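A sketch of the texture-coordinate function with those two fixes applied (it assumes the sphere's center and radius are available as members, which is an assumption about your class; the clamp uses std::min/std::max from <algorithm>):
virtual void GetTextureCoord(Vect hitPoint, int hres, int vres, int& x, int& y) {
    // Move the hit point into the sphere's local coordinates
    // ('center' and 'radius' are assumed members of the sphere class).
    float lx = hitPoint.getVectX() - center.getVectX();
    float ly = hitPoint.getVectY() - center.getVectY();
    float lz = hitPoint.getVectZ() - center.getVectZ();

    // Divide by the radius before acos, and clamp against rounding error.
    float cosTheta = std::min(1.0f, std::max(-1.0f, ly / radius));
    float theta = acos(cosTheta);
    float phi = atan2(lx, lz);
    if (phi < 0.0) {
        phi += TWO_PI;
    }

    float u = phi * INV_TWO_PI;
    float v = 1 - theta * INV_PI;

    y = (int) ((hres - 1) * u);
    x = (int) ((vres - 1) * v);
}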
Right now your sphere intersection and texture-coordinate functions are operating in world space, but you'll probably find it useful later to implement transformation matrices, which let you transform things from one coordinate space to another. Then you can change your sphere functions to operate in a local coordinate space where the center is the origin and the radius is 1, and give each object an associated transformation matrix that maps the local coordinate space to the world coordinate space. This will simplify your ray/sphere intersection code, and let you remove the origin subtraction and radius division from GetTextureCoord (since they're always (0, 0, 0) and 1 respectively).
To intersect a ray with an object, you'd use the object's transformation matrix to transform the ray into the object's local coordinate space, do the intersection (and compute texture coordinates) there, and then transform the result (e.g. hit point and surface normal) back to world space.

transforming projection matrices computed from trifocal tensor to estimate 3D points

I am using this legacy code: http://fossies.org/dox/opencv-2.4.8/trifocal_8cpp_source.html
for estimating 3D points from given corresponding 2D points in 3 different views. The problem I face is the same as the one stated here: http://opencv-users.1802565.n2.nabble.com/trifocal-tensor-icvComputeProjectMatrices6Points-icvComputeProjectMatricesNPoints-td2423108.html
I could compute the projection matrices successfully using icvComputeProjectMatrices6Points. I used 6 sets of corresponding points from the 3 views. The results are shown below:
projMatr1 P1 =
[-0.22742541, 0.054754492, 0.30500898, -0.60233182;
-0.14346679, 0.034095913, 0.33134204, -0.59825808;
-4.4949986e-05, 9.9166318e-06, 7.106331e-05, -0.00014547621]
projMatr2 P2 =
[-0.17060626, -0.0076031247, 0.42357284, -0.7917347;
-0.028817834, -0.0015948272, 0.2217239, -0.33850163;
-3.3046148e-05, -1.3680664e-06, 0.0001002633, -0.00019192585]
projMatr3 P3 =
[-0.033748217, 0.099119112, -0.4576003, 0.75215244;
-0.001807699, 0.0035084449, -0.24180284, 0.39423448;
-1.1765103e-05, 2.9554356e-05, -0.00013438619, 0.00025332544]
Furthermore, I computed 3D points using icvReconstructPointsFor3View. The six 3D points are as follows:
4D points =
[-0.4999997, -0.26867214, -1, 2.88633e-07, 1.7766099e-07, -1.1447386e-07;
-0.49999994, -0.28693244, 3.2249036e-06, 1, 7.5971762e-08, 2.1956141e-07;
-0.50000024, -0.72402155, 1.6873783e-07, -6.8603946e-08, -1, 5.8393886e-07;
-0.50000012, -0.56681377, 1.202426e-07, -4.1603233e-08, -2.3659911e-07, 1]
The actual 3D points, however, are the following:
- { ID:1,X:500.000000, Y:800.000000, Z:3000.000000}
- { ID:2,X:500.000000, Y:800.000000, Z:4000.000000}
- { ID:3,X:1500.000000, Y:800.000000, Z:4000.000000}
- { ID:4,X:1500.000000, Y:800.000000, Z:3000.000000}
- { ID:5,X:500.000000, Y:1800.000000, Z:3000.000000}
- { ID:6,X:500.000000, Y:1800.000000, Z:4000.000000}
My question is now: how do I transform P1, P2 and P3 into a form that allows a meaningful triangulation? I need to compute the correct 3D points using the trifocal tensor.
The trifocal tensor won't help you, because like the fundamental matrix, it only enables projective reconstruction of the scene and camera poses. If X0_j and P0_i are the true 3D points and camera matrices, this means that the reconstructed points Xp_j = inv(H).X0_j and camera matrices Pp_i = P0_i.H are only defined up to a common 4x4 matrix H, which is unknown.
In order to obtain a metric reconstruction, you need to know the calibration matrices of your cameras. Whether you know these matrices (e.g. if you use virtual cameras for image rendering) or you estimated them using camera calibration (see the OpenCV calibration tutorials), you can find a method to obtain a metric reconstruction in §7.4.5 of "Geometry, constraints and computation of the trifocal tensor" by C. Ressl (PDF).
Note that even when using this method, you can only obtain the 3D reconstruction up to an unknown scale, unless you have some additional knowledge (such as the actual distance between two fixed 3D points).
Sketch of the algorithm:
Inputs: the three camera matrices P1, P2, P3 (projective world coordinates, with the coordinate system chosen so that P1=[I|0]), the associated calibration matrices K1, K2, K3 and one point correspondence x1, x2, x3.
Outputs: the three camera matrices P1_E, P2_E, P3_E (metric reconstruction).
Set P1_E=K1.[I|0]
Compute the fundamental matrices F21, F31. Denoting P2=[A|a] and P3=[B|b], you have F21=[a]x.A and F31=[b]x.B (see table 9.1 in [HZ00]), where for a 3x1 vector e [e]x = [0,-e_3,e_2;e_3,0,-e_1;-e_2,e_1,0]
Compute the essential matrices E21 = K2'.F21.K1 and E31 = K3'.F31.K1
For i = 2,3, do the following
i. Compute the SVD Ei1=U.S.V'. If det(U)<0 set U=-U. If det(V)<0 set V=-V.
ii. Define W=[0,-1,0;1,0,0;0,0,1], Ri=U.W.V' and ti = third column of U
iii. Define M=[Ri'.ti]x, X1=M.inv(K1).x1 and Xi=M.Ri'.inv(Ki).xi
iv. If X1_3.Xi_3<0, set Ri=U.W'.V' and recompute M and X1
v. If X1_3<0 set ti = -ti
vi. Define Pi_E=Ki.[Ri|ti]
Do the following to retrieve the correct scale for t3 (consistent with the fact that ||t2||=1):
i. Define p2=R2'.inv(K2).x2 and p3=R3'.inv(K3).x3
ii. Define M=[p2]x
iii. Compute the scale s=(p3'.M.R2'.t2)/(p3'.M.R3'.t3)
iv. Set t3=t3*s
End of the algorithm: the camera matrices P1_E, P2_E, P3_E are valid up to an isotropic scaling of the scene and a change of 3D coordinate system (hence it is a metric reconstruction). A code sketch of steps 2 to 4.ii for one camera pair is given after the reference below.
[HZ00] "Multiple view geometry in computer vision" , by R.Hartley and A.Zisserman, 2000.

Get 3D coordinates from 2D image pixel if extrinsic and intrinsic parameters are known

I am doing camera calibration using Tsai's algorithm. I have the intrinsic and extrinsic matrices, but how can I reconstruct the 3D coordinates from that information?
1) I can use Gaussian elimination to find X, Y, Z, W, and then the point will be X/W, Y/W, Z/W as a homogeneous system.
2) I can use the OpenCV documentation approach,
s * [u, v, 1]^T = K * [R | t] * [X, Y, Z, 1]^T,
and since I know u, v, R and t, I can compute X, Y, Z.
However, both methods end up with different results, and neither of them is correct.
What am I doing wrong?
If you have the extrinsic parameters then you have everything. That means you can compute a homography from the extrinsics (also called the camera pose). The pose is a 3x4 matrix; the homography is a 3x3 matrix, H, defined as
H = K*[r1, r2, t], //eqn 8.1, Hartley and Zisserman
with K being the camera intrinsic matrix, r1 and r2 the first two columns of the rotation matrix R, and t the translation vector.
Then normalize by dividing everything by t3 (the third element of t).
What happens to column r3? Don't we use it? No, because it is redundant: it is the cross product of the first two columns of the pose.
Now that you have the homography, project the points. Your 2D points are (x, y). Append z = 1 so they become 3D, then project them as follows:
p = [x y 1];
projection = H * p;                    //project
projnorm = projection / projection(3); //normalize by the third (homogeneous) coordinate
Hope this helps.
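For reference, a C++/OpenCV translation of the pseudocode above; it assumes K, R and t are CV_64F matrices of size 3x3, 3x3 and 3x1 obtained from your calibration, and that x and y hold the pixel coordinates:
cv::Mat Rt(3, 3, CV_64F);                                     // [r1, r2, t]
R.col(0).copyTo(Rt.col(0));
R.col(1).copyTo(Rt.col(1));
t.copyTo(Rt.col(2));
cv::Mat H = K * Rt;                                           // H = K * [r1, r2, t]
H /= t.at<double>(2, 0);                                      // normalize by t3

cv::Mat p = (cv::Mat_<double>(3, 1) << x, y, 1.0);            // 2D point with z = 1 appended
cv::Mat projection = H * p;                                   // project
cv::Mat projnorm = projection / projection.at<double>(2, 0);  // normalize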
As nicely stated in the comments above, projecting 2D image coordinates into 3D "camera space" inherently requires making up the z coordinates, since this information is completely lost in the image. One solution is to assign a dummy value (z = 1) to each of the 2D image-space points before projection, as answered by Jav_Rock:
p = [x y 1];
projection = H * p;                    //project
projnorm = projection / projection(3); //normalize
One interesting alternative to this dummy solution is to train a model to predict the depth of each point prior to reprojection into 3D camera space. I tried this method and had a high degree of success using a PyTorch CNN trained on 3D bounding boxes from the KITTI dataset. I'd be happy to provide the code, but it would be a bit lengthy to post here.