How to triangulate Points from a Single Camera multiple Images? - c++

I have a single calibrated camera pointing at a checkerboard at different locations with Known
Camera Intrinsics. fx,fy,cx,cy
Distortion Co-efficients K1,K2,K3,T1,T2,etc..
Camera Rotation & Translation (R,T) from IMU
After Undistortion, I have computed Point correspondence of checkerboard points in all the images with a known camera-to-camera Rotation and Translation vectors.
How can I estimate the 3D points of the checkerboard in all the images?
I think OpenCV has a function to do this but I'm not able to understand how to use this!
1) cv::sfm::triangulatePoints
2) triangulatePoints
How to compute the 3D Points using OpenCV?

Since you already have the matched points from the images, you can use findFundamentalMat() to get the fundamental matrix. Keep in mind that you need at least 7 matched points to do this. If you have more than 8 points, CV_FM_RANSAC might be the best option.
Then use cv::sfm::projectionsFromFundamental() to find the projection matrix for each image, and check that the projection matrices are valid (e.g. check that the points are in front of the camera).
Then feed the projections and the points into cv::sfm::triangulatePoints().
Hope this helps :)
Edit
The rotation and translation matrices are needed to change reference frames, because the camera moves in SFM. The reference frame is at the position of the camera. Transforms are needed to make sure the positions of the points are coherent (expressed in the same reference frame, which is usually the reference frame of the camera in the first image), so that all the points are in the same coordinate system.
I.e. to relate the points gathered in the second frame to the first frame, the third to the second frame, and so on.
So basically you can use the R and T vectors to construct a transform matrix for each frame and multiply it with your points to put them in the reference frame of the camera in the first frame.
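The last step above can be sketched in plain C++ without any library. This is only an illustration under an assumed convention: R and t are taken to map a point from frame k into the first frame (X_1 = R * X_k + t). If your R and T go the other way, the transform has to be inverted first. The names makeTransform and applyTransform are made up for this example.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;
using Mat4 = std::array<std::array<double, 4>, 4>;

// Build the 4x4 homogeneous transform [R | t; 0 0 0 1].
// Assumption: R, t map a point from frame k into the first camera frame.
Mat4 makeTransform(const Mat3& R, const Vec3& t) {
    Mat4 T{};  // zero-initialised
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) T[r][c] = R[r][c];
        T[r][3] = t[r];
    }
    T[3][3] = 1.0;
    return T;
}

// Apply the transform to a 3D point (implicit homogeneous coordinate w = 1).
Vec3 applyTransform(const Mat4& T, const Vec3& p) {
    assert(T[3][3] == 1.0);  // expects a rigid transform built as above
    Vec3 out{};
    for (int r = 0; r < 3; ++r)
        out[r] = T[r][0] * p[0] + T[r][1] * p[1] + T[r][2] * p[2] + T[r][3];
    return out;
}
```

For example, with a 90 degree rotation about z and t = (1, 2, 3), the point (1, 0, 0) ends up at (1, 3, 3).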

Related

Aruco Pose Estimation from Stereo Setup

I am interested in finding the Rotation Matrix of an Aruco Marker from a Stereo Camera.
I know that estimatePoseSingleMarkers gives a Rotation Vector (which can be converted to a matrix via Rodrigues) and a Translation Vector, but the values are not that stable and it is supposedly written for a mono camera.
I can get stable 3D points of the Marker from a Stereo Camera, however I am struggling to create a Rotation Matrix. My main goal is to achieve what Ali has achieved in the following blog: Relative Position of Aruco Markers.
I have tried working with Euler Angles from here by creating a plane of the Aruco Marker from the 3D points that I get from the Stereo Camera, but in vain.
I know my algorithm is failing because the values of the relative coordinates keep changing when I move the camera, which should not happen, as the relative coordinates between the Markers should remain constant.
I have a properly calibrated camera with all the required matrices.
I tried using solvePnP, but I believe it gives rvecs and tvecs which, when combined, bring points from the model coordinate system to the camera coordinate system.
Any idea how I can create the Rotation Matrix of the Marker from my fairly stable 3D points, so that the relative coordinates don't change when the camera moves?
Thanks in Advance.

Measure real size of object with Calibrated Camera opencv c++?

I am completing my thesis, which is related to OpenCV.
I want to measure the real size of an object (in mm) with a single camera, but I have a problem converting between the camera's natural units (pixels) and real-world units.
After calibrating the camera, I have:
Camera matrix (3x3)
Distortion coefficients
Extrinsic parameters [rotation vector(1x3) + translation vector(1x3)]
I have read the following link but I can't find a formula to convert the units.
https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
Example about measure size of object
Any suggestions?
Thanks so much.
As mentioned in the comments, you need the distance to the object to obtain 3D coordinates from pixels. A possible workflow would be:
Rectify the image using the distortion parameters, i.e., correct the distortion caused by the camera.
Deproject the pixels into 3D points in the camera coordinate frame using the camera matrix. For this you can multiply the inverse of the 3x3 camera matrix with a vector containing the pixels [pixel_x, pixel_y, 1]^T. If you multiply the result [x', y', 1]^T with the depth, i.e., the z-component you obtain the 3D point in the camera coordinate frame.
Transform the point from the camera coordinate frame into the world coordinate frame using the extrinsics parameters.
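The deprojection step above can be sketched as follows. It assumes the pixel has already been undistorted, that the camera matrix has zero skew (so only fx, fy, cx, cy matter), and that the depth along the optical axis is known from some other source. The names MeasureIntrinsics, MeasurePoint, deprojectPixel and pointDistance are illustrative only, not OpenCV API.

```cpp
#include <cassert>
#include <cmath>

struct MeasureIntrinsics { double fx, fy, cx, cy; };  // from the 3x3 camera matrix
struct MeasurePoint { double x, y, z; };              // camera-frame coordinates

// Multiply K^-1 with [u, v, 1]^T and scale by the depth.
// Assumption: zero skew, pixel (u, v) already undistorted.
MeasurePoint deprojectPixel(const MeasureIntrinsics& K, double u, double v, double depth) {
    assert(depth > 0.0);  // the point must lie in front of the camera
    const double xn = (u - K.cx) / K.fx;  // normalised image coordinate x'
    const double yn = (v - K.cy) / K.fy;  // normalised image coordinate y'
    return {xn * depth, yn * depth, depth};
}

// Euclidean distance between two 3D points, in the same unit as the depth.
double pointDistance(const MeasurePoint& a, const MeasurePoint& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}
```

With fx = fy = 500 and the principal point at (320, 240), two pixels 100 px apart on the same image row, both at a depth of 1000 mm, come out 200 mm apart. The size of an object can be estimated this way from two of its edge pixels, provided their depth is known.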
Obtaining the depth values from an image alone is not possible. The only option is to use some additional information. Maybe your object is placed on a table and you know the distance between the camera and the table.
To measure distances between the camera and a table or even the object itself you could use Aruco markers, which are also available within OpenCV.

Film coordinate to world coordinate

I am working on building 3D point cloud from features matching using OpenCV3.1 and OpenGL.
I have implemented 1) Camera Calibration (Hence I am having Intrinsic Matrix of the camera) 2) Feature extraction( Hence I have 2D points in Pixel Coordinates).
I was going through a few websites, but generally they all suggest the flow for converting 3D object points to pixel points, whereas I am doing the complete backward projection. Here is the ppt that explains it well.
I have implemented film coordinates (u,v) from pixel coordinates (x,y) (with the help of the intrinsic matrix). Can anyone shed light on how I can recover "Z" of the camera coordinates (X,Y,Z) from the film coordinates (x,y)?
Please guide me on how I can utilize functions for the desired goal in OpenCV like solvePnP, recoverPose, findFundamentalMat, findEssentialMat.
With single camera and rotating object on fixed rotation platform I would implement something like this:
Each camera has resolution xs,ys and field of view FOV defined by two angles FOVx,FOVy so either check your camera data sheet or measure it. From that and perpendicular distance (z) you can convert any pixel position (x,y) to 3D coordinate relative to camera (x',y',z'). So first convert pixel position to angles:
ax = (x - (xs/2)) * FOVx / xs
ay = (y - (ys/2)) * FOVy / ys
and then compute cartesian position in 3D:
x' = distance * tan(ax)
y' = distance * tan(ay)
z' = distance
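The formulas above translate directly into code. This is a minimal sketch that keeps the same linear pixel-to-angle mapping as the formulas; the struct and function names are made up for illustration, and the FOV angles are assumed to be in radians.

```cpp
#include <cassert>
#include <cmath>

struct FovCamera { double xs, ys, fovX, fovY; };  // resolution in pixels, FOV in radians
struct FovPoint { double x, y, z; };              // position relative to the camera

// Convert pixel (px, py) at a known perpendicular distance into 3D.
FovPoint pixelToCamera(const FovCamera& cam, double px, double py, double distance) {
    assert(cam.xs > 0.0 && cam.ys > 0.0);
    const double ax = (px - cam.xs / 2.0) * cam.fovX / cam.xs;  // horizontal angle
    const double ay = (py - cam.ys / 2.0) * cam.fovY / cam.ys;  // vertical angle
    return {distance * std::tan(ax), distance * std::tan(ay), distance};
}
```

The image center maps to (0, 0, distance), and with FOVx = 90 degrees a pixel on the right image border maps to x' = distance, since tan(45 degrees) = 1.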
That is nice, but in a common image we do not know the distance. Luckily, in such a setup, if we turn our object then any convex edge will make a maximum ax angle on the sides when crossing the plane perpendicular to the camera. So check a few frames, and if a maximal ax is detected you can assume it is an edge (or convex bump) of the object positioned at distance.
If you also know the rotation angle ang of your platform (relative to your camera), then you can compute the un-rotated position by using the rotation formula around the y axis (Ay matrix in the link) and the known platform center position relative to the camera (just a subtraction before the un-rotation)... As I mentioned, all this is just simple geometry.
In a nutshell:
obtain calibration data
FOVx,FOVy,xs,ys,distance. Some camera datasheets give only FOVx, but if the pixels are square you can estimate FOVy from the resolution as
FOVx/FOVy = xs/ys
(this is a small-angle approximation; the exact relation is tan(FOVx/2)/tan(FOVy/2) = xs/ys)
Beware: with multi-resolution camera modes the FOV can be different for each resolution !!!
extract the silhouette of your object in the video for each frame
you can subtract the background image to ease the detection
obtain platform angle for each frame
so either use IRC data or place known markers on the rotation disc and detect/interpolate...
detect ax maximum
just inspect the x coordinate of the silhouette (for each y line of the image separately) and if a peak is detected, add its 3D position to your model. Let us assume a rotating rectangular box. Some of its frames could look like this:
So inspect one horizontal line on all frames and find the maximal ax. To improve accuracy you can use closed-loop regulation, turning the platform until the peak is found "exactly". Do this for all horizontal lines separately.
btw. if you detect no ax change over a few frames, that means a circular shape with the same radius ... so you can handle each such frame as an ax maximum.
Easy as pie resulting in 3D point cloud. Which you can sort by platform angle to ease up conversion to mesh ... That angle can be also used as texture coordinate ...
But do not forget that you will lose some concave details that are hidden in the silhouette !!!
If this approach is not enough you can use this same setup for stereoscopic 3D reconstruction. Because each rotation behaves as new (known) camera position.
You can't, if all you have is 2D images from that single camera location.
In theory you could use heuristics to infer a Z stacking. But mathematically your problem is underdetermined, and there are literally infinitely many different Z coordinates that would satisfy your constraints. You have to supply some extra information. For example, you could move your camera around over several frames (Google "structure from motion"), or you could use multiple cameras, or use a camera that has a depth sensor and gives you complete XYZ tuples (Kinect or similar).
Update due to comment:
For every pixel in a 2D image there is an infinite number of points that are projected onto it. The technical term for this set of points is a ray. If you have two 2D images of about the same volume of space, each image's set of rays (one for each pixel) intersects with the set of rays corresponding to the other image. Which is to say that if you determine the ray for a pixel in image #1, it maps to a line of pixels covered by that ray in image #2. Selecting a particular pixel along that line in image #2 will give you the XYZ tuple for that point.
Since you're rotating the object by a certain angle θ about a certain axis a between images, you actually have a lot of images to work with. All you have to do is derive the camera location by an additional transformation (inverse(translate(-a)·rotate(θ)·translate(a))).
Then do the following: Select an image to start with. For the particular pixel you're interested in, determine the ray it corresponds to. For that, simply assume two Z values for the pixel; 0 and 1 work just fine. Transform them back into the space of your object, then project them into the view space of the next camera you chose to use; the result will be two points in the image plane (possibly outside the limits of the actual image, but that's not a problem). These two points define a line within that second image. Find the pixel along that line that matches the pixel you selected in the first image and project it back into space as done with the first image. Due to numerical round-off errors you're not going to get a perfect intersection of the rays in 3D space, so find the point where the rays are closest to each other (this amounts to minimizing a quadratic, which reduces to a small linear system and is trivial).
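The "closest point between two rays" step can be sketched like this. It uses the standard closest-points-between-two-lines solution (a 2x2 linear system) and returns the midpoint of the shortest connecting segment; the names Ray3 and closestPointBetweenRays are illustrative. The rays are treated as infinite lines, and parallel rays are not handled.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Ray3Vec = std::array<double, 3>;
struct Ray3 { Ray3Vec origin, dir; };  // points on the ray: origin + s * dir

static double dot3(const Ray3Vec& a, const Ray3Vec& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Midpoint of the shortest segment connecting the two (infinite) lines.
Ray3Vec closestPointBetweenRays(const Ray3& r1, const Ray3& r2) {
    const Ray3Vec w{r1.origin[0] - r2.origin[0],
                    r1.origin[1] - r2.origin[1],
                    r1.origin[2] - r2.origin[2]};
    const double a = dot3(r1.dir, r1.dir);
    const double b = dot3(r1.dir, r2.dir);
    const double c = dot3(r2.dir, r2.dir);
    const double d = dot3(r1.dir, w);
    const double e = dot3(r2.dir, w);
    const double denom = a * c - b * b;  // zero when the rays are parallel
    assert(std::fabs(denom) > 1e-12);
    const double s = (b * e - c * d) / denom;  // parameter along ray 1
    const double t = (a * e - b * d) / denom;  // parameter along ray 2
    Ray3Vec mid{};
    for (int i = 0; i < 3; ++i) {
        const double p1 = r1.origin[i] + s * r1.dir[i];
        const double p2 = r2.origin[i] + t * r2.dir[i];
        mid[i] = 0.5 * (p1 + p2);
    }
    return mid;
}
```

For two rays that really intersect, the result is the intersection point; for skew rays it is the point halfway between them at their closest approach.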
To select which pixel you want to match between images you can use a feature or motion tracking algorithm, as used in video compression or similar. The basic idea is that for every pixel, its surroundings are correlated with the same region in the previous image. Where the correlation peaks is where the pixel most likely moved from.
With this pixel tracking in place you can then derive the structure of the object. This is essentially what structure from motion does.

OpenCV Structure from Motion Reprojection Issue

I am currently facing an issue with my Structure from Motion program based on OpenCv.
I'm going to try to describe what it does, and what it is supposed to do.
This program relies on the classic "structure from motion" method.
The basic idea is to take a pair of images, detect their keypoints and compute the descriptors of those keypoints. Then the keypoint matching is done, with a certain number of tests to ensure the result is good. That part works perfectly.
Once this is done, the following computations are performed : fundamental matrix, essential matrix, SVD decomposition of the essential matrix, camera matrix computation and finally, triangulation.
The result for a pair of images is a set of 3D coordinates, giving us points to be drawn in a 3D viewer. This works perfectly, for a pair.
Here is my problem: for a pair of images, the 3D point coordinates are calculated in the coordinate system of the first image of the pair, taken as the reference image. When working with more than two images, which is the objective of my program, I have to reproject the computed 3D points into the coordinate system of the very first image, in order to get a consistent result.
My question is: how do I reproject 3D point coordinates given in one camera-related system into another camera-related system? With the camera matrices?
My idea was to take the 3D point coordinates and multiply them by the inverse of each preceding camera matrix.
I clarify :
Suppose I am working on the third and fourth image (hence, the third pair of images, because I am working like 1-2 / 2-3 / 3-4 and so on).
I get my 3D point coordinates in the coordinate system of the third image; how do I reproject them properly into the coordinate system of the very first image?
I would have done the following :
Get the 3D points coordinates matrix, apply the inverse of the camera matrix for image 2 to 3, and then apply the inverse of the camera matrix for image 1 to 2.
Is that even correct ?
Because those camera matrices are non-square matrices, and I can't invert them.
I am surely mistaken somewhere, and I would be grateful if someone could enlighten me. I am pretty sure this is a relatively easy one, but I am obviously missing something...
Thanks a lot for reading :)
Let us say you have a 3 * 4 extrinsic parameter matrix called P. To match the notations of OpenCV documentation, this is [R|t].
This matrix P describes the projection from world space coordinates to the camera space coordinates. To quote the documentation:
[R|t] translates coordinates of a point (X, Y, Z) to a coordinate system, fixed with respect to the camera.
You are wondering why this matrix is non-square. That is because in the usual context of OpenCV, you are not expecting homogeneous coordinates as output. Therefore, to make it square, just add a fourth row containing (0,0,0,1). Let's call this new square matrix Q.
You have one such matrix for each pair of cameras, that is, you have one Qk matrix for each pair of images {k,k+1} that describes the transformation from the coordinate space of camera k to that of camera k+1. Those matrices are invertible because they describe isometries in homogeneous coordinates.
To go from the coordinate space of camera 3 to that of camera 1, just apply to your points the inverse of Q2 and then the inverse of Q1.
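As a rough sketch of that chain, assuming each Q is a rigid transform [R|t] with last row (0,0,0,1), the inverse can be written in closed form as [R^T | -R^T t] instead of using a general matrix inversion. The function names below are invented for this example.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat4 = std::array<std::array<double, 4>, 4>;

// Inverse of a rigid transform Q = [R | t; 0 0 0 1], i.e. [R^T | -R^T t].
Mat4 invertRigid(const Mat4& Q) {
    assert(Q[3][0] == 0.0 && Q[3][1] == 0.0 && Q[3][2] == 0.0 && Q[3][3] == 1.0);
    Mat4 inv{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) inv[r][c] = Q[c][r];  // transpose of R
    for (int r = 0; r < 3; ++r)
        inv[r][3] = -(inv[r][0] * Q[0][3] + inv[r][1] * Q[1][3] + inv[r][2] * Q[2][3]);
    inv[3][3] = 1.0;
    return inv;
}

// Multiply a 3D point (with implicit w = 1) by a 4x4 transform.
Vec3 transformPoint(const Mat4& Q, const Vec3& p) {
    Vec3 out{};
    for (int r = 0; r < 3; ++r)
        out[r] = Q[r][0] * p[0] + Q[r][1] * p[1] + Q[r][2] * p[2] + Q[r][3];
    return out;
}

// Q1 maps camera 1 -> camera 2, Q2 maps camera 2 -> camera 3.
// Going back: apply the inverse of Q2 first, then the inverse of Q1.
Vec3 camera3ToCamera1(const Mat4& Q1, const Mat4& Q2, const Vec3& p3) {
    return transformPoint(invertRigid(Q1), transformPoint(invertRigid(Q2), p3));
}
```

For longer sequences the same pattern extends: a point in the frame of camera k is brought back by applying the inverses of Q(k-1), ..., Q1 in that order.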

Head Pose Estimation with 2 cameras

I am working on a head pose estimation project using 2 cameras. For one camera the system works and returns the rotation matrix and translation vector of a head with respect to each camera coordinate system. I have a rendered object in an OpenGL scene which is rotated and translated to represent head movements. To display the computed rotation matrix and translation vector I simply use the following OpenGL commands.
glMatrixMode(GL_MODELVIEW);
glLoadMatrixd(pose_matrix);
where pose matrix is OpenGL ModelView matrix constructed from rotation matrix and translation vector of a head.
Now I am trying to do this for 2 calibrated cameras. When the first camera loses track of the face but the second one estimates the head pose, I display the rotation and translation with respect to the second camera, and vice versa. I want to display one OpenGL object and move it in both cases. For that I need to transfer the pose matrices into a common coordinate frame.
I know the relative geometry of the 2 cameras with respect to each other. I assume one of the cameras is the world coordinate frame, and I transfer the head pose matrix of the second camera to the frame of the first camera by multiplying the pose matrix by the calibration matrix of the second camera with respect to the first camera. When I load this multiplied matrix into the OpenGL ModelView matrix I get wrong results. When the first camera captures the face the object moves correctly, but for the second camera the object is translated and rotated and is not in the same place as in the case of the first camera.
What could be the problem? Maybe the OpenGL display part is wrong?
Safe default assumption: OpenGL is right, your code is wrong.
Without seeing your code, I can suggest to printf your matrices and double-check the math with them in Matlab or Octave.
A common mistake is to forget that by default OpenGL PRE-multiplies ROW-vectors by the modelview matrix (indeed all matrices). That is, it multiplies as v_row * M, with the matrix stored in col-major order, whereas you may be thinking within the common mathematical convention of treating vectors as COLUMN ones, and POST-multiplying them as M * v_col (with M stored row-major).
If you prefer working with the latter convention (recommended), look up the GL_ARB_transpose_matrix extension.
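To make the storage-order point concrete, here is a small sketch that packs a row-major rotation matrix R and a translation t into the 16-element array layout that glLoadMatrixd expects. The function name toOpenGLModelView is invented for this example. It only deals with memory layout and does not address any axis-convention differences between a pose estimator and OpenGL.

```cpp
#include <array>
#include <cassert>

// Pack R (indexed R[row][col]) and t into an OpenGL-style 16-element array,
// where element (row, col) of the 4x4 matrix is stored at index col * 4 + row.
std::array<double, 16> toOpenGLModelView(
        const std::array<std::array<double, 3>, 3>& R,
        const std::array<double, 3>& t) {
    std::array<double, 16> m{};  // zero-initialised
    for (int col = 0; col < 3; ++col)
        for (int row = 0; row < 3; ++row)
            m[col * 4 + row] = R[row][col];
    m[12] = t[0];  // the translation occupies the last four-element group
    m[13] = t[1];
    m[14] = t[2];
    m[15] = 1.0;
    // Sanity check: the bottom row of the 4x4 matrix must be (0, 0, 0, 1).
    assert(m[3] == 0.0 && m[7] == 0.0 && m[11] == 0.0);
    return m;
}
```

The resulting array's data() pointer could then be passed to glLoadMatrixd. Filling the array in row-major order instead loads the transpose, which is a typical cause of an object that rotates and translates incorrectly.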