Best-fit relative orientations and locations of stereo cameras - computer-vision

I have 2 static cameras being used for stereo 3D positioning of objects. I need to determine the location and orientation of the second camera relative to the first as accurately as possible. I am trying to do this by locating n objects on the both cameras' images and correlating between the two cameras in order to calibrate my system to locate additional objects later.
Is there a preferred way to use a large number (6+) of correlated points to determine the best-fit relative locations/orientations of 2 cameras, assuming that I have already compensated for any distortive effects and know the correct (but somewhat noisy) angles between the optical axes and the objects, and the distance between the cameras?

My solution is to determine a rotation to perform on the second camera (B) in order to realign its measurements so they are from the point of view of the first camera (A) as if it has been translated to the location of camera B.
I did this using a compound rotation by first rotating the second camera's measurements about the cross product of vector -AB (B pointing at A from the perspective of A) and BA (B pointing at A from the perspective from B) such that R1*BA=-AB. Doing this rotation just means the vectors pointing between the cameras are aligned, and another rotation must be done in order to account for further degrees of freedom.
That rotation was done so that the second one can be about -AB. R2 is a rotation of theta radians about -AB. I found theta by taking the cross products of my measurements from camera A and vector AB, and comparing them to the cross products of R1*(the measurements from camera B) and -AB. I numerically minimized the RMS of the angles between the cross product pairs, because when the cameras are aligned those cross product vectors should be all pointing in the same directions because they are normal to coplanar planes.
After that I can use https://math.stackexchange.com/questions/61719/finding-the-intersection-point-of-many-lines-in-3d-point-closest-to-all-lines to find accurate 3D locations of intersection points by applying R1*R2 to any future measurements from camera B.

Related

Calculating the rotation and translation (external parameters) of multiple cameras relative to a single camera?

Given a set of 5 cameras positioned as shown in the image below which capture the top, front, rear, left and right views of an object placed in the center.
Also given that the origin of the world coordinate is assumed to be the top view (therefore used as the reference view), how do I go about calculating the rotation and translation (external parameters of the cameras) of all other 4 cameras relative to this top camera. The front, rear, left and right cameras have also been slanted 45 degrees (about the x axis) to capture the object in the middle.
The calculation of the external parameters will later be used to calculate the projection matrix for each camera (the internal parameters are known)
Calibrate the extrinsic parameters with respect to an object of known shape and size which is visible to all cameras, or at least to all pairs of (reference camera, current camera).
For best results use a 3D object, not a plane. For example, a box with three unequal sides, or a dodecahedron. The latter would allow you to calibrate all cameras simultaneously, since each of them should see three faces at least. Depending on your accuracy requirements, you may need to spend some real money on getting this object machined accurately.
As for software, you can of course whip up a script to do it using OpenCV, or just use a CG tool like Blender, where visualization of the results may be much easier.

Determining homography from known planes?

I've got a question related to multiple view geometry.
I'm currently dealing with a problem where I have a number of images collected by a drone flying around an object of interest. This object is planar, and I am hoping to eventually stitch the images together.
Letting aside the classical way of identifying corresponding feature pairs, computing a homography and warping/blending, I want to see what information related to this task I can infer from prior known data.
Specifically, for each acquired image I know the following two things: I know the correspondence between the central point of my image and a point on the object of interest (on whose plane I would eventually want to warp my image). I also have a normal vector to the plane of each image.
So, knowing the centre point (in object-centric world coordinates) and the normal, I can derive the plane equation of each image.
My question is, knowing the plane equation of 2 images is it possible to compute a homography (or part of the transformation matrix, such as the rotation) between the 2?
I get the feeling that this may seem like a very straightforward/obvious answer to someone with deep knowledge of visual geometry but since it's not my strongest point I'd like to double check...
Thanks in advance!
Your "normal" is the direction of the focal axis of the camera.
So, IIUC, you have a 3D point that projects on the image center in both images, which is another way of saying that (absent other information) the motion of the camera consists of the focal axis orbiting about a point on the ground plane, plus an arbitrary rotation about the focal axis, plus an arbitrary translation along the focal axis.
The motion has a non-zero baseline, therefore the transformation between images is generally not a homography. However, the portion of the image occupied by the ground plane does, of course, transform as a homography.
Such a motion is defined by 5 parameters, e.g. the 3 components of the rotation vector for the orbit, plus the the angle of rotation about the focal axis, plus the displacement along the focal axis. However the one point correspondence you have gives you only two equations.
It follows that you don't have enough information to constrain the homography between the images of the ground plane.

Calculate baseline distance between 2 cameras (images)

I want to estimate the depth map between left and right images from "http://perso.lcpc.fr/tarel.jean-philippe/syntim/paires/GrRub.html". I understand that I must first calculate depth from disparity using formula Z = B * F/d
The data set unfortunately does not provide Baseline distance B.
Could you suggest how I can calculate this(if possible) or how I could calculate depth map from given data alone?
Thank you for your help.
As I am new to stackoverflow and computer vision, do let me know if I should provide more details.
If you have the extrinsic parameters, rotational matrices R and translation vector t, there are two cases
a) (most probable) one of your camera (the main camera) is the centre of your coordinate system: the R1 matrix is the identity matrix and the related t1 is equal to [0,0,0]. In this case you could think the baseline B as the euclidean norm of the translation vector t2 of the other camera
b) in case none of your camera is the centre of your coordinate system, at least you should have calibrated your cameras with respect to the same reference system. The baseline B is the euclidean norm of the difference vector (t1 - t2)
(I was not able to open the left/right camera links, so I could not verify)

How to verify that the camera calibration is correct? (or how to estimate the error of reprojection)

The quality of calibration is measured by the reprojection error (is there an alternative?), which requires a knowledge world coordinates of some 3d point(s).
Is there a simple way to produce such known points? Is there a way to verify the calibration in some other way (for example, Zhang's calibration method only requires that the calibration object be planar and the geometry of the system need not to be known)
You can verify the accuracy of the estimated nonlinear lens distortion parameters independently of pose. Capture images of straight edges (e.g. a plumb line, or a laser stripe on a flat surface) spanning the field of view (an easy way to span the FOV is to rotate the camera keeping the plumb line fixed, then add all the images). Pick points on said line images, undistort their coordinates, fit mathematical lines, compute error.
For the linear part, you can also capture images of multiple planar rigs at a known relative pose, either moving one planar target with a repeatable/accurate rig (e.g. a turntable), or mounting multiple planar targets at known angles from each other (e.g. three planes at 90 deg from each other).
As always, a compromise is in order between accuracy requirements and budget. With enough money and a friendly machine shop nearby you can let your fantasy run wild with rig geometry. I had once a dodecahedron about the size of a grapefruit, machined out of white plastic to 1/20 mm spec. Used it to calibrate the pose of a camera on the end effector of a robotic arm, moving it on a sphere around a fixed point. The dodecahedron has really nice properties in regard to occlusion angles. Needless to say, it's all patented.
The images used in generating the intrinsic calibration can also be used to verify it. A good example of this is the camera-calib tool from the Mobile Robot Programming Toolkit (MRPT).
Per Zhang's method, the MRPT calibration proceeds as follows:
Process the input images:
1a. Locate the calibration target (extract the chessboard corners)
1b. Estimate the camera's pose relative to the target, assuming that the target is a planar chessboard with a known number of intersections.
1c. Assign points on the image to a model of the calibration target in relative 3D coordinates.
Find an intrinsic calibration that best explains all of the models generated in 1b/c.
Once the intrinsic calibration is generated, we can go back to the source images.
For each image, multiply the estimated camera pose with the intrinsic calibration, then apply that to each of the points derived in 1c.
This will map the relative 3D points from the target model back to the 2D calibration source image. The difference between the original image feature (chessboard corner) and the reprojected point is the calibration error.
MRPT performs this test on all input images and will give you an aggregate reprojection error.
If you want to verify a full system, including both the camera intrinsics and the camera-to-world transform, you will probably need to build a jig that places the camera and target in a known configuration, then test calculated 3D points against real-world measurements.
On Engine's question: the pose matrix is a [R|t] matrix where R is a pure 3D rotation and t a translation vector. If you have computed a homography from the image, section 3.1 of Zhang's Microsoft Technical Report (http://research.microsoft.com/en-us/um/people/zhang/Papers/TR98-71.pdf) gives a closed form method to obtain both R and t using the known homography and the intrinsic camera matrix K. ( I can't comment, so I added as a new answer)
Should be just variance and bias in calibration (pixel re-projection) errors given enough variability in calibration rig poses. It is better to visualize these errors rather than to look at the values. For example, error vectors pointing to the center would be indicative of wrong focal length. Observing curved lines can give intuition about distortion coefficients.
To calibrate the camera one has to jointly solve for extrinsic and intrinsic. The latter can be known from manufacturer, the solving for extrinsic (rotation and translation) involves decomposition of calculated homography: Decompose Homography matrix in opencv python
Calculate a Homography with only Translation, Rotation and Scale in Opencv
The homography is used here since most calibration targets are flat.

How can I determine distance from an object in a video?

I have a video file recorded from the front of a moving vehicle. I am going to use OpenCV for object detection and recognition but I'm stuck on one aspect. How can I determine the distance from a recognized object.
I can know my current speed and real-world GPS position but that is all. I can't make any assumptions about the object I'm tracking. I am planning to use this to track and follow objects without colliding with them. Ideally I would like to use this data to derive the object's real-world position, which I could do if I could determine the distance from the camera to the object.
Your problem's quite standard in the field.
Firstly,
you need to calibrate your camera. This can be done offline (makes life much simpler) or online through self-calibration.
Calibrate it offline - please.
Secondly,
Once you have the calibration matrix of the camera K, determine the projection matrix of the camera in a successive scene (you need to use parallax as mentioned by others). This is described well in this OpenCV tutorial.
You'll have to use the GPS information to find the relative orientation between the cameras in the successive scenes (that might be problematic due to noise inherent in most GPS units), i.e. the R and t mentioned in the tutorial or the rotation and translation between the two cameras.
Once you've resolved all that, you'll have two projection matrices --- representations of the cameras at those successive scenes. Using one of these so-called camera matrices, you can "project" a 3D point M on the scene to the 2D image of the camera on to pixel coordinate m (as in the tutorial).
We will use this to triangulate the real 3D point from 2D points found in your video.
Thirdly,
use an interest point detector to track the same point in your video which lies on the object of interest. There are several detectors available, I recommend SURF since you have OpenCV which also has several other detectors like Shi-Tomasi corners, Harris, etc.
Fourthly,
Once you've tracked points of your object across the sequence and obtained the corresponding 2D pixel coordinates you must triangulate for the best fitting 3D point given your projection matrix and 2D points.
The above image nicely captures the uncertainty and how a best fitting 3D point is computed. Of course in your case, the cameras are probably in front of each other!
Finally,
Once you've obtained the 3D points on the object, you can easily compute the Euclidean distance between the camera center (which is the origin in most cases) and the point.
Note
This is obviously not easy stuff but it's not that hard either. I recommend Hartley and Zisserman's excellent book Multiple View Geometry which has described everything above in explicit detail with MATLAB code to boot.
Have fun and keep asking questions!
When you have moving video, you can use temporal parallax to determine the relative distance of objects. Parallax: (definition).
The effect would be the same we get with our eyes which which can gain depth perception by looking at the same object from slightly different angles. Since you are moving, you can use two successive video frames to get your slightly different angle.
Using parallax calculations, you can determine the relative size and distance of objects (relative to one another). But, if you want the absolute size and distance, you will need a known point of reference.
You will also need to know the speed and direction being traveled (as well as the video frame rate) in order to do the calculations. You might be able to derive the speed of the vehicle using the visual data but that adds another dimension of complexity.
The technology already exists. Satellites determine topographic prominence (height) by comparing multiple images taken over a short period of time. We use parallax to determine the distance of stars by taking photos of night sky at different points in earth's orbit around the sun. I was able to create 3-D images out of an airplane window by taking two photographs within short succession.
The exact technology and calculations (even if I knew them off the top of my head) are way outside the scope of discussing here. If I can find a decent reference, I will post it here.
You need to identify the same points in the same object on two different frames taken a known distance apart. Since you know the location of the camera in each frame, you have a baseline ( the vector between the two camera positions. Construct a triangle from the known baseline and the angles to the identified points. Trigonometry gives you the length of the unknown sides of the traingles for the known length of the baseline and the known angles between the baseline and the unknown sides.
You can use two cameras, or one camera taking successive shots. So, if your vehicle is moving a 1 m/s and you take fames every second, then successibe frames will gibe you a 1m baseline which should be good to measure the distance of objects up to, say, 5m away. If you need to range objects further away than the frames used need to be further apart - however more distant objects will in view for longer.
Observer at F1 sees target at T with angle a1 to velocity vector. Observer moves distance b to F2. Sees target at T with angle a2.
Required to find r1, range from target at F1
The trigonometric identity for cosine gives
Cos( 90 – a1 ) = x / r1 = c1
Cos( 90 - a2 ) = x / r2 = c2
Cos( a1 ) = (b + z) / r1 = c3
Cos( a2 ) = z / r2 = c4
x is distance to target orthogonal to observer’s velocity vector
z is distance from F2 to intersection with x
Solving for r1
r1 = b / ( c3 – c1 . c4 / c2 )
Two cameras so you can detect parallax. It's what humans do.
edit
Please see ravenspoint's answer for more detail. Also, keep in mind that a single camera with a splitter would probably suffice.
use stereo disparity maps. lots of implementations are afloat, here are some links:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT11/node4.html
http://www.ece.ucsb.edu/~manj/ece181bS04/L14(morestereo).pdf
In you case you don't have stereo camera, but depth can be evaluated using video
http://www.springerlink.com/content/g0n11713444148l2/
I think the above will be what might help you the most.
research has progressed so far that depth can be evaluated ( though not to a satisfactory extend) from a single monocular image
http://www.cs.cornell.edu/~asaxena/learningdepth/
Someone please correct me if I'm wrong, but it seems to me that if you're going to simply use a single camera and simply relying on a software solution, any processing you might do would be prone to false positives. I highly doubt that there is any processing that could tell the difference between objects that really are at the perceived distance and those which only appear to be at that distance (like the "forced perspective") in movies.
Any chance you could add an ultrasonic sensor?
first, you should calibrate your camera so you can get the relation between the objects positions in the camera plan and their positions in the real world plan, if you are using a single camera you can use the "optical flow technic"
if you are using two cameras you can use the triangulation method to find the real position (it will be easy to find the distance of the objects) but the probem with the second method is the matching, which means how can you find the position of an object 'x' in camera 2 if you already know its position in camera 1, and here you can use the 'SIFT' algorithme.
i just gave you some keywords wish it could help you.
Put and object of known size in the cameras field of view. That way you can have a more objective metric to measure angular distances. Without a second viewpoint/camera you'll be limited to estimating size/distance but at least it won't be a complete guess.