Locating objects in FOV of an image using GPS only - computer-vision

Is it possible to locate objects in an image in the global reference frame (e.g. ECEF) using only a GPS (offset from the camera) ?
The problem I see is that in order to take into account the offset distance between GPS antenna phase center, an IMU integrated with the GPS is necessary to compute the platform (Camera + GPS) rotation with respect to the NED/ENU frame. This will allow the use of collinearity equation to compute the location of the object in FOV in ECEF.
The question I have, because the platform does not have an IMU, only the GPS, I wanted to see if it's possible to compute object location in ECEF only from the GPS of the platform. It is possible if one assumes the camera projection center is collocated with an antenna, but that is not the case, the offset distance does affect the estimation of the object location in global reference frame
Any ideas ?!

Related

Calculating the rotation and translation (external parameters) of multiple cameras relative to a single camera?

Given a set of 5 cameras positioned as shown in the image below which capture the top, front, rear, left and right views of an object placed in the center.
Also given that the origin of the world coordinate is assumed to be the top view (therefore used as the reference view), how do I go about calculating the rotation and translation (external parameters of the cameras) of all other 4 cameras relative to this top camera. The front, rear, left and right cameras have also been slanted 45 degrees (about the x axis) to capture the object in the middle.
The calculation of the external parameters will later be used to calculate the projection matrix for each camera (the internal parameters are known)
Calibrate the extrinsic parameters with respect to an object of known shape and size which is visible to all cameras, or at least to all pairs of (reference camera, current camera).
For best results use a 3D object, not a plane. For example, a box with three unequal sides, or a dodecahedron. The latter would allow you to calibrate all cameras simultaneously, since each of them should see three faces at least. Depending on your accuracy requirements, you may need to spend some real money on getting this object machined accurately.
As for software, you can of course whip up a script to do it using OpenCV, or just use a CG tool like Blender, where visualization of the results may be much easier.

Best-fit relative orientations and locations of stereo cameras

I have 2 static cameras being used for stereo 3D positioning of objects. I need to determine the location and orientation of the second camera relative to the first as accurately as possible. I am trying to do this by locating n objects on the both cameras' images and correlating between the two cameras in order to calibrate my system to locate additional objects later.
Is there a preferred way to use a large number (6+) of correlated points to determine the best-fit relative locations/orientations of 2 cameras, assuming that I have already compensated for any distortive effects and know the correct (but somewhat noisy) angles between the optical axes and the objects, and the distance between the cameras?
My solution is to determine a rotation to perform on the second camera (B) in order to realign its measurements so they are from the point of view of the first camera (A) as if it has been translated to the location of camera B.
I did this using a compound rotation by first rotating the second camera's measurements about the cross product of vector -AB (B pointing at A from the perspective of A) and BA (B pointing at A from the perspective from B) such that R1*BA=-AB. Doing this rotation just means the vectors pointing between the cameras are aligned, and another rotation must be done in order to account for further degrees of freedom.
That rotation was done so that the second one can be about -AB. R2 is a rotation of theta radians about -AB. I found theta by taking the cross products of my measurements from camera A and vector AB, and comparing them to the cross products of R1*(the measurements from camera B) and -AB. I numerically minimized the RMS of the angles between the cross product pairs, because when the cameras are aligned those cross product vectors should be all pointing in the same directions because they are normal to coplanar planes.
After that I can use https://math.stackexchange.com/questions/61719/finding-the-intersection-point-of-many-lines-in-3d-point-closest-to-all-lines to find accurate 3D locations of intersection points by applying R1*R2 to any future measurements from camera B.

Determining homography from known planes?

I've got a question related to multiple view geometry.
I'm currently dealing with a problem where I have a number of images collected by a drone flying around an object of interest. This object is planar, and I am hoping to eventually stitch the images together.
Letting aside the classical way of identifying corresponding feature pairs, computing a homography and warping/blending, I want to see what information related to this task I can infer from prior known data.
Specifically, for each acquired image I know the following two things: I know the correspondence between the central point of my image and a point on the object of interest (on whose plane I would eventually want to warp my image). I also have a normal vector to the plane of each image.
So, knowing the centre point (in object-centric world coordinates) and the normal, I can derive the plane equation of each image.
My question is, knowing the plane equation of 2 images is it possible to compute a homography (or part of the transformation matrix, such as the rotation) between the 2?
I get the feeling that this may seem like a very straightforward/obvious answer to someone with deep knowledge of visual geometry but since it's not my strongest point I'd like to double check...
Thanks in advance!
Your "normal" is the direction of the focal axis of the camera.
So, IIUC, you have a 3D point that projects on the image center in both images, which is another way of saying that (absent other information) the motion of the camera consists of the focal axis orbiting about a point on the ground plane, plus an arbitrary rotation about the focal axis, plus an arbitrary translation along the focal axis.
The motion has a non-zero baseline, therefore the transformation between images is generally not a homography. However, the portion of the image occupied by the ground plane does, of course, transform as a homography.
Such a motion is defined by 5 parameters, e.g. the 3 components of the rotation vector for the orbit, plus the the angle of rotation about the focal axis, plus the displacement along the focal axis. However the one point correspondence you have gives you only two equations.
It follows that you don't have enough information to constrain the homography between the images of the ground plane.

how do I re-project points in a camera - projector system (after calibration)

i have seen many blog entries and videos and source coude on the internet about how to carry out camera + projector calibration using openCV, in order to produce the camera.yml, projector.yml and projectorExtrinsics.yml files.
I have yet to see anyone discussing what to do with this files afterwards. Indeed I have done a calibration myself, but I don't know what is the next step in my own application.
Say I write an application that now uses the camera - projector system that I calibrated to track objects and project something on them. I will use contourFind() to grab some points of interest from the moving objects and now I want to project these points (from the projector!) onto the objects!
what I want to do is (for example) track the centre of mass (COM) of an object and show a point on the camera view of the tracked object (at its COM). Then a point should be projected on the COM of the object in real time.
It seems that projectPoints() is the openCV function I should use after loading the yml files, but I am not sure how I will account for all the intrinsic & extrinsic calibration values of both camera and projector. Namely, projectPoints() requires as parameters the
vector of points to re-project (duh!)
rotation + translation matrices. I think I can use the projectorExtrinsics here. or I can use the composeRT() function to generate a final rotation & a final translation matrix from the projectorExtrinsics (which I have in the yml file) and the cameraExtrinsics (which I don't have. side question: should I not save them too in a file??).
intrinsics matrix. this tricky now. should I use the camera or the projector intrinsics matrix here?
distortion coefficients. again should I use the projector or the camera coefs here?
other params...
So If I use either projector or camera (which one??) intrinsics + coeffs in projectPoints(), then I will only be 'correcting' for one of the 2 instruments . Where / how will I use the other's instruments intrinsics ??
What else do I need to use apart from load() the yml files and projectPoints() ? (perhaps undistortion?)
ANY help on the matter is greatly appreciated .
If there is a tutorial or a book (no, O'Reilly "Learning openCV" does not talk about how to use the calibration yml files either! - only about how to do the actual calibration), please point me in that direction. I don't necessarily need an exact answer!
First, you seem to be confused about the general role of a camera/projector model: its role is to map 3D world points to 2D image points. This sounds obvious, but this means that given extrinsics R,t (for orientation and position), distortion function D(.) and intrisics K, you can infer for this particular camera the 2D projection m of a 3D point M as follows: m = K.D(R.M+t). The projectPoints function does exactly that (i.e. 3D to 2D projection), for each input 3D point, hence you need to give it the input parameters associated to the camera in which you want your 3D points projected (projector K&D if you want projector 2D coordinates, camera K&D if you want camera 2D coordinates).
Second, when you jointly calibrate your camera and projector, you do not estimate a set of extrinsics R,t for the camera and another for the projector, but only one R and one t, which represent the rotation and translation between the camera's and projector's coordinate systems. For instance, this means that your camera is assumed to have rotation = identity and translation = zero, and the projector has rotation = R and translation = t (or the other way around, depending on how you did the calibration).
Now, concerning the application you mentioned, the real problem is: how do you estimate the 3D coordinates of a given point ?
Using two cameras and one projector, this would be easy: you could track the objects of interest in the two camera images, triangulate their 3D positions using the two 2D projections using function triangulatePoints and finally project this 3D point in the projector 2D coordinates using projectPoints in order to know where to display things with your projector.
With only one camera and one projector, this is still possible but more difficult because you cannot triangulate the tracked points from only one observation. The basic idea is to approach the problem like a sparse stereo disparity estimation problem. A possible method is as follows:
project a non-ambiguous image (e.g. black and white noise) using the projector, in order to texture the scene observed by the camera.
as before, track the objects of interest in the camera image
for each object of interest, correlate a small window around its location in the camera image with the projector image, in order to find where it projects in the projector 2D coordinates
Another approach, which unlike the one above would use the calibration parameters, could be to do a dense 3D reconstruction using stereoRectify and StereoBM::operator() (or gpu::StereoBM_GPU::operator() for the GPU implementation), map the tracked 2D positions to 3D using the estimated scene depth, and finally project into the projector using projectPoints.
Anyhow, this is easier, and more accurate, using two cameras.
Hope this helps.

How to verify that the camera calibration is correct? (or how to estimate the error of reprojection)

The quality of calibration is measured by the reprojection error (is there an alternative?), which requires a knowledge world coordinates of some 3d point(s).
Is there a simple way to produce such known points? Is there a way to verify the calibration in some other way (for example, Zhang's calibration method only requires that the calibration object be planar and the geometry of the system need not to be known)
You can verify the accuracy of the estimated nonlinear lens distortion parameters independently of pose. Capture images of straight edges (e.g. a plumb line, or a laser stripe on a flat surface) spanning the field of view (an easy way to span the FOV is to rotate the camera keeping the plumb line fixed, then add all the images). Pick points on said line images, undistort their coordinates, fit mathematical lines, compute error.
For the linear part, you can also capture images of multiple planar rigs at a known relative pose, either moving one planar target with a repeatable/accurate rig (e.g. a turntable), or mounting multiple planar targets at known angles from each other (e.g. three planes at 90 deg from each other).
As always, a compromise is in order between accuracy requirements and budget. With enough money and a friendly machine shop nearby you can let your fantasy run wild with rig geometry. I had once a dodecahedron about the size of a grapefruit, machined out of white plastic to 1/20 mm spec. Used it to calibrate the pose of a camera on the end effector of a robotic arm, moving it on a sphere around a fixed point. The dodecahedron has really nice properties in regard to occlusion angles. Needless to say, it's all patented.
The images used in generating the intrinsic calibration can also be used to verify it. A good example of this is the camera-calib tool from the Mobile Robot Programming Toolkit (MRPT).
Per Zhang's method, the MRPT calibration proceeds as follows:
Process the input images:
1a. Locate the calibration target (extract the chessboard corners)
1b. Estimate the camera's pose relative to the target, assuming that the target is a planar chessboard with a known number of intersections.
1c. Assign points on the image to a model of the calibration target in relative 3D coordinates.
Find an intrinsic calibration that best explains all of the models generated in 1b/c.
Once the intrinsic calibration is generated, we can go back to the source images.
For each image, multiply the estimated camera pose with the intrinsic calibration, then apply that to each of the points derived in 1c.
This will map the relative 3D points from the target model back to the 2D calibration source image. The difference between the original image feature (chessboard corner) and the reprojected point is the calibration error.
MRPT performs this test on all input images and will give you an aggregate reprojection error.
If you want to verify a full system, including both the camera intrinsics and the camera-to-world transform, you will probably need to build a jig that places the camera and target in a known configuration, then test calculated 3D points against real-world measurements.
On Engine's question: the pose matrix is a [R|t] matrix where R is a pure 3D rotation and t a translation vector. If you have computed a homography from the image, section 3.1 of Zhang's Microsoft Technical Report (http://research.microsoft.com/en-us/um/people/zhang/Papers/TR98-71.pdf) gives a closed form method to obtain both R and t using the known homography and the intrinsic camera matrix K. ( I can't comment, so I added as a new answer)
Should be just variance and bias in calibration (pixel re-projection) errors given enough variability in calibration rig poses. It is better to visualize these errors rather than to look at the values. For example, error vectors pointing to the center would be indicative of wrong focal length. Observing curved lines can give intuition about distortion coefficients.
To calibrate the camera one has to jointly solve for extrinsic and intrinsic. The latter can be known from manufacturer, the solving for extrinsic (rotation and translation) involves decomposition of calculated homography: Decompose Homography matrix in opencv python
Calculate a Homography with only Translation, Rotation and Scale in Opencv
The homography is used here since most calibration targets are flat.