Homography for Image Rectification - computer-vision

We can do image rectification given stereo images. For a single image we can find the two vanishing points and then the vanishing line. Using this vanishing line we can do projective rectification. But what constraints are required for affine rectification??
I basically want to rectify an image to its frontal view such that parallel line in the world are parallel and also parallel to the x-axis. I hope am making myself clear ..
Also what is the homography to transfer a vanishing point (x,y) in the image to (1,0,0) ???
Thanks in advance

A projective transformation transfers an imaged point to an ideal point (1,0,0). It thereby rectifies an image up to an affine transformation of what people would normally want to see. This process is "affine rectification". Affine transformations keep lines parallel, but can skew the image and make it look weird, so I'm not sure if that's where you want to stop. Affine transformations can turn circles into ellipses and can change the angle of intersecting lines.
To remove the affine transformation (metric rectification), one must identify the "conic dual to the circular points". This can be done by identifying two pairs of perpendicular lines; identifying an ellipse that should really be a circle; or by using two known length ratios. Once this is done, the only differences between your image and how you would like it to look will be its scale, its x and y positions and the rotation of the whole image.

Related

Compute homography given rotation and translation between two cameras

I know that one can compute the homography matrix by using at least four correpondence points.
I was wondering if and how can I obtain a homography matrix if I already know the rotation and translation between two cameras, including the camera intrinsics?
I found something that looked like this
H= KRK^-1
but this assumes a pure rotation. What would be the case for a pure translation?
And what if I want to warp an image with the homography matrix that is not from points purely on a plane?
I'm somewhat confused right now and would really really appreciate any explanations!
Thank you in advance!
If there is a nonzero translation, and the image contains more than a plane (or you are not looking at very far away things) then the images are not related by a homography. You can convince yourself that this is the case by noticing that some points visible in one image may be occluded in the other one.
If the image shows a plane among other things, and you estimate a homography using only point correspondences on that plane, than the homography will correctly transform all points on that plane, but will map incorrectly all other points.

Film coordinate to world coordinate

I am working on building 3D point cloud from features matching using OpenCV3.1 and OpenGL.
I have implemented 1) Camera Calibration (Hence I am having Intrinsic Matrix of the camera) 2) Feature extraction( Hence I have 2D points in Pixel Coordinates).
I was going through few websites but generally all have suggested the flow for converting 3D object points to pixel points but I am doing completely backword projection. Here is the ppt that explains it well.
I have implemented film coordinates(u,v) from pixel coordinates(x,y)(With the help of intrisic matrix). Can anyone shed the light on how I can render "Z" of camera coordinate(X,Y,Z) from the film coordinate(x,y).
Please guide me on how I can utilize functions for the desired goal in OpenCV like solvePnP, recoverPose, findFundamentalMat, findEssentialMat.
With single camera and rotating object on fixed rotation platform I would implement something like this:
Each camera has resolution xs,ys and field of view FOV defined by two angles FOVx,FOVy so either check your camera data sheet or measure it. From that and perpendicular distance (z) you can convert any pixel position (x,y) to 3D coordinate relative to camera (x',y',z'). So first convert pixel position to angles:
ax = (x - (xs/2)) * FOVx / xs
ay = (y - (ys/2)) * FOVy / ys
and then compute cartesian position in 3D:
x' = distance * tan(ax)
y' = distance * tan(ay)
z' = distance
That is nice but on common image we do not know the distance. Luckily on such setup if we turn our object than any convex edge will make an maximum ax angle on the sides if crossing the perpendicular plane to camera. So check few frames and if maximal ax detected you can assume its an edge (or convex bump) of object positioned at distance.
If you also know the rotation angle ang of your platform (relative to your camera) Then you can compute the un-rotated position by using rotation formula around y axis (Ay matrix in the link) and known platform center position relative to camera (just subbstraction befor the un-rotation)... As I mention all this is just simple geometry.
In an nutshell:
obtain calibration data
FOVx,FOVy,xs,ys,distance. Some camera datasheets have only FOVx but if the pixels are rectangular you can compute the FOVy from resolution as
FOVx/FOVy = xs/ys
Beware with Multi resolution camera modes the FOV can be different for each resolution !!!
extract the silhouette of your object in the video for each frame
you can subbstract the background image to ease up the detection
obtain platform angle for each frame
so either use IRC data or place known markers on the rotation disc and detect/interpolate...
detect ax maximum
just inspect the x coordinate of the silhouette (for each y line of image separately) and if peak detected add its 3D position to your model. Let assume rotating rectangular box. Some of its frames could look like this:
So inspect one horizontal line on all frames and found the maximal ax. To improve accuracy you can do a close loop regulation loop by turning the platform until peak is found "exactly". Do this for all horizontal lines separately.
btw. if you detect no ax change over few frames that means circular shape with the same radius ... so you can handle each of such frame as ax maximum.
Easy as pie resulting in 3D point cloud. Which you can sort by platform angle to ease up conversion to mesh ... That angle can be also used as texture coordinate ...
But do not forget that you will lose some concave details that are hidden in the silhouette !!!
If this approach is not enough you can use this same setup for stereoscopic 3D reconstruction. Because each rotation behaves as new (known) camera position.
You can't, if all you have is 2D images from that single camera location.
In theory you could use heuristics to infer a Z stacking. But mathematically your problem is under defined and there's literally infinitely many different Z coordinates that would evaluate your constraints. You have to supply some extra information. For example you could move your camera around over several frames (Google "structure from motion") or you could use multiple cameras or use a camera that has a depth sensor and gives you complete XYZ tuples (Kinect or similar).
Update due to comment:
For every pixel in a 2D image there is an infinite number of points that is projected to it. The technical term for that is called a ray. If you have two 2D images of about the same volume of space each image's set of ray (one for each pixel) intersects with the set of rays corresponding to the other image. Which is to say, that if you determine the ray for a pixel in image #1 this maps to a line of pixels covered by that ray in image #2. Selecting a particular pixel along that line in image #2 will give you the XYZ tuple for that point.
Since you're rotating the object by a certain angle θ along a certain axis a between images, you actually have a lot of images to work with. All you have to do is deriving the camera location by an additional transformation (inverse(translate(-a)·rotate(θ)·translate(a)).
Then do the following: Select a image to start with. For the particular pixel you're interested in determine the ray it corresponds to. For that simply assume two Z values for the pixel. 0 and 1 work just fine. Transform them back into the space of your object, then project them into the view space of the next camera you chose to use; the result will be two points in the image plane (possibly outside the limits of the actual image, but that's not a problem). These two points define a line within that second image. Find the pixel along that line that matches the pixel on the first image you selected and project that back into the space as done with the first image. Due to numerical round-off errors you're not going to get a perfect intersection of the rays in 3D space, so find the point where the ray are the closest with each other (this involves solving a quadratic polynomial, which is trivial).
To select which pixel you want to match between images you can use some feature motion tracking algorithm, as used in video compression or similar. The basic idea is, that for every pixel a correlation of its surroundings is performed with the same region in the previous image. Where the correlation peaks is, where it likely was moved from into.
With this pixel tracking in place you can then derive the structure of the object. This is essentially what structure from motion does.

OpenCV Structure from Motion Reprojection Issue

I am currently facing an issue with my Structure from Motion program based on OpenCv.
I'm gonna try to depict you what it does, and what it is supposed to do.
This program lies on the classic "structure from motion" method.
The basic idea is to take a pair of images, detect their keypoints and compute the descriptors of those keypoints. Then, the keypoints matching is done, with a certain number of tests to insure the result is good. That part works perfectly.
Once this is done, the following computations are performed : fundamental matrix, essential matrix, SVD decomposition of the essential matrix, camera matrix computation and finally, triangulation.
The result for a pair of images is a set of 3D coordinates, giving us points to be drawn in a 3D viewer. This works perfectly, for a pair.
Indeed, here is my problem : for a pair of images, the 3D points coordinates are calculated in the coordinate system of the first image of the image pair, taken as the reference image. When working with more than two images, which is the objective of my program, I have to reproject the 3D points computed in the coordinate system of the very first image, in order to get a consistent result.
My question is : How do I reproject 3D points coordinate given in a camera related system, into an other camera related system ? With the camera matrices ?
My idea was to take the 3D point coordinates, and to multiply them by the inverse of each camera matrix before.
I clarify :
Suppose I am working on the third and fourth image (hence, the third pair of images, because I am working like 1-2 / 2-3 / 3-4 and so on).
I get my 3D point coordinates in the coordinate system of the third image, how do I do to reproject them properly in the very first image coordinate system ?
I would have done the following :
Get the 3D points coordinates matrix, apply the inverse of the camera matrix for image 2 to 3, and then apply the inverse of the camera matrix for image 1 to 2.
Is that even correct ?
Because those camera matrices are non square matrices, and I can't inverse them.
I am surely mistaking somewhere, and I would be grateful if someone could enlighten me, I am pretty sure this is a relative easy one, but I am obviously missing something...
Thanks a lot for reading :)
Let us say you have a 3 * 4 extrinsic parameter matrix called P. To match the notations of OpenCV documentation, this is [R|t].
This matrix P describes the projection from world space coordinates to the camera space coordinates. To quote the documentation:
[R|t] translates coordinates of a point (X, Y, Z) to a coordinate system, fixed with respect to the camera.
You are wondering why this matrix is non-square. That is because in the usual context of OpenCV, you are not expecting homogeneous coordinates as output. Therefore, to make it square, just add a fourth row containing (0,0,0,1). Let's call this new square matrix Q.
You have one such matrix for each pair of cameras, that is you have one Qk matrix for each pair of images {k,k+1} that describes the projection from the coordinate space of camera k to that of camera k+1. Those matrices are inversible because they describe isometries in homogeneous coordinates.
To go from the coordinate space of camera 3 to that of camera 1, just apply to your points the inverse of Q2 and then the inverse of Q1.

OpenCV - things to do between stereoCalibrate and collecting 2x2D coordinates

I've calibrated a pair of cameras, I have a loop where markings on a person's legs are being tracked and their positions saved. What now?
Do I have to cv::undistort the images before saving the pixel coordinates? Is there anything else that has to be done before I try to convert these pairs of 2D coordinates to 3D (I don't know how, but that's for later)?
Yes, you need to remove lens distortion before doing triangulation. If you are going to apply cv::undistort on the whole images though, you have to apply it prior to detecting/tracking the points of interest.
If you are not going to compute disparity image and you are only interested in few points, it is more efficient to detect/track the points of interest on the raw image and then apply cv::undistortPoints on these selected points.
After you remove distortion from the points, you can compose the projection matrices for the two cameras and carry out triangulation.

Fourier Angle Transformation of Picture, C++

I need get angle of object on picture, using Fourier transformation.
I have a picture object, that I rotated. There are no problems with the Fourier realization. It shows me good gradient lines that confirm the correct object angle.
Q. How do I get angle points from the Fourier gradient, to transform it horizontally?
You don't show us the image or its transform, so I'll assume the gradient lines are indeed clear enough. In that case you'd use the Hough transform. Output is a set of lines, each with an angle.