Mismatch between Point Projection and Warped 2-D Image [opencv] - c++

I am using 2 different methods to render an image (as an OpenCV matrix):
1. an implemented projection function that uses the camera intrinsics (focal length, principal point; distortion is disabled) - this function is used in other software packages and is supposed to work correctly (repository)
2. a 2D to 2D image warping (here, I'm determining the intersections of the corner-rays of my camera with my 2D image that should be warped into my camera frame); this backprojection of the corner points is using the same camera model as above
Now, I overlay these two images and what should basically happen is that a projected pen-tip (method 1.) should line up with a line that is drawn on the warped image (method 2.). However, this is not happening.
There is a tiny shift in both directions, depending on the orientation of the pen that is writing, and it is reduced when I am shifting the principal point of the camera. Now my question is, since I am not considering the principal point in the 2D-2D image warping, can this be the cause of the mismatch? Or is it generally impossible to align those two, since the image warping is a simplification of the projection process?
Grey Point: projected origin (should fall in line with the edges of the white area)
Blue Reticle: penTip that should "write" the Bordeaux-colored line
Grey Line: pen approximation
Red Edge: "x-axis" of white image part
Green Edge: "y-axis" of white image part
EDIT:
I also did the same projection with the origin of the coordinate system, and here the mismatch grows the further the origin moves away from the center of the image (so delta[warp,project] gets larger at the image borders compared to the center).
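For an ideal pinhole camera without distortion, projecting points that lie on a plane and warping that plane with a homography describe the same mapping, as long as both use the same intrinsics, principal point included. The following is a minimal sketch of that relationship using only the standard library instead of OpenCV. The function names are hypothetical, and the plane is assumed to be Z = 0 in its own coordinate frame.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Multiply a 3x3 matrix by a 3-vector.
Vec3 mul(const Mat3& m, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        r[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
    return r;
}

// Pinhole intrinsics without distortion.
Mat3 intrinsics(double fx, double fy, double cx, double cy) {
    return {{{fx, 0.0, cx}, {0.0, fy, cy}, {0.0, 0.0, 1.0}}};
}

// Method 1: project the plane point (X, Y, 0) with K * (R * P + t).
Vec3 projectPoint(const Mat3& K, const Mat3& R, const Vec3& t, double X, double Y) {
    Vec3 pc = mul(R, {X, Y, 0.0});
    for (int i = 0; i < 3; ++i) pc[i] += t[i];
    assert(pc[2] > 0.0);  // the point must be in front of the camera
    const Vec3 p = mul(K, pc);
    return {p[0] / p[2], p[1] / p[2], 1.0};
}

// Method 2: build the plane-to-image homography H = K * [r1 r2 t].
Mat3 planeHomography(const Mat3& K, const Mat3& R, const Vec3& t) {
    Mat3 M{};  // columns: first and second column of R, then t
    for (int i = 0; i < 3; ++i) M[i] = {R[i][0], R[i][1], t[i]};
    Mat3 H{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) H[i][j] += K[i][k] * M[k][j];
    return H;
}

// Warp the plane coordinate (X, Y) with H and dehomogenize.
Vec3 warpPoint(const Mat3& H, double X, double Y) {
    const Vec3 p = mul(H, {X, Y, 1.0});
    return {p[0] / p[2], p[1] / p[2], 1.0};
}
```

With identical K, R and t, projectPoint and warpPoint should agree up to floating-point error. If the warp is built from a K that leaves out cx and cy, the two mappings no longer share the same intrinsics, which is one candidate explanation for an offset.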

Related

Projecting 3D points to an undistorted ROI using OpenCV

Assume I have a camera that has been calibrated using the full camera frame to obtain the camera matrix and distortion coefficients. Also, assume that I have a 3D world point expressed in that camera's frame.
I know that I can use cv::projectPoints() with rvec=tvec=(0,0,0), the camera matrix, and distortion coefficients to project the point to the (full) distorted frame. I also know that if I am receiving an ROI from the camera (which is a cropped portion of the full distorted frame), I can adjust for the ROI simply by subtracting the (x,y) coordinate of the top left corner of the ROI from the result of cv::projectPoints(). Lastly, I know that if I use cv::projectPoints() with rvec=tvec=(0,0,0), the camera matrix, and zero distortion coefficients I can project the point to the full undistorted frame (correct me if I'm wrong, but I think this requires that you use the same camera matrix in cv::undistort() and don't use a newCameraMatrix).
How do I handle the case where I want to project to the undistorted version of an ROI that I am receiving (i.e. I get a (distorted) ROI and then use cv::undistort() on it using the method described here to account for the fact that it's an ROI, and then I want to project the 3D point to that resulting image)?
If there is a better way to go about all this I am open to suggestions as well. My goal is that I want to be able to project 3D points to distorted and undistorted frames with or without the presence of an ROI where the ROI is always originally defined by the feed from the camera and therefore always defined in the distorted frame (i.e. 4 different cases: distorted full frame, distorted ROI, undistorted full frame, undistorted version of distorted ROI).
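The first three cases can be sketched without OpenCV as follows. This is a minimal illustration, assuming the point is already in the camera frame (rvec = tvec = 0) and using the radial and tangential terms of OpenCV's standard distortion model. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Intrinsics { double fx, fy, cx, cy; };
struct Distortion { double k1, k2, p1, p2, k3; };  // same order as OpenCV's distCoeffs
struct Pixel { double u, v; };

// Project a 3D point given in the camera frame to the full frame.
// With all coefficients set to zero this is the undistorted projection.
Pixel projectFull(double X, double Y, double Z, const Intrinsics& K, const Distortion& d) {
    assert(Z > 0.0);  // the point must be in front of the camera
    const double x = X / Z, y = Y / Z;
    const double r2 = x * x + y * y;
    const double radial = 1.0 + r2 * (d.k1 + r2 * (d.k2 + r2 * d.k3));
    const double xd = x * radial + 2.0 * d.p1 * x * y + d.p2 * (r2 + 2.0 * x * x);
    const double yd = y * radial + d.p1 * (r2 + 2.0 * y * y) + 2.0 * d.p2 * x * y;
    return {K.fx * xd + K.cx, K.fy * yd + K.cy};
}

// Shift a full-frame pixel into an ROI whose top-left corner is (roiX, roiY).
Pixel toRoi(const Pixel& p, double roiX, double roiY) {
    return {p.u - roiX, p.v - roiY};
}
```

The fourth case depends on how the ROI was undistorted. If the ROI was undistorted with a camera matrix whose principal point was shifted by the ROI corner, applying toRoi to the zero-distortion projection would be the natural candidate, but that is an assumption about the linked method, which is not reproduced here.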

How to find an object's 3D coordinates (triangulation) given two images and camera positions/orientations

I am given
Camera intrinsics: focal length of the pinhole camera in pixels, resolution of the camera in pixels
Camera extrinsics: 3D coordinates (X,Y,Z) of 2 points where pictures of the object were taken, heading of the camera in both positions (rotation, in degrees, from the y axis - the camera is level with the x-y plane) and camera pixel coordinates of the object in each image.
I am not given the rotation and translation matrices for the camera (I have tried figuring these out but I'm confused on how to do so without knowing translation of specific points in the camera frame to 3D coordinate frame).
PS: this is theoretical so I am not able to use OpenCV, etc.
I tried following the process described in this post: How to triangulate a point in 3D space, given coordinate points in 2 image and extrinsic values of the camera
but do not have access to the translation and rotation matrices which all sources I've looked at used.
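The rotation and translation can be built from the camera position and heading once an axis convention is fixed. The sketch below is only one possible convention, and every part of it is an assumption: world z points up, the heading is measured from the world +y axis and is positive towards +x, the camera axes are x right, y down, z forward, and the principal point sits at the image center. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

struct Pose { Mat3 R; Vec3 t; };  // x_cam = R * x_world + t

// Build the world-to-camera rotation and translation for a level camera.
Pose poseFromHeading(const Vec3& position, double headingDeg) {
    const double kPi = std::acos(-1.0);
    const double th = headingDeg * kPi / 180.0;
    Pose pose;
    pose.R = {{{std::cos(th), -std::sin(th), 0.0},   // camera x (right) in world
               {0.0, 0.0, -1.0},                      // camera y (down) in world
               {std::sin(th), std::cos(th), 0.0}}};   // camera z (forward) in world
    // t = -R * C, where C is the camera position in world coordinates.
    for (int i = 0; i < 3; ++i) {
        pose.t[i] = 0.0;
        for (int j = 0; j < 3; ++j) pose.t[i] -= pose.R[i][j] * position[j];
    }
    return pose;
}

// Project a world point to pixel coordinates for a focal length f in pixels
// and an image of the given width and height.
std::array<double, 2> project(const Pose& p, const Vec3& X, double f,
                              double width, double height) {
    Vec3 c{};
    for (int i = 0; i < 3; ++i)
        c[i] = p.R[i][0] * X[0] + p.R[i][1] * X[1] + p.R[i][2] * X[2] + p.t[i];
    assert(c[2] > 0.0);  // the point must be in front of the camera
    return {f * c[0] / c[2] + width / 2.0, f * c[1] / c[2] + height / 2.0};
}
```

With a pose for each of the two camera positions, the pixel observations define two rays in world coordinates, and the object lies where those rays come closest to each other.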

Invalid camera calibration for a head-mounted eye tracking system

I'm working on an Eye Tracking system with two cameras mounted on some kind of glasses. There are optical lenses so that the screen is perceived at around 420 mm from the eye.
From a few dozen pupil samples, we compute two eye models (one for each camera), located in their respective camera coordinates system. This is based on the works here, but modified so that an estimation of the eye center is found using some kind of brute-force approach to minimize the ellipse projection error on the model given its center position in camera space.
Theoretically, an approximation of the camera parameters would be symmetrical about the lenses on the Y axis. So each camera should be at coordinates of around (17.5 mm or -17.5 mm, 0, 3.3) with respect to the lens coordinate system, with a rotation of around 42.5 degrees about the Y axis.
However, with these values, there is an offset in the result. See below:
The red point is the gaze center estimated by the left eye tracker, the white one is the right eye tracker, in screen coordinates
The screen limits are represented by the white lines.
The green line is the gaze vector, in camera coordinates (projected in 2D for visualization)
The two camera centers found, projected in 2D, are in the middle of the eye (the blue circle).
The pupil samples and current pupils are represented by the ellipses with matching colors.
The offset on x isn't constant, which means the rotation about Y is not exact, and the positions of the cameras aren't precise either. In order to fix it, we used: this to calibrate and then this to get the rotation parameters from the rotation matrix.
We added a camera in the middle of the lenses (close to the theoretical 0,0,0 point?) to get the extrinsic and intrinsic parameters of the cameras relative to our lens center. However, with about 50 checkerboard captures from different positions, the results given by OpenCV don't seem correct.
For example, it gives for a camera a position of about (-14,0,10) in lens coordinates for the translation and something like (-2.38, 49, -2.83) as rotation angles in degrees.
The previous screenshots were taken with these parameters. The theoretical ones are a bit further apart, but are more likely to reach the screen borders, unlike the OpenCV values.
This is probably because the test camera is in front of the optic, not behind, where our real 0,0,0 would be located (we just add the distance at which the screen is perceived on the Z axis afterwards, which is 420mm).
However, we have no way to put the camera in (0, 0, 0).
As the system is compact (everything is captured within a few cm^2), each degree or millimeter can change the result drastically, so without precise values for the cameras, we're a bit stuck.
Our objective here is to find an accurate way to get the extrinsic and intrinsic parameters of each camera, so that we can compute a precise position of the center of the eye of the person wearing the glasses, without any calibration procedure other than looking around (so no fixation points).
Right now, the system is precise enough to give a global indication of where someone is looking on the screen, but there is a divergence between the right and left cameras, so it's not precise enough. Any advice or hint that could help us is welcome :)

Film coordinate to world coordinate

I am working on building a 3D point cloud from feature matching using OpenCV 3.1 and OpenGL.
I have implemented 1) camera calibration (hence I have the intrinsic matrix of the camera) and 2) feature extraction (hence I have 2D points in pixel coordinates).
I was going through a few websites, but generally all of them suggest the flow for converting 3D object points to pixel points, whereas I am doing the completely backward projection. Here is the ppt that explains it well.
I have computed film coordinates (u,v) from pixel coordinates (x,y) (with the help of the intrinsic matrix). Can anyone shed light on how I can recover "Z" of the camera coordinate (X,Y,Z) from the film coordinate (u,v)?
Please guide me on how I can utilize functions for the desired goal in OpenCV like solvePnP, recoverPose, findFundamentalMat, findEssentialMat.
With a single camera and a rotating object on a fixed rotation platform, I would implement something like this:
Each camera has resolution xs,ys and field of view FOV defined by two angles FOVx,FOVy so either check your camera data sheet or measure it. From that and perpendicular distance (z) you can convert any pixel position (x,y) to 3D coordinate relative to camera (x',y',z'). So first convert pixel position to angles:
ax = (x - (xs/2)) * FOVx / xs
ay = (y - (ys/2)) * FOVy / ys
and then compute cartesian position in 3D:
x' = distance * tan(ax)
y' = distance * tan(ay)
z' = distance
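The two steps above can be combined into one small function. This is a minimal sketch, assuming the angles are given in radians. The function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Convert a pixel (x, y) to a 3D point relative to the camera, following the
// formulas above. xs, ys is the resolution, fovX, fovY are the field-of-view
// angles in radians, and distance is the perpendicular distance z.
std::array<double, 3> pixelTo3D(double x, double y, double xs, double ys,
                                double fovX, double fovY, double distance) {
    assert(xs > 0.0 && ys > 0.0);
    // Pixel position to angles, measured from the image center.
    const double ax = (x - xs / 2.0) * fovX / xs;
    const double ay = (y - ys / 2.0) * fovY / ys;
    // Angles to a Cartesian position in 3D.
    return {distance * std::tan(ax), distance * std::tan(ay), distance};
}
```

The center pixel maps to (0, 0, distance), and a pixel on the right edge of a camera with FOVx = 90 degrees maps to x' = distance.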
That is nice, but on a common image we do not know the distance. Luckily, on such a setup, if we turn our object then any convex edge will make a maximum ax angle on the sides when crossing the plane perpendicular to the camera. So check a few frames, and if a maximal ax is detected you can assume it's an edge (or convex bump) of the object positioned at distance.
If you also know the rotation angle ang of your platform (relative to your camera), then you can compute the un-rotated position by using the rotation formula around the y axis (Ay matrix in the link) and the known platform center position relative to the camera (just a subtraction before the un-rotation)... As I mentioned, all this is just simple geometry.
In a nutshell:
1. obtain calibration data
   FOVx, FOVy, xs, ys, distance. Some camera datasheets list only FOVx, but if the pixels are square you can compute FOVy from the resolution as
   FOVx/FOVy = xs/ys
   Beware: with multi-resolution camera modes the FOV can be different for each resolution!
2. extract the silhouette of your object in the video for each frame
   you can subtract the background image to ease the detection
3. obtain the platform angle for each frame
   so either use IRC data or place known markers on the rotation disc and detect/interpolate...
4. detect the ax maximum
   just inspect the x coordinate of the silhouette (for each y line of the image separately) and if a peak is detected, add its 3D position to your model. Let's assume a rotating rectangular box. Some of its frames could look like this:
   So inspect one horizontal line on all frames and find the maximal ax. To improve accuracy you can do closed-loop regulation by turning the platform until the peak is found "exactly". Do this for all horizontal lines separately.
   btw. if you detect no ax change over a few frames, that means a circular shape with the same radius ... so you can handle each such frame as an ax maximum.
Easy as pie resulting in 3D point cloud. Which you can sort by platform angle to ease up conversion to mesh ... That angle can be also used as texture coordinate ...
But do not forget that you will lose some concave details that are hidden in the silhouette !!!
If this approach is not enough you can use this same setup for stereoscopic 3D reconstruction. Because each rotation behaves as new (known) camera position.
You can't, if all you have is 2D images from that single camera location.
In theory you could use heuristics to infer a Z stacking. But mathematically your problem is underdetermined and there are literally infinitely many different Z coordinates that would satisfy your constraints. You have to supply some extra information. For example you could move your camera around over several frames (Google "structure from motion") or you could use multiple cameras or use a camera that has a depth sensor and gives you complete XYZ tuples (Kinect or similar).
Update due to comment:
For every pixel in a 2D image there is an infinite number of points that are projected to it. The technical term for that set of points is a ray. If you have two 2D images of about the same volume of space, each image's set of rays (one for each pixel) intersects with the set of rays corresponding to the other image. Which is to say that if you determine the ray for a pixel in image #1, this maps to a line of pixels covered by that ray in image #2. Selecting a particular pixel along that line in image #2 will give you the XYZ tuple for that point.
Since you're rotating the object by a certain angle θ about a certain axis a between images, you actually have a lot of images to work with. All you have to do is derive the camera location by an additional transformation: inverse(translate(-a)·rotate(θ)·translate(a)).
Then do the following: Select an image to start with. For the particular pixel you're interested in, determine the ray it corresponds to. For that, simply assume two Z values for the pixel; 0 and 1 work just fine. Transform them back into the space of your object, then project them into the view space of the next camera you chose to use; the result will be two points in the image plane (possibly outside the limits of the actual image, but that's not a problem). These two points define a line within that second image. Find the pixel along that line that matches the pixel you selected in the first image and project that back into space as done with the first image. Due to numerical round-off errors you're not going to get a perfect intersection of the rays in 3D space, so find the point where the rays are closest to each other (this amounts to minimizing a quadratic function, which is trivial).
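The last step, finding the point where two rays come closest, can be sketched as follows. Minimizing the squared distance between the two lines reduces to a 2x2 linear system with a closed-form solution. This is a minimal illustration with hypothetical names, and it returns the midpoint of the shortest connecting segment as the estimate.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Midpoint of the shortest segment between the lines p1 + s*d1 and p2 + t*d2.
Vec3 closestPointBetweenRays(const Vec3& p1, const Vec3& d1,
                             const Vec3& p2, const Vec3& d2) {
    const Vec3 w = {p1[0] - p2[0], p1[1] - p2[1], p1[2] - p2[2]};
    const double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    const double d = dot(d1, w), e = dot(d2, w);
    const double denom = a * c - b * b;
    assert(std::fabs(denom) > 1e-12);  // the rays must not be parallel
    // Parameters of the closest points on each line.
    const double s = (b * e - c * d) / denom;
    const double t = (a * e - b * d) / denom;
    Vec3 mid{};
    for (int i = 0; i < 3; ++i)
        mid[i] = 0.5 * ((p1[i] + s * d1[i]) + (p2[i] + t * d2[i]));
    return mid;
}
```

For rays that intersect exactly, the midpoint is the intersection itself. For skew rays it lies halfway along the shortest connecting segment.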
To select which pixel you want to match between images, you can use some feature motion tracking algorithm, as used in video compression or similar. The basic idea is that for every pixel a correlation of its surroundings is performed with the same region in the previous image. Where the correlation peaks is where the pixel most likely moved from.
With this pixel tracking in place you can then derive the structure of the object. This is essentially what structure from motion does.

Opencv Warp perspective whole image

I'm struggling with this problem:
I have an image and I want to apply a perspective warp to it (I already have the transformation matrix), but instead of the output only containing the transformed area (like the example below) I want to be able to see the whole image.
EXAMPLE http://docs.opencv.org/trunk/_images/perspective.jpg
Instead of having the transformation region, like this example, I want to transform the whole original image.
How can I achieve this?
Thanks!
It seems that you are computing the perspective transform by selecting the corners of the sudoku grid in the input image and requesting them to be warped at fixed location in the output image. In your example, it seems that you are requesting the top-left corner to be warped at coordinates (0,0), the top-right corner at (300,0), the bottom-left at (0,300) and the bottom-right at (300,300).
This will always result in the cropping of the image area to the left of the two left corners and above the two top corners (i.e. the image area where x<0 or y<0 in the output image). Also, if you specify an output image size of 300x300, this results in the cropping of the image area to the right of the right corners and below the bottom corners.
If you want to keep the whole image, you need to use different output coordinates for the corners. For example warp TLC to (100, 100), TRC to (400,100), BLC to (100,400) and BRC to (400,400), and specify an output image size of 600x600 for instance.
You can also calculate the optimal coordinates as follows:
Compute the default perspective transform H0 (as you are doing now)
Transform the corners of the input image using H0, and compute the minimum and maximum values for the x and y coordinates of these corners. Let's denote them xmin, xmax, ymin, ymax.
Compute the translation necessary to map the point (xmin,ymin) to (0,0). The matrix of this translation is T = [1, 0, -xmin; 0, 1, -ymin; 0, 0, 1].
Compute the optimised perspective transform H1 = T*H0 and specify an output image size of (xmax-xmin) x (ymax-ymin).
This way, you are guaranteed that:
the four corners of your input sudoku grid will form a true square
the output image will be translated so that no useful image data is cropped above or to the left of the grid corners
the output image will be sized so that no useful image data is cropped below or to the right of the grid corners
However, this will generate black areas since the warped image is no longer a perfect rectangle, hence some pixels in the output image won't have any correspondence in the input image.
Edit 1: If you want to replace the black areas with something else, you can initialize the destination matrix as you wish and then set the borderMode parameter of the warpPerspective function to BORDER_TRANSPARENT.
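The steps for computing H1 and the output size can be sketched without OpenCV types as follows. This is a minimal illustration. The struct and function names are hypothetical, and the image corners are taken as (0,0), (w,0), (w,h) and (0,h).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct FullWarp { Mat3 H1; int width; int height; };

// Given the default transform H0 and the input image size, return
// H1 = T * H0 and the output size that keeps the whole warped image in view.
FullWarp fitWholeImage(const Mat3& H0, int w, int h) {
    const double cx[4] = {0.0, double(w), double(w), 0.0};
    const double cy[4] = {0.0, 0.0, double(h), double(h)};
    double xmin = std::numeric_limits<double>::max(), xmax = -xmin;
    double ymin = std::numeric_limits<double>::max(), ymax = -ymin;
    // Transform the four input corners with H0 and track their bounding box.
    for (int i = 0; i < 4; ++i) {
        const double z = H0[2][0] * cx[i] + H0[2][1] * cy[i] + H0[2][2];
        assert(std::fabs(z) > 1e-12);  // a corner must not map to infinity
        const double x = (H0[0][0] * cx[i] + H0[0][1] * cy[i] + H0[0][2]) / z;
        const double y = (H0[1][0] * cx[i] + H0[1][1] * cy[i] + H0[1][2]) / z;
        xmin = std::min(xmin, x); xmax = std::max(xmax, x);
        ymin = std::min(ymin, y); ymax = std::max(ymax, y);
    }
    // T = [1, 0, -xmin; 0, 1, -ymin; 0, 0, 1], so T * H0 subtracts xmin times
    // the last row of H0 from its first row and ymin times it from the second.
    Mat3 H1 = H0;
    for (int j = 0; j < 3; ++j) {
        H1[0][j] -= xmin * H0[2][j];
        H1[1][j] -= ymin * H0[2][j];
    }
    return {H1, int(std::ceil(xmax - xmin)), int(std::ceil(ymax - ymin))};
}
```

The resulting H1, copied into a cv::Mat, and the width and height could then be passed to warpPerspective as the transform and the output size.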