estimation of the ground plane in pinhole camera model - computer-vision

I am trying to understand the pinhole camera model and the geometry behind some computer vision and camera calibration stuff that I am looking at.
So, if I understand correctly, the pinhole camera model maps 3D real-world coordinates to pixel coordinates. The model looks like this:
y = K [R|T]x
Here y is the pixel position in homogeneous coordinates, K is the intrinsic matrix, [R|T] is the extrinsic transformation matrix, and x is the 3D world point, also in homogeneous coordinates.
Now, I am looking at a presentation which says
project the center of the focus region onto the ground plane using [R|T]
Here the center of the focus region is simply taken to be the center of the image. I am not sure how to estimate the ground plane. Assuming the point to be projected is in input space, should the projection be computed by inverting the [R|T] matrix and multiplying the point by the inverted matrix?
EDIT
Source here on page 29: http://romilbhardwaj.github.io/static/BuildSys_v1.pdf
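One possible reading of "project onto the ground plane using [R|T]" is to back-project the pixel into a viewing ray and intersect that ray with the ground. The following is a minimal sketch of that idea, not the method from the presentation. It assumes the ground is the world plane Z = 0, that the extrinsics follow x_cam = R * x_world + T, that the intrinsics have zero skew, and that lens distortion has already been removed. The names mulTranspose and pixelToGround are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Computes R^T * v. For a rotation matrix, R^T is also the inverse of R.
Vec3 mulTranspose(const Mat3& R, const Vec3& v) {
    Vec3 out{0.0, 0.0, 0.0};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            out[i] += R[j][i] * v[j];
    return out;
}

// Intersects the viewing ray of pixel (u, v) with the world plane Z = 0.
// Assumes x_cam = R * x_world + T, zero skew, and no lens distortion.
Vec3 pixelToGround(double u, double v, double fx, double fy, double cx, double cy,
                   const Mat3& R, const Vec3& T) {
    // K^{-1} * (u, v, 1): ray direction in the camera frame.
    Vec3 dirCam{(u - cx) / fx, (v - cy) / fy, 1.0};
    // Rotate the direction into the world frame.
    Vec3 dirWorld = mulTranspose(R, dirCam);
    // Camera centre in world coordinates: C = -R^T * T.
    Vec3 RtT = mulTranspose(R, T);
    Vec3 C{-RtT[0], -RtT[1], -RtT[2]};
    // The ray must not be parallel to the ground plane.
    assert(std::fabs(dirWorld[2]) > 1e-12);
    // Solve C.z + s * dirWorld.z = 0 for the ray parameter s.
    double s = -C[2] / dirWorld[2];
    return {C[0] + s * dirWorld[0], C[1] + s * dirWorld[1], 0.0};
}
```

Note that [R|T] is 3x4 and cannot be inverted as is; the sketch uses the inverse of the rigid transform (R^T and -R^T * T) instead. A negative s would mean the plane lies behind the camera, which a caller may want to reject.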

Related

How to find an object's 3D coordinates (triangulation) given two images and camera positions/orientations

I am given
Camera intrinsics: focal length of the pinhole camera in pixels, resolution of the camera in pixels
Camera extrinsics: 3D coordinates (X,Y,Z) of 2 points where pictures of the object were taken, heading of the camera in both positions (rotation, in degrees, from the y axis - the camera is level with the x-y plane) and camera pixel coordinates of the object in each image.
I am not given the rotation and translation matrices for the camera. I have tried to work these out, but I am confused about how to do so without knowing how specific points in the camera frame map to the 3D coordinate frame.
PS: this is theoretical so I am not able to use OpenCV, etc.
I tried following the process described in this post: How to triangulate a point in 3D space, given coordinate points in 2 image and extrinsic values of the camera
but do not have access to the translation and rotation matrices which all sources I've looked at used.
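With a level camera, a known position, and a heading, explicit rotation and translation matrices are not strictly required: each pixel can be turned into a world-space ray directly, and the object can be estimated as the midpoint of the shortest segment between the two rays. The sketch below makes several assumptions that the question does not state: the heading is measured clockwise from the +y axis when seen from above, z points up, the principal point is at the image centre, and the pixel v coordinate grows downward. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// World-space direction of the viewing ray through pixel (u, v).
// Assumed conventions: heading in degrees, clockwise from +y seen from above,
// z up, level camera, principal point at the image centre, v grows downward.
Vec3 pixelRay(double u, double v, double f, double width, double height,
              double headingDeg) {
    const double pi = std::acos(-1.0);
    double h = headingDeg * pi / 180.0;
    Vec3 forward{std::sin(h), std::cos(h), 0.0};  // optical axis
    Vec3 right{std::cos(h), -std::sin(h), 0.0};   // forward x up
    double dx = u - width / 2.0;
    double dy = v - height / 2.0;
    return {f * forward[0] + dx * right[0],
            f * forward[1] + dx * right[1],
            -dy};  // image "down" is world -z
}

// Midpoint of the shortest segment between rays p1 + s*d1 and p2 + t*d2.
Vec3 triangulateMidpoint(const Vec3& p1, const Vec3& d1,
                         const Vec3& p2, const Vec3& d2) {
    Vec3 w{p1[0] - p2[0], p1[1] - p2[1], p1[2] - p2[2]};
    double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    double d = dot(d1, w), e = dot(d2, w);
    double denom = a * c - b * b;
    assert(std::fabs(denom) > 1e-12);  // the rays must not be parallel
    double s = (b * e - c * d) / denom;
    double t = (a * e - b * d) / denom;
    Vec3 m{0.0, 0.0, 0.0};
    for (int i = 0; i < 3; ++i)
        m[i] = 0.5 * ((p1[i] + s * d1[i]) + (p2[i] + t * d2[i]));
    return m;
}
```

The midpoint is used because two measured rays rarely intersect exactly. If the heading convention is counter-clockwise instead, only the forward and right vectors in pixelRay change.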

OpenCV (C++) - Calculating 2D co-ordinates of an image from known 3D object and camera positions

So I already know the 3D camera position and the position and size of an object in the world frame, as well as the camera matrix and distortion coefficients from a previous camera calibration.
What I need to work out is the 2D image coordinates of the object. Let's just say the object is a sphere with world position objPos and radius objRad, so the image I want to find the coordinates for will be a circle of image position imgPos and radius imgRad.
How would I go about doing this?
Cheers
OpenCV has a function for projecting 3D coordinates onto a (camera) image: projectPoints. In my opinion you have everything you need to call it. The arguments are:
3D coordinates you want to project
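Rotation of your camera - rvec (a rotation vector describing the world-to-camera rotation R)
Translation of your camera - tvec (the world-to-camera translation; for a camera at world position C this is -R*C, not C itself)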
Camera matrix - from the calibration you have
Camera distortion coefficients - from the calibration you have
Resulting 2D image coordinates
If you have your extrinsic camera parameters in form of 4x4 matrix, you have to extract rvec and tvec from it (see here).
To come to your example case: I would generate 3D coordinates on the surface of such a sphere with the corresponding radius. In the next step, I would project these 3D coordinates with the method above.
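As a rough cross-check on the projected points, imgPos and imgRad can also be approximated in closed form. The sketch below is an illustration under stated assumptions, not what projectPoints does internally: it ignores lens distortion, assumes the extrinsics follow x_cam = R * x_world + T, and uses the small-sphere approximation imgRad = f * objRad / depth. The exact outline of a sphere is in general an ellipse, so the approximation degrades for large or off-axis spheres. The names Circle and projectSphere are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Circle {
    double u;       // image x of the projected sphere centre
    double v;       // image y of the projected sphere centre
    double radius;  // approximate radius in pixels
};

// Approximate image circle of a sphere under an ideal, undistorted pinhole camera.
Circle projectSphere(const Vec3& objPos, double objRad, const Mat3& R, const Vec3& T,
                     double fx, double fy, double cx, double cy) {
    // Transform the sphere centre into the camera frame: pc = R * objPos + T.
    Vec3 pc{0.0, 0.0, 0.0};
    for (int i = 0; i < 3; ++i) {
        pc[i] = T[i];
        for (int j = 0; j < 3; ++j)
            pc[i] += R[i][j] * objPos[j];
    }
    // The whole sphere must lie in front of the camera.
    assert(pc[2] > objRad);
    double u = fx * pc[0] / pc[2] + cx;
    double v = fy * pc[1] / pc[2] + cy;
    // Small-sphere approximation using the mean focal length.
    double radius = 0.5 * (fx + fy) * objRad / pc[2];
    return {u, v, radius};
}
```

For example, with an identity rotation, zero translation, fx = fy = 500, a principal point of (320, 240), and a sphere of radius 0.5 at (1, 0, 10), the sketch gives a centre of (370, 240) and a radius of 25 pixels.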

View-Projection matrix, meaning of Z

I'm projecting 3D points with X,Y,Z model coordinates to X,Y image coordinates using a 4x4 perspective view-projection matrix. There is only one model, so it is like an MVP matrix where the M matrix is the identity.
Is it possible to extract the position of the camera (in model coordinates) from the view-projection matrix, i.e. the translation component of the view matrix?
Also, what exactly is the meaning of the Z-component in the projected image coordinates (after division by W)? I know it is between -1 and 1 for points between the near and far planes, but is it possible to deduce the distance of the point to the camera (in model coordinates) from it?
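For the depth part of the question, the Z value after division by W can be inverted if the near and far plane distances are known. The sketch below assumes the standard OpenGL perspective matrix (as produced by glFrustum or gluPerspective), where Z in normalized device coordinates runs from -1 at the near plane to 1 at the far plane. Other conventions, such as a 0 to 1 depth range, need a different formula. The function name is hypothetical.

```cpp
#include <cassert>

// Recovers the distance along the viewing axis from an NDC depth value.
// Assumes the standard OpenGL perspective projection, for which
//   zNdc = (f + n) / (f - n) - 2 * f * n / ((f - n) * d)
// with d the positive distance in front of the camera.
double ndcDepthToViewDistance(double zNdc, double nearPlane, double farPlane) {
    assert(nearPlane > 0.0 && farPlane > nearPlane);
    return 2.0 * farPlane * nearPlane /
           (farPlane + nearPlane - zNdc * (farPlane - nearPlane));
}
```

The result is the depth measured along the camera's viewing axis, not the Euclidean distance from the camera to the point; the two agree only for points on the optical axis. The mapping is also strongly nonlinear: with near = 1 and far = 100, an NDC depth of 0 corresponds to a distance of only about 1.98.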

Computer Vision: labelling camera pose

I am trying to create a dataset of images of objects at different poses, where each image is annotated with camera pose (or object pose).
For example, suppose I have a world coordinate system, place the object of interest at the origin, put the camera at a known position (x,y,z), and make it face the origin. Given this information, how can I calculate the pose (rotation matrix) of the camera or of the object?
I had one idea, which was to have a reference coordinate, i.e. (0,0,z'), where I can define the rotation of the object, i.e. its tilt, pitch and yaw. Then I can calculate the rotation from (0,0,z') to (x,y,z) to give me a rotation matrix. The problem is, how do I now combine the two rotation matrices?
BTW, I know the world position of the camera as I am rendering these with OpenGL from a CAD model as opposed to physically moving a camera around.
The homography matrix maps from homogeneous screen coordinates (i,j) to homogeneous world coordinates (x,y,z).
Homogeneous coordinates are normal coordinates with a 1 appended. So (3,4) in screen coordinates is (3,4,1) in homogeneous screen coordinates.
Suppose you have a set of homogeneous screen coordinates, S, and their associated homogeneous world locations, W. The 3x3 homography matrix H then satisfies
S * H = transpose(W)
So it boils down to finding several features whose world coordinates are known and whose i,j positions you can also identify in screen coordinates, then computing a "best fit" homography matrix (OpenCV has a function for this, findHomography).
While knowing the camera's xyz position provides helpful information, it's not enough to fully constrain the equation, and you will have to generate more screen-world pairs anyway. So I don't think it's worth your time integrating the camera's position into the mix.
I have done a similar experiment here: http://edinburghhacklab.com/2012/05/optical-localization-to-0-1mm-no-problemo/
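Since the images in the question are rendered from a known camera position, the rotation can also be written down directly instead of being fitted from correspondences. The sketch below builds a world-to-camera rotation in the style of a "look-at" construction. It assumes the OpenGL camera convention (the camera looks along its own -z axis, with +y up), a target at the world origin, and a caller-supplied world up vector. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

Vec3 normalized(const Vec3& v) {
    double n = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    assert(n > 1e-12);  // a zero vector has no direction
    return {v[0] / n, v[1] / n, v[2] / n};
}

// World-to-camera rotation for a camera at `eye` that faces the world origin.
// Rows of the result are the camera's x, y, and z axes in world coordinates.
Mat3 lookAtOriginRotation(const Vec3& eye, const Vec3& up) {
    // Camera +z points away from the target (OpenGL looks along -z).
    Vec3 back = normalized(eye);
    // Fails the assertion if `up` is parallel to the viewing direction.
    Vec3 right = normalized(cross(up, back));
    Vec3 camUp = cross(back, right);
    Mat3 R{};
    R[0] = right;
    R[1] = camUp;
    R[2] = back;
    return R;
}
```

With this rotation R, the matching translation is t = -R * eye. If the object also has its own rotation R_obj in the world frame, the object-to-camera rotation is the product R * R_obj, which is one way to combine the two rotations asked about in the question.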

In OpenCV, converting 2d image point to 3d world unit vector

I have calibrated my camera with OpenCV (findChessboard etc) so I have:
- Camera Distortion Coefficients & Intrinsics matrix
- Camera Pose information (Translation & Rotation, computed separately via other means) as Euler Angles & a 4x4 matrix
- 2D points within the camera frame
How can I convert these 2D points into 3D unit vectors pointing out into the world? I tried using cv::undistortPoints but that doesn't seem to do it (only returns 2D remapped points), and I'm not exactly sure what method of matrix math to use to model the camera via the Camera intrinsics I have.
Convert your 2D point into a homogeneous point (give it a third coordinate equal to 1) and then multiply by the inverse of your camera intrinsics matrix. For example:
// camera_intrinsics_mat is assumed to be a cv::Matx33f
cv::Matx31f hom_pt(point_in_image.x, point_in_image.y, 1);
hom_pt = camera_intrinsics_mat.inv()*hom_pt; // back-project into camera coordinates
cv::Point3f origin(0,0,0); // the camera centre, in the camera's own frame
cv::Point3f direction(hom_pt(0),hom_pt(1),hom_pt(2));
// To get a unit vector, direction just needs to be normalized
direction *= 1/cv::norm(direction);
origin and direction now define the ray corresponding to that image point. Note that the ray is expressed in the camera's coordinate frame, with the origin at the camera centre; you can use your camera pose to transform it into world coordinates or any other frame. The distortion coefficients map from your actual camera to the pinhole camera model and should be applied at the very beginning to find the undistorted 2D coordinate. The steps are then:
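Undistort the 2D coordinate with the distortion coefficients (note that cv::undistortPoints returns normalized coordinates unless a new camera matrix P is passed, so in that case the inverse intrinsics have already been applied)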
Convert to ray (as shown above)
Move that ray to whatever coordinate system you like.
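The last two steps can be sketched without OpenCV as follows. This is an illustration under stated assumptions rather than the answer's exact code: the pixel is assumed to be already undistorted and still in pixel units, the intrinsics are assumed to have zero skew, and camToWorld is assumed to be the rotation that takes camera-frame vectors into the world frame (the transpose of the world-to-camera rotation). The function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Unit direction, in the world frame, of the ray through an undistorted pixel.
Vec3 pixelToWorldRay(double u, double v, double fx, double fy, double cx, double cy,
                     const Mat3& camToWorld) {
    // K^{-1} * (u, v, 1) for a zero-skew intrinsics matrix.
    Vec3 dirCam{(u - cx) / fx, (v - cy) / fy, 1.0};
    // Rotate the direction from the camera frame into the world frame.
    Vec3 dirWorld{0.0, 0.0, 0.0};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            dirWorld[i] += camToWorld[i][j] * dirCam[j];
    // Normalize to unit length.
    double n = std::sqrt(dirWorld[0] * dirWorld[0] + dirWorld[1] * dirWorld[1] +
                         dirWorld[2] * dirWorld[2]);
    assert(n > 0.0);
    return {dirWorld[0] / n, dirWorld[1] / n, dirWorld[2] / n};
}
```

Only the rotation is applied to the direction; the translation of the camera pose affects the ray's origin, which is the camera position in world coordinates.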