Pose estimation with known initial scale - computer-vision

Suppose you have a 3D scene with depth information and you estimate a 2D homography between the initial pose and another pose. How do you integrate the known scale from the scene into the pose estimation to recover the absolute 3D position?

Related

Unstable values in ArUco pose estimation

I'm trying to find the orientation of the camera using an ArUco marker. The Euler angles extracted from the rotation matrix are unstable beyond a certain point.
As the distance between the camera and the marker increases, the yaw angle of the camera becomes unstable and the "Z" axis on the marker flips.
The Euler angles are jittery, differ from frame to frame, and take time to stabilize. How do I obtain reliable values for the yaw angle and the distance between the camera and the marker?
I am trying to find the pose of a moving camera w.r.t. a static marker.
I implemented both solvePnP and solvePnPRansac, and both yield unstable results.
The rotation matrix obtained by converting the rotation vector from estimatePoseSingleMarker seems alright up to a certain distance, but then loses stability.
How do I go about this?
Thank you
In general, you won't get accurate camera pose estimation from a single marker. The solution is to add more markers. You could use either a marker board, or a more sparse pattern of markers.
As a single marker gets further from the camera, several factors work to reduce the accuracy of the marker pose estimate:
- The projected size of the marker becomes smaller and more quantized by the pixel grid. Distance is estimated by inverse perspective division, so it becomes less accurate as distance increases.
- Perspective distortion reduces, approaching a parallel projection. In a parallel projection the marker has two equally viable orientations, which may be returned alternately (see https://en.wikipedia.org/wiki/Necker_cube). The orientation of the marker relative to the camera also matters: in near-frontal views of the marker, pitch and yaw are ambiguous compared to oblique views. Reduced perspective distortion with distance makes this effect worse, and will cause the calculated camera pose to yaw, pitch, and move laterally.
- Given the smaller number of pixels covered by the marker, small-scale effects such as sensor noise and quantization become more significant, reducing stability from frame to frame and causing jitter.
As you have discovered, pose estimation works OK in close-up, oblique views of a single marker, because the projected points given to solvePnP() are far apart and have large perspective distortion. By adding more markers, you always have ideal projected points for solvePnP().
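A minimal sketch of the marker-board approach, assuming the classic cv2.aruco API (OpenCV 4.6 and earlier; newer releases replaced these calls with cv2.aruco.ArucoDetector and board.matchImagePoints). The calibration values and board layout below are made-up placeholders:

```python
import cv2
import numpy as np

# Assumed calibration results (replace with your own calibrateCamera output).
camera_matrix = np.array([[800.0,   0.0, 320.0],
                          [  0.0, 800.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)

# A 5x4 grid of 4 cm markers with 1 cm gaps, defined in metres.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
board = cv2.aruco.GridBoard_create(5, 4, 0.04, 0.01, dictionary)

frame = cv2.imread("frame.png")                  # hypothetical input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

if ids is not None and len(ids) > 0:
    rvec = np.zeros((3, 1))
    tvec = np.zeros((3, 1))
    # Pose of the whole board: every detected marker contributes points,
    # so the PnP problem is much better conditioned than with one marker.
    n_used, rvec, tvec = cv2.aruco.estimatePoseBoard(
        corners, ids, board, camera_matrix, dist_coeffs, rvec, tvec)
    if n_used > 0:
        R, _ = cv2.Rodrigues(rvec)
        camera_position = -R.T @ tvec            # camera centre in the board frame
        print("camera position (m):", camera_position.ravel())
```

Even a partially visible board helps, because the spread of projected points stays large as long as some markers remain well resolved.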

Measure real size of object with Calibrated Camera opencv c++?

I am working on my thesis, which is related to OpenCV.
I want to measure the real size of an object (in mm) with a single camera, but I have a problem converting between the camera's natural units (pixels) and real-world units.
After calibrating the camera, I have:
Camera matrix (3x3)
Distortion coefficients
Extrinsic parameters [rotation vector (1x3) + translation vector (1x3)]
I have read the following link but I can't find the formula to convert the units.
https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
Example of measuring the size of an object.
Any suggestions?
Thanks so much.
As mentioned in the comments, you need the distance to the object to obtain 3D coordinates from pixels. A possible workflow would be:
1. Undistort the image using the distortion parameters, i.e., correct the distortion caused by the lens.
2. Deproject the pixels into 3D points in the camera coordinate frame using the camera matrix. For this you can multiply the inverse of the 3x3 camera matrix with a vector containing the pixel coordinates, [pixel_x, pixel_y, 1]^T. If you multiply the result [x', y', 1]^T by the depth, i.e., the z-component, you obtain the 3D point in the camera coordinate frame.
3. Transform the point from the camera coordinate frame into the world coordinate frame using the extrinsic parameters.
Obtaining the depth values from an image alone is not possible. The only option is to use some additional information. Maybe your object is placed on a table and you know the distance between the camera and the table.
To measure distances between the camera and a table, or even the object itself, you could use ArUco markers, which are also available within OpenCV.
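A minimal sketch of steps 2 and 3 above. The camera matrix, extrinsics, pixel and depth value are made-up placeholders:

```python
import numpy as np

# Assumed intrinsics from calibrateCamera (fx, fy, cx, cy are made up here).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Assumed extrinsics: R and t map world coordinates into camera coordinates.
R = np.eye(3)
t = np.array([0.0, 0.0, 0.5])          # metres

u, v = 400.0, 260.0                    # pixel in the already-undistorted image
depth = 1.2                            # known distance along the optical axis, metres

# Step 2: deproject the pixel into the camera frame.
ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # [x', y', 1]^T
point_cam = ray * depth                          # 3D point in camera coordinates

# Step 3: transform into the world frame (invert X_cam = R @ X_world + t).
point_world = R.T @ (point_cam - t)
print("3D point in world coordinates (m):", point_world)
```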

reconstructing position from depth using opencv

I have a depth image and a point cloud, and I have 2D corner points computed with an OpenCV feature detector. How would I compute the 3D positions of these corners using the depth image, given that I also have the camera intrinsic parameters?
Your question seems related to this answer:
Extracting 3D coordinates given 2D image points, depth map and camera calibration matrices
Here you have a direct correspondence between 2D and 3D (actually 2.5D) values. By undistorting both the depth map and the 2D image, you can use the focal length and the measured depth to do the inverse mapping from 2D pixels to 3D world coordinates.
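A minimal sketch of that inverse mapping. The intrinsics and the depth image below are assumptions; the depth image is taken to be in metres, already undistorted and registered to the colour image:

```python
import numpy as np

def deproject(u, v, depth_image, fx, fy, cx, cy):
    """Back-project an undistorted pixel (u, v) to a 3D point in the camera frame."""
    z = depth_image[int(round(v)), int(round(u))]   # depth at the corner, metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Hypothetical intrinsics and data.
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5
depth_image = np.full((480, 640), 1.5)              # fake constant-depth image
corners_2d = [(100.2, 200.7), (320.0, 240.0)]       # e.g. from a feature detector

corners_3d = [deproject(u, v, depth_image, fx, fy, cx, cy) for u, v in corners_2d]
print(corners_3d)
```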

Calculate Camera Position from Homography Decomposition

I have a reference image A with a known position and I want to calculate the relative position of the camera at image B (i.e. tx, ty, tz in meters). The images are taken with the same camera so the camera matrix stays the same. I'm using SIFT to detect and compute the keypoints and descriptors in both images and match them with FLANN. From there I can get the homography matrix which I decompose with cv::decomposeHomography(..). This function is based on this paper: PDF.
In this paper it is stated that the translation is normalized by d*, which is the plane depth.
In order to get the correct translation I need to know the plane depth. Is there a way to get this without knowing the size of an object found in the image?
The 3D translation obtained from homography decomposition is only determined up to an unknown scale factor. This is a classical problem when computing 3D geometry from monocular images using only apparent motion in the images. For this reason, 3D reconstructions from monocular images are typically called metric reconstructions (rather than Euclidean reconstructions, where scale is resolved). To resolve the scale factor some more information is needed, such as knowing the depth of a point on the plane or the distance moved by the camera between images.
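If the plane depth d* is known (for example from the scene's depth information, as in the question at the top of this page), the normalized translation returned by the decomposition can simply be rescaled to metres. A sketch in Python; the camera matrix, the homography and the choice among the returned solutions are assumptions:

```python
import cv2
import numpy as np

K = np.array([[700.0,   0.0, 320.0],     # assumed camera matrix
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
H = np.eye(3)                            # placeholder: homography from your SIFT/FLANN matches

# decomposeHomographyMat returns up to four (R, t/d*, n) solutions.
num, Rs, Ts, Ns = cv2.decomposeHomographyMat(H, K)

d_star = 2.0   # known depth of the reference plane in metres (e.g. from a depth map)

for R, t_normalized, n in zip(Rs, Ts, Ns):
    t_metric = d_star * t_normalized     # recover the translation in metres
    # Keep the physically valid solution, e.g. the one whose plane normal points
    # towards the camera and whose reprojection error on the matches is smallest.
    print(t_metric.ravel())
```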

Projecting a 2D point into 3D space using camera calibration parameters in OpenCV

I get my camera parameters using calibrateCamera() and now I have cameraMatrix, distCoeffs, rotationMatrix, transformMatrix.
With these matrices I can build the Projection Matrix and convert 3D object points in the space into 2D image points.
Something like this: s * [u, v, 1]^T = cameraMatrix * [R | t] * [X, Y, Z, 1]^T.
But what I want is the reverse of this projection. I want to convert these 2D points back into 3D space. I know I'll lose some information, but all of my original points were on the same plane.
Please help me build a similar matrix from the camera parameters for this conversion.
From a set of projected 2D points you cannot get back the original 3D points, only the rays that join the 3D points and their projections on the image plane. So you lose the depth of the 3D points; that is to say, you know the direction of each ray but not the distance from the camera to the 3D point.
You will have to supply the depth of the 3D points yourself. Their planar condition lets you impose some constraints on their relative positions, but it is not enough to retrieve their original depth.
For example, you can set the depth of 3 non-collinear points to create a plane in 3D space. The depth of the other 2D points is then given by the intersection of their rays with this new plane.
If you know the normal vector of the plane that contains the 3D points, you can do something similar by setting the depth of a single 2D point and computing the others accordingly. Only if you additionally know the distance from that plane to the origin (the camera) can you retrieve the real depth of your 3D points.
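A sketch of the ray-plane intersection described above. The camera matrix, plane normal and plane distance are assumed values; the plane is expressed in the camera frame as n . X = d:

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],   # assumed camera matrix
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

n = np.array([0.0, 0.0, 1.0])          # assumed unit normal of the plane (camera frame)
d = 2.0                                # assumed distance from the camera to the plane, metres

def backproject_to_plane(u, v):
    """Intersect the viewing ray of pixel (u, v) with the known plane."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray direction through the pixel
    s = d / (n @ ray)                                # scale so the point lies on the plane
    return s * ray                                   # 3D point in the camera frame

points_2d = [(320.0, 240.0), (400.0, 300.0)]
points_3d = [backproject_to_plane(u, v) for u, v in points_2d]
print(points_3d)
```

Without the plane distance d, the same code still works with an arbitrary d, but the result is only correct up to that unknown scale.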