How to improve Aruco Marker Pose Estimation?

How to improve Aruco Marker Pose Estimation? - computer-vision

I'm having a hard time estimating the positions of the Aruco markers with the camera. In my tests with the DICT_6X6_250 dictionary and the board with 4 markers of 20x20 cm on it, I measured at 6 meters with an error of 20-30 cm. I need more precise measurements.
Is this error rate normal? What can I do to increase accuracy?

In general there are ambiguity issues with Aruco, which you can find here.
I am doing abit of research on Fiducial Markers and this error rate is pretty normal. The Pose estimation of the markers tend to have errors in x and y rotation and z Translation.
However, there are some factors that can influence the accuracy of Aruco Pose estimation. Here are some points, that can help improve Pose estimation accuracy, which you should take into consideration:
The first is to use a Camara with a high resolution. If the Marker is small in the image plane the pose estimation will not be as accurate.
Secondly instead of using cv2.aruco.estimatePoseSingleMarkers()
I would recommend using cv2.SolvePnP() as it allows you to use different Perspective N Point algorithms to calculate the Pose. You can read more about SolvePnP here and the different methods here
For the Aruco Detection cv2.detectMarkers() use a SubPixel Corner refinement method.
Lastly you can use a Pose Refinement Method to improve the estimated pose (here). This method reduces the reprojection error of the estimated Pose and as a result you should get better Pose estimation accuracy.

Inaccuracies of pose can stem from inaccuracies in subpixel localization.
Almost all algorithms for subpixel localization, and all people, assume a linear relationship between what's physically there (edges, corners) and how that is mapped to pixel intensities.
Webcams1 give gamma-compressed data, not raw linear sensor values. Also, webcams love to "sharpen" the picture. Both affect subpixel localization.
1 Not unique to webcams. That goes for everything that isn't a "raw sensor" file format. All JPEG, PNG, ... images are gamma-compressed, and so are all videos.

Related

AprilTag Localization Expected Acuracy

I am using the University of Michigan AprilTag library for localizing objects and am seeking advice for meeting my localization accuracy goals. I am using a 0.4 MegaPixel camera, on tags that are roughly 7.5 cm wide from distances of 0.1-1.5 meters away. I have used MatLab to calibrate my camera intrinsics and distortion coefficients.
Desired Outcome
I would like to be able to localize tags to within 5 mm accuracy.
Observed Outcome
As I move the camera relative to the tag, the localization results vary. For every 100 cm I move away from the tag, I find drift in the projected location of the tag in the world of about 10cm.
What is a reasonable expectation for the accuracy of the my localization? What actions can I take to reduce the drift I am observing?

If the drift mainly appears in the Z component of the TVEC and the error increases more or less linearly it is a sure sign that the focal length (fx & fy in the camera matrix) of your calibration is off.
Try the following:
check your calibration board: Is the size of the grid correct? Make sure that your printer does not scale the original file
make sure that the calibration board is fixed on a sturdy, flat surface
calibrate again and check if the values of fx and fy have changed (entries (0,0) and (1,1) in the camera matrix).
use at least 50 pictures, vary the board's angle and remove all pictures showing motion blur before calibrating
also check your detection parameters: You can try to activate para.cornerRefinementMethod = cv2.aruco.CORNER_REFINE_APRILTAG to improve corner accuracy (if you are using c++, adjust the command accordingly).

(too long for a comment, so I have to post it as another answer:) This will depend on the pixel size of your sensor and the focal length of your lens (which will "scale" your actual pixel size to a "projected" pixel size). As the effective resolution changes with the distance, a safe estimate would be to use the 1.5 m effective pixel value. In terms of pixels I would not trust marker corner accuracies below 0.3 px as there seems to be an issue with subpixeling accuracy, when rotating the marker (see my open question: Understanding openCV aruco marker detection/pose estimation in detail: subpixel accuracy). Tilting the marker will also degrade the accuracy as the precision of the determined rotation (rvec of the pose) is usually only within a few degrees. If small angles (say e. g. tilted only by 2°) occur, the pose might not reflect that and thus the marker will appear smaller and the distance will thus be over-estimated. In a flat setup (provided you are not using a wide angle lens) you might be able to get the 5 mm accuracy with a sensor > 5 MPx. But taking into account tilt & rotation of the marker, I am not sure if it will suffice...

TOF camera calibration - distance to chessboard

For my application, I need to calibrate my TOF-camera (kinect v2).
I have done this with matlab camera calibration. After my calibration, I recognized that right-angled planes are oblique.
For example here a result of two "right-angled" planes:
I think this result is so oblique cause of the wrong parameters from calibration. Therefore I want to improve my calibration process of the kinect.
So I have three major questions:
Is the distance between TOF-camera and chessboard important for the calibration result? For my application, I need a quite high accurancy in the interval 2.5m - 3m (Z-distance between camera and object). So I choose this intervall to get the best result for this area, expecialy because it is a TOF-camera. Or should I take a quite short distance (1-1.5m) to get a good chessboard with a high resolution?
What kind of images (viewpoint: rotated,obiquely / images in the middle or corner) are important to get a good result for the tangential distortion ( I think this parameter turns totataly wrong)? Any tipps to improve here my results?
I fixed my chessboard on a flat wall and fixed my camera on a tripod. For different calibration images I move my tripod. Would this procedure be also ok? Or do I have to move the chessboard pattern to get better results?

OpenCV triangulatePoints varying distance

I am using OpenCV's triangulatePoints function to determine 3D coordinates of a point imaged by a stereo camera.
I am experiencing that this function gives me different distance to the same point depending on angle of camera to that point.
Here is a video:
https://www.youtube.com/watch?v=FrYBhLJGiE4
In this video, we are tracking the 'X' mark. In the upper left corner info is displayed about the point that is being tracked. (Youtube dropped the quality, the video is normally much sharper. (2x1280) x 720)
In the video, left camera is the origin of 3D coordinate system and it's looking in positive Z direction. Left camera is undergoing some translation, but not nearly as much as the triangulatePoints function leads to believe. (More info is in the video description.)
Metric unit is mm, so the point is initially triangulated at ~1.94m distance from the left camera.
I am aware that insufficiently precise calibration can cause this behaviour. I have ran three independent calibrations using chessboard pattern. The resulting parameters vary too much for my taste. ( Approx +-10% for focal length estimation).
As you can see, the video is not highly distorted. Straight lines appear pretty straight everywhere. So the optimimum camera parameters must be close to the ones I am already using.
My question is, is there anything else that can cause this?
Can a convergence angle between the two stereo cameras can have this effect? Or wrong baseline length?
Of course, there is always a matter of errors in feature detection. Since I am using optical flow to track the 'X' mark, I get subpixel precision which can be mistaken by... I don't know... +-0.2 px?
I am using the Stereolabs ZED stereo camera. I am not accessing the video frames using directly OpenCV. Instead, I have to use the special SDK I acquired when purchasing the camera. It has occured to me that this SDK I am using might be doing some undistortion of its own.
So, now I wonder... If the SDK undistorts an image using incorrect distortion coefficients, can that create an image that is neither barrel-distorted nor pincushion-distorted but something different altogether?

The SDK provided with the ZED Camera performs undistortion and rectification of images. The geometry model is based on the same as openCV :
intrinsic parameters and distortion parameters for both Left and Right cameras.
extrinsic parameters for rotation/translation between Right and Left.
Through one of the tool of the ZED ( ZED Settings App), you can enter your own intrinsic matrix for Left/Right and distortion coeff, and Baseline/Convergence.
To get a precise 3D triangulation, you may need to adjust those parameters since they have a high impact on the disparity you will estimate before converting to depth.
OpenCV gives a good module to calibrate 3D cameras. It does :
-Mono calibration (calibrateCamera) for Left and Right , followed by a stereo calibration (cv::StereoCalibrate()). It will output Intrinsic parameters (focale, optical center (very important)), and extrinsic (Baseline = T[0], Convergence = R[1] if R is a 3x1 matrix). the RMS (return value of stereoCalibrate()) is a good way to see if the calibration has been done correctly.
The important thing is that you need to do this calibration on raw images, not by using images provided with the ZED SDK. Since the ZED is a standard UVC Camera, you can use opencv to get the side by side raw images (cv::videoCapture with the correct device number) and extract Left and RIght native images.
You can then enter those calibration parameters in the tool. The ZED SDK will then perform the undistortion/rectification and provide the corrected images. The new camera matrix is provided in the getParameters(). You need to take those values when you triangulate, since images are corrected as if they were taken from this "ideal" camera.
hope this helps.
/OB/

There are 3 points I can think of and probably can help you.
Probably the least important, but from your description you have separately calibrated the cameras and then the stereo system. Running an overall optimization should improve the reconstruction accuracy, as some "less accurate" parameters compensate for the other "less accurate" parameters.
If the accuracy of reconstruction is important to you, you need to have a systematic approach to reducing it. Building an uncertainty model, thanks to the mathematical model, is easy and can write a few lines of code to build that for you. Say you want to see if the 3d point is 2 meters away, at a particular angle to the camera system, and you have a specific uncertainty on the 2d projections of the 3d point, it's easy to backproject the uncertainty to the 3d space around your 3d point. By adding uncertainty to the other parameters of the system then you can see which ones are more important and need to have lower uncertainty.
This inaccuracy is inherent in the problem and the method you're using.
First if you model the uncertainty you will see the reconstructed 3d points further away from the center of cameras have a much higher uncertainty. The reason is that the angle <left-camera, 3d-point, right-camera> is narrower. I remember the MVG book had a good description of this with a figure.
Second, if you look at the implementation of triangulatePoints you see that the pseudo-inverse method is implemented using SVD to construct the 3d point. That can lead to many issues, which you probably remember from linear algebra.
Update:
But I consistently get larger distance near edges and several times
the magnitude of the uncertainty caused by the angle.
That's the result of using pseudo-inverse, a numerical method. You can replace that with a geometrical method. One easy method is to back-project the 2d-projections to get 2 rays in 3d space. Then you want to find where the intersect, which doesn't happen due to the inaccuracies. Instead you want to find the point where the 2 rays have the least distance. Without considering the uncertainty you will consistently favor a point from the set of feasible solutions. That's why with pseudo inverse you don't see any fluctuation but a gross error.
Regarding the general optimization, yes, you can run an iterative LM optimization on all the parameters. This is the method used in applications like SLAM for autonomous vehicles where accuracy is very important. You can find some papers by googling bundle adjustment slam.

How to verify that the camera calibration is correct? (or how to estimate the error of reprojection)

The quality of calibration is measured by the reprojection error (is there an alternative?), which requires a knowledge world coordinates of some 3d point(s).
Is there a simple way to produce such known points? Is there a way to verify the calibration in some other way (for example, Zhang's calibration method only requires that the calibration object be planar and the geometry of the system need not to be known)

You can verify the accuracy of the estimated nonlinear lens distortion parameters independently of pose. Capture images of straight edges (e.g. a plumb line, or a laser stripe on a flat surface) spanning the field of view (an easy way to span the FOV is to rotate the camera keeping the plumb line fixed, then add all the images). Pick points on said line images, undistort their coordinates, fit mathematical lines, compute error.
For the linear part, you can also capture images of multiple planar rigs at a known relative pose, either moving one planar target with a repeatable/accurate rig (e.g. a turntable), or mounting multiple planar targets at known angles from each other (e.g. three planes at 90 deg from each other).
As always, a compromise is in order between accuracy requirements and budget. With enough money and a friendly machine shop nearby you can let your fantasy run wild with rig geometry. I had once a dodecahedron about the size of a grapefruit, machined out of white plastic to 1/20 mm spec. Used it to calibrate the pose of a camera on the end effector of a robotic arm, moving it on a sphere around a fixed point. The dodecahedron has really nice properties in regard to occlusion angles. Needless to say, it's all patented.

The images used in generating the intrinsic calibration can also be used to verify it. A good example of this is the camera-calib tool from the Mobile Robot Programming Toolkit (MRPT).
Per Zhang's method, the MRPT calibration proceeds as follows:
Process the input images:
1a. Locate the calibration target (extract the chessboard corners)
1b. Estimate the camera's pose relative to the target, assuming that the target is a planar chessboard with a known number of intersections.
1c. Assign points on the image to a model of the calibration target in relative 3D coordinates.
Find an intrinsic calibration that best explains all of the models generated in 1b/c.
Once the intrinsic calibration is generated, we can go back to the source images.
For each image, multiply the estimated camera pose with the intrinsic calibration, then apply that to each of the points derived in 1c.
This will map the relative 3D points from the target model back to the 2D calibration source image. The difference between the original image feature (chessboard corner) and the reprojected point is the calibration error.
MRPT performs this test on all input images and will give you an aggregate reprojection error.
If you want to verify a full system, including both the camera intrinsics and the camera-to-world transform, you will probably need to build a jig that places the camera and target in a known configuration, then test calculated 3D points against real-world measurements.

On Engine's question: the pose matrix is a [R|t] matrix where R is a pure 3D rotation and t a translation vector. If you have computed a homography from the image, section 3.1 of Zhang's Microsoft Technical Report (http://research.microsoft.com/en-us/um/people/zhang/Papers/TR98-71.pdf) gives a closed form method to obtain both R and t using the known homography and the intrinsic camera matrix K. ( I can't comment, so I added as a new answer)

Should be just variance and bias in calibration (pixel re-projection) errors given enough variability in calibration rig poses. It is better to visualize these errors rather than to look at the values. For example, error vectors pointing to the center would be indicative of wrong focal length. Observing curved lines can give intuition about distortion coefficients.

To calibrate the camera one has to jointly solve for extrinsic and intrinsic. The latter can be known from manufacturer, the solving for extrinsic (rotation and translation) involves decomposition of calculated homography: Decompose Homography matrix in opencv python
Calculate a Homography with only Translation, Rotation and Scale in Opencv
The homography is used here since most calibration targets are flat.

web cam calibrate

I have 2 logistic webcam, I want to do stereo triangulation for which I have to measure the focal length of 2 web cameras.
My question is if I use openCv to calibrate the camera and generate the intrinsic and extrinsic matrices can I use the focal length value that is generated in the intrinsic matrix
as the exact value of focal length.
Well in short i wanted to know if I can use 2 webcams to do stereo triangulation rather than using pin whole stereo camera...

Well, the answer depends on what you mean by the exact value of focal length. If the accuracy of triangulation is your concern then you need to know there are a few factors that affect the accuracy of calibration and triangulation. The rule of thumb is to have a wider baseline (the distance between the two cameras) to improve the accuracy of calibration. Second a larger number of points and more accurate points should be used for calibration. Third, check out the the back projection error after running bundle adjustment. Four, when triangulating the points further from the cameras have a larger uncertainty. And finally, apart from the first points, the wide baseline, the relative pose between the two camera is very important as you should consider what points you want to triangulate and relatively where in the 3D space they should be, then you can reconstruct some points that are important to you more accurate than the others. If you provide more details about the problem you're dealing with perhaps you get a more detailed answer too. I hope that helps.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js