I need an algorithm for calculating body volume (in cubic meters) using a Kinect.
I know I can extract the point cloud and the depth frame (isolating the body using some methods of the skeleton NUI), but I don't know how to calculate the volume value from this matrix.
Would exporting a volume block be of any help?
If you need to compute the body volume precisely, you can use the algorithm for generating avatars from Kinect for monitoring obesity, as demonstrated in this video, which shows an example of computing the volume of pregnant women using Kinect. Watch demo video.
The algorithm is described in detail in the technical paper: A. Barmpoutis, 'Tensor Body: Real-time Reconstruction of the Human Body and Avatar Synthesis from RGB-D', IEEE Transactions on Cybernetics, Special Issue on Computer Vision for RGB-D Sensors: Kinect and Its Applications, October 2013, Vol. 43(5), pp. 1347-1356. Read PDF.
If you have depth and can determine distance via the Kinect sensor, and hence height, then you can get x and y dimensions in centimetres per pixel from the depth at each point, and a rough z/2 per pixel/ray, point-cloud-based approximation of the thickness in cm. Keep in mind that the human anterior and posterior are asymmetrical, hence the "rough" z/2 approximation (and multiplying by 2).
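For illustration, here is a minimal numpy sketch of that per-pixel depth-column idea. It assumes you already have a depth frame in metres, a body mask from the skeleton segmentation, and the depth camera's focal lengths; the mid-plane depth and the mirroring step are exactly the rough z/2-times-2 approximation described above, not a precise method.

```python
import numpy as np

def approximate_body_volume(depth_m, body_mask, fx, fy, mid_plane_m):
    """Rough single-view volume estimate from a depth frame.

    depth_m:     HxW array of depth values in metres (e.g. the Kinect depth frame)
    body_mask:   HxW boolean array, True where the skeleton segmentation says "body"
    fx, fy:      depth-camera focal lengths in pixels (hypothetical calibration values)
    mid_plane_m: assumed depth of the body's mid plane (the rough z/2 assumption)
    """
    z = depth_m[body_mask]
    # Metric footprint of one pixel at depth z: (z / fx) * (z / fy)
    pixel_area = (z / fx) * (z / fy)
    # Thickness of each column between the visible front surface and the mid plane
    thickness = np.clip(mid_plane_m - z, 0.0, None)
    front_half = np.sum(pixel_area * thickness)
    # Mirror the visible front half to account for the unseen posterior (the "x2" above);
    # anterior/posterior asymmetry makes this only a coarse approximation.
    return 2.0 * front_half  # cubic metres
```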
If you can formalize a model of the human form, you can create a fitting algorithm that gives a better volume approximation based on the given sensor information.
I am using the University of Michigan AprilTag library for localizing objects and am seeking advice for meeting my localization accuracy goals. I am using a 0.4 megapixel camera on tags that are roughly 7.5 cm wide, from distances of 0.1-1.5 meters away. I have used MATLAB to calibrate my camera intrinsics and distortion coefficients.
Desired Outcome
I would like to be able to localize tags to within 5 mm accuracy.
Observed Outcome
As I move the camera relative to the tag, the localization results vary. For every 100 cm I move away from the tag, I find drift in the projected location of the tag in the world of about 10 cm.
What is a reasonable expectation for the accuracy of my localization? What actions can I take to reduce the drift I am observing?
If the drift mainly appears in the Z component of the TVEC and the error increases more or less linearly, it is a sure sign that the focal length (fx and fy in the camera matrix) of your calibration is off.
Try the following:
check your calibration board: Is the size of the grid correct? Make sure that your printer does not scale the original file
make sure that the calibration board is fixed on a sturdy, flat surface
calibrate again and check if the values of fx and fy have changed (entries (0,0) and (1,1) in the camera matrix).
use at least 50 pictures, vary the board's angle and remove all pictures showing motion blur before calibrating
also check your detection parameters: you can try activating para.cornerRefinementMethod = cv2.aruco.CORNER_REFINE_APRILTAG to improve corner accuracy (if you are using C++, adjust the command accordingly).
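For reference, a minimal Python sketch of those detector settings, plus the fx/fy check from the list above. The file name and tag dictionary are placeholders, and the exact aruco API differs between OpenCV versions:

```python
import cv2

# Legacy aruco API (OpenCV <= 4.6); newer versions use cv2.aruco.DetectorParameters()
# and cv2.aruco.ArucoDetector instead.
params = cv2.aruco.DetectorParameters_create()
params.cornerRefinementMethod = cv2.aruco.CORNER_REFINE_APRILTAG

dictionary = cv2.aruco.Dictionary_get(cv2.aruco.DICT_APRILTAG_36h11)  # assumed tag family
gray = cv2.imread("tag_image.png", cv2.IMREAD_GRAYSCALE)              # placeholder image
corners, ids, rejected = cv2.aruco.detectMarkers(gray, dictionary, parameters=params)

# After re-running the calibration, compare the focal lengths between runs:
# fx = camera_matrix[0, 0], fy = camera_matrix[1, 1]
```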
(Too long for a comment, so I have to post it as another answer.)
This will depend on the pixel size of your sensor and the focal length of your lens (which will "scale" your actual pixel size to a "projected" pixel size). As the effective resolution changes with distance, a safe estimate would be to use the effective pixel value at 1.5 m. In terms of pixels, I would not trust marker corner accuracies below 0.3 px, as there seems to be an issue with subpixel accuracy when rotating the marker (see my open question: Understanding openCV aruco marker detection/pose estimation in detail: subpixel accuracy).
Tilting the marker will also degrade the accuracy, as the precision of the determined rotation (rvec of the pose) is usually only within a few degrees. If small angles occur (say, e.g., tilted by only 2°), the pose might not reflect that, and thus the marker will appear smaller and the distance will be over-estimated.
In a flat setup (provided you are not using a wide-angle lens) you might be able to get the 5 mm accuracy with a sensor > 5 MPx. But taking into account tilt and rotation of the marker, I am not sure if it will suffice...
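As a back-of-the-envelope version of that estimate (the pixel pitch and focal length below are made-up example values, not your camera's):

```python
pixel_pitch_mm = 0.003    # example: 3 um sensor pixels (assumption)
focal_length_mm = 4.0     # example lens focal length (assumption)
distance_mm = 1500.0      # worst-case working distance from the question

# "Projected" pixel size in the tag plane, and the ~0.3 px subpixel floor mentioned above
projected_pixel_mm = pixel_pitch_mm * distance_mm / focal_length_mm
corner_error_mm = 0.3 * projected_pixel_mm
print(projected_pixel_mm, corner_error_mm)   # ~1.1 mm per pixel and ~0.34 mm per corner here
```

Keep in mind this is only the lateral footprint of a corner error; the error in the estimated distance (Z of the pose) is amplified further, which is why the 5 mm goal is tight.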
I am trying to understand Kanade-Lucas-Tomasi tracker.
This is the overview of how it should be done (as I read in some lectures):
1. Find harris corners
2. For each corner compute displacement to next frame using the Lucas-Kanade method
3. Store displacement of each corner, update corner position
4. (optional) Add more corner points every M frames using 1
5. Repeat 2 to 3 (4)
6. Returns long trajectories for each corner point
My doubt is: do we need to compute optical flow at some point, or is the displacement vector alone enough to carry out the algorithm?
If not, then why is optical flow discussed alongside this topic?
The Kanade-Lucas-Tomasi tracker is related to optical flow because the displacement vectors are optical flow vectors, but in a sparse sense rather than a dense optical flow field. That's because the tracker is based on the Lucas-Kanade method for estimating the displacement vector. The Lucas-Kanade method, in turn, is based on the intensity constancy assumption, which is solved via a first-order Taylor approximation; this approximation is called the optical flow equation and was introduced by Horn and Schunck. The Lucas-Kanade method is classified as a local optical flow method, while most of today's optical flow methods are global methods that produce dense motion fields.
The displacement vector is one sample of the (dense) optical flow at the corner position. BTW, although Harris corners are usually good locations for initializing the search, strictly speaking you need a weaker detector for that; see Shi & Tomasi's classic "Good Features to Track" paper.
Not the most accurate or helpful description of KLT. This is better:
Find features to track (e.g. corners)
For each feature, compute the displacement to the next frame using the Lucas-Kanade method
Store trajectories of the features tracked
(optional) Add and remove features over time
Returns trajectories for each feature
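A minimal OpenCV sketch of that pipeline (sparse pyramidal Lucas-Kanade on Shi-Tomasi corners; the video path is a placeholder and re-detection of lost features is left as a comment):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("video.mp4")                       # placeholder input
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# 1. Find features to track (Shi & Tomasi "good features to track")
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01, minDistance=7)
tracks = [[tuple(p.ravel())] for p in pts]

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 2. For each feature, estimate its displacement with pyramidal Lucas-Kanade
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    # 3. Keep successfully tracked features and extend their trajectories
    kept = []
    for track, p, ok_flag in zip(tracks, new_pts, status.ravel()):
        if ok_flag == 1:
            track.append(tuple(p.ravel()))
            kept.append(track)
    tracks = kept
    pts = np.float32([t[-1] for t in tracks]).reshape(-1, 1, 2)
    # 4. (optional) call goodFeaturesToTrack again every M frames to add new features
    prev_gray = gray

# tracks now holds one (x, y) trajectory per surviving feature
```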
While @Tobias Senst's answer is not technically wrong, it misses a very important fact:
The Lucas-Kanade method is not restricted to estimating displacements. In fact, using LK in the KLT tracker almost assumes that you solve for at least rotation and scaling deformation on top of displacement. For many applications a full affine solution is beneficial, which means you solve for 6 parameters, not just the 2 parameters of displacement (u, v).
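OpenCV's calcOpticalFlowPyrLK only solves for the 2 displacement parameters. If you want a full 6-parameter warp per feature, one option (not the original KLT implementation, just similar in spirit) is intensity-based alignment of the patch with an affine motion model via findTransformECC:

```python
import cv2
import numpy as np

def track_patch_affine(prev_gray, gray, x, y, half=15):
    """Estimate a 2x3 affine warp of the patch around integer position (x, y).

    findTransformECC aligns intensities much like Lucas-Kanade, but here with a
    6-parameter affine model instead of pure translation. It assumes the motion
    is small enough that the feature stays inside the patch, and raises
    cv2.error if the optimisation does not converge.
    """
    tmpl = prev_gray[y - half:y + half, x - half:x + half].astype(np.float32)
    patch = gray[y - half:y + half, x - half:x + half].astype(np.float32)
    warp = np.eye(2, 3, dtype=np.float32)  # initial guess: identity
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 50, 1e-4)
    cc, warp = cv2.findTransformECC(tmpl, patch, warp, cv2.MOTION_AFFINE, criteria)
    return warp
```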
I am using OpenCV's triangulatePoints function to determine 3D coordinates of a point imaged by a stereo camera.
I am experiencing that this function gives me different distance to the same point depending on angle of camera to that point.
Here is a video:
https://www.youtube.com/watch?v=FrYBhLJGiE4
In this video, we are tracking the 'X' mark. In the upper left corner info is displayed about the point that is being tracked. (Youtube dropped the quality, the video is normally much sharper. (2x1280) x 720)
In the video, the left camera is the origin of the 3D coordinate system and it is looking in the positive Z direction. The left camera is undergoing some translation, but not nearly as much as the triangulatePoints function leads me to believe. (More info is in the video description.)
Metric unit is mm, so the point is initially triangulated at ~1.94m distance from the left camera.
I am aware that insufficiently precise calibration can cause this behaviour. I have run three independent calibrations using a chessboard pattern. The resulting parameters vary too much for my taste (approx. ±10% for the focal length estimate).
As you can see, the video is not highly distorted. Straight lines appear pretty straight everywhere. So the optimum camera parameters must be close to the ones I am already using.
My question is, is there anything else that can cause this?
Can a convergence angle between the two stereo cameras have this effect? Or a wrong baseline length?
Of course, there is always the matter of errors in feature detection. Since I am using optical flow to track the 'X' mark, I get subpixel precision, which could be off by... I don't know... ±0.2 px?
I am using the Stereolabs ZED stereo camera. I am not accessing the video frames directly with OpenCV; instead, I have to use the special SDK I acquired when purchasing the camera. It has occurred to me that this SDK might be doing some undistortion of its own.
So, now I wonder... If the SDK undistorts an image using incorrect distortion coefficients, can that create an image that is neither barrel-distorted nor pincushion-distorted but something different altogether?
The SDK provided with the ZED camera performs undistortion and rectification of images. The geometric model is the same as OpenCV's:
intrinsic parameters and distortion parameters for both Left and Right cameras.
extrinsic parameters for rotation/translation between Right and Left.
Through one of the ZED tools (the ZED Settings App), you can enter your own intrinsic matrix and distortion coefficients for Left/Right, as well as the Baseline/Convergence.
To get a precise 3D triangulation, you may need to adjust those parameters since they have a high impact on the disparity you will estimate before converting to depth.
OpenCV provides a good module to calibrate 3D cameras. It does:
- Mono calibration (calibrateCamera) for Left and Right, followed by a stereo calibration (cv::stereoCalibrate()). It will output the intrinsic parameters (focal length, optical center (very important)) and the extrinsics (Baseline = T[0], Convergence = R[1] if R is a 3x1 vector). The RMS (return value of stereoCalibrate()) is a good way to see whether the calibration has been done correctly.
The important thing is that you need to do this calibration on raw images, not on the images provided by the ZED SDK. Since the ZED is a standard UVC camera, you can use OpenCV to get the side-by-side raw images (cv::VideoCapture with the correct device number) and extract the native Left and Right images.
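For example, grabbing the raw side-by-side frames with OpenCV could look like this (the device index and resolution are assumptions, adjust them to your setup):

```python
import cv2

# The ZED presents itself as a UVC camera delivering one side-by-side stereo frame.
cap = cv2.VideoCapture(1)                      # assumed device index
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 2560)        # 2 x 1280, matching the question's video
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ok, frame = cap.read()
if ok:
    h, w = frame.shape[:2]
    left_raw = frame[:, : w // 2]              # native (uncorrected) left image
    right_raw = frame[:, w // 2 :]             # native (uncorrected) right image
    # Feed these raw images (with a chessboard in view) to cv2.calibrateCamera for each
    # side, then cv2.stereoCalibrate for the extrinsics, as described above.
```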
You can then enter those calibration parameters in the tool. The ZED SDK will then perform the undistortion/rectification and provide the corrected images. The new camera matrix is provided by getParameters(). You need to use those values when you triangulate, since the images are corrected as if they were taken from this "ideal" camera.
hope this helps.
/OB/
There are 3 points I can think of that can probably help you.
Probably the least important, but from your description you have separately calibrated the cameras and then the stereo system. Running an overall optimization should improve the reconstruction accuracy, as some "less accurate" parameters compensate for the other "less accurate" parameters.
If the accuracy of reconstruction is important to you, you need a systematic approach to reducing the error. Building an uncertainty model is easy thanks to the mathematical model; a few lines of code can build it for you. Say the 3D point is 2 metres away, at a particular angle to the camera system, and you have a specific uncertainty on the 2D projections of that point: it's easy to back-project that uncertainty into the 3D space around your 3D point. By adding uncertainty to the other parameters of the system, you can then see which ones matter most and need to have lower uncertainty.
This inaccuracy is inherent in the problem and the method you're using.
First, if you model the uncertainty you will see that reconstructed 3D points further away from the camera centres have a much higher uncertainty. The reason is that the angle <left camera, 3D point, right camera> is narrower. I remember the MVG book had a good description of this with a figure.
Second, if you look at the implementation of triangulatePoints you see that the pseudo-inverse method is implemented using SVD to construct the 3d point. That can lead to many issues, which you probably remember from linear algebra.
Update:
But I consistently get larger distance near edges and several times the magnitude of the uncertainty caused by the angle.
That's the result of using the pseudo-inverse, a numerical method. You can replace it with a geometric method. One easy method is to back-project the 2D projections to get 2 rays in 3D space. Then you want to find where they intersect, which doesn't happen exactly due to the inaccuracies; instead you find the point where the 2 rays have the least distance between them. Without considering the uncertainty you will consistently favor a point from the set of feasible solutions. That's why with the pseudo-inverse you don't see random fluctuation but a gross error.
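A small sketch of that ray-midpoint idea (both rays expressed in the same world frame; the directions can be obtained by back-projecting the 2D points, e.g. rotating K^-1 * [u, v, 1]^T into world coordinates for each camera):

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Closest point between two 3D rays (camera centres o, direction vectors d).

    Returns the midpoint of the shortest segment between the rays, a simple
    geometric alternative to the SVD/pseudo-inverse used by triangulatePoints.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = o2 - o1
    d1d2 = np.dot(d1, d2)
    denom = 1.0 - d1d2 ** 2            # ~0 when the rays are (nearly) parallel
    t1 = (np.dot(b, d1) - np.dot(b, d2) * d1d2) / denom
    t2 = (np.dot(b, d1) * d1d2 - np.dot(b, d2)) / denom
    p1 = o1 + t1 * d1                  # closest point on ray 1
    p2 = o2 + t2 * d2                  # closest point on ray 2
    return 0.5 * (p1 + p2)
```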
Regarding the general optimization, yes, you can run an iterative LM optimization on all the parameters. This is the method used in applications like SLAM for autonomous vehicles where accuracy is very important. You can find some papers by googling bundle adjustment slam.
We have pictures taken from a plane flying over an area with 50% overlap and are using the OpenCV stitching algorithm to stitch them together. This works fine for our version 1. In our next iteration we want to look into a few extra things that I could use a few comments on.
Currently the stitching algorithm estimates the camera parameters. We do have the camera parameters and a lot of information available from the plane about camera angle, position (GPS), etc. Would we benefit at all from this information, in contrast to just letting the algorithm estimate everything based on matched feature points?
These images are taken at high resolution and the algorithm takes up quite an amount of RAM at this point; not a big problem, as we just spin up large machines in the cloud. But in our next iteration I would like to compute the homography from downsampled images and apply it to the large images later. This will also give us more options to manipulate and visualize other information on the original images, and to go back and forth between the original and stitched images.
If, as in question 1, we are going to take apart the stitching algorithm to put in the known information, is it just a matter of using the findHomography method to get it, or are there better alternatives for creating the homography when we actually know the plane position and angles and the camera parameters?
I have a basic understanding of OpenCV and am fine with C++ programming, so it's not a problem to write our own customized stitcher, but the theory is a bit rusty here.
Since you are using homographies to warp your imagery, I assume you are capturing areas small enough that you don't have to worry about Earth curvature effects. Also, I assume you don't use an elevation model.
Generally speaking, you will always want to tighten your (homography) model using matched image points, since your final output is a stitched image. If you have the RAM and CPU budget, you could refine your linear model using a max likelihood estimator.
Having a prior motion model (e.g. from GPS + IMU) could be used to initialize the feature search and match. With a good enough initial estimate of the apparent feature motion, you could dispense with expensive feature descriptor computation and storage, and just go with normalized cross-correlation.
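A rough sketch of what that could look like, with the predicted location coming from your GPS/IMU prior (bounds handling is simplified and all names are hypothetical):

```python
import cv2

def match_with_prior(ref_patch, new_image, predicted_xy, search_radius=20):
    """Locate ref_patch in new_image near a predicted top-left position (integer pixels),
    using normalized cross-correlation instead of feature descriptors."""
    x, y = predicted_xy
    h, w = ref_patch.shape[:2]
    x0, y0 = max(0, x - search_radius), max(0, y - search_radius)
    window = new_image[y0:y + h + search_radius, x0:x + w + search_radius]
    scores = cv2.matchTemplate(window, ref_patch, cv2.TM_CCORR_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)
    return (x0 + best_loc[0], y0 + best_loc[1]), best_score
```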
If I understand correctly, the images are taken vertically and overlap by a known number of pixels. In that case calculating a homography is a bit overkill: you're just talking about a translation matrix, and using more powerful algorithms can only give you badly conditioned matrices.
In 2D, if H is a generalised homography matrix representing a perspective transformation,
H=[[a1 a2 a3] [a4 a5 a6] [a7 a8 a9]]
then the submatrices R and T represent rotation and translation, respectively, if a9 == 1.
R= [[a1 a2] [a4 a5]], T=[[a3] [a6]]
while [a7 a8] represents the stretching of each axis. (All of this is a bit approximate since when all effects are present they'll influence each other).
So, if you know the lateral displacement, you can create a 3x3 matrix with just a3, a6 and a9 = 1 and pass it to cv::warpPerspective or cv::warpAffine.
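A minimal example of that translation-only "homography" (the image path and displacement values are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("next_image.png")       # placeholder input image
dx, dy = 512.0, 0.0                      # known overlap displacement in pixels (assumption)

# Only a3 and a6 are set, with a9 = 1: a pure translation written as a 3x3 homography
H = np.array([[1.0, 0.0, dx],
              [0.0, 1.0, dy],
              [0.0, 0.0, 1.0]])
canvas = (img.shape[1] + int(dx), img.shape[0] + int(dy))   # (width, height) of the output
shifted = cv2.warpPerspective(img, H, canvas)
```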
As a criterion of matching correctness you can, e.g., calculate a normalized difference between pixels.
How does one determine the appropriate physical size of a calibration board for camera calibration? In the past, I have used ones like these printed on A4-size paper. Reasonable criteria seem to be precision (i.e. how accurate the results are) and completeness (i.e. whether all corners are detected) with a particular sized board. But, short of printing multiple sizes and trying them all, is there any other approach?
On a related note, if we know that the "depth of field" of interest is 2 metres to 2.5 metres from the camera, would it be better to use the calibration pattern within this range, or would it be better to ignore such preferences?
It is the wrong question to ask. Apart from the resolution of the board, which is probably going to be fine for a wide range of sizes, the important question is how many board views you have to show to your camera in order to accurately find the intrinsic and extrinsic parameters.
Unlike PnP, POSIT, or other extrinsic-only problems, calibration has to find both extrinsics and intrinsics. You will have to cover the camera's whole field of view with images of your board, plus (or rather times) different board orientations and distances. This is because converging to the right intrinsics and extrinsics requires seeing vanishing points, while eliminating bias in the pixel error metric requires uniform sampling of the field of view.
First, see this answer for general notes on manufacturing a calibration target.
Doing a complete sensitivity analysis is quite complicated. A rule of thumb I have used is to assume that my corner detector is accurate up to 1/2 a pixel in the worst case. Then, given an approximate field of view (which you can estimate from the lens's nominal focal length in mm and the width of the sensor chip) and a specification for the min and max distance used in the application, I worked out the sensitivity of a plane's estimated pose in various orientations and positions in the calibration volume, given a worst-case error in its four corners. I then played with its size until the worst-case error became acceptable, and within my budget and the other application constraints (weight, depth of field, resolution, etc.).
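For what it's worth, the first step of that back-of-the-envelope analysis fits in a few lines (all numbers below are example assumptions, not recommendations):

```python
# Metric size of a worst-case 1/2-pixel corner error in the target plane
sensor_width_mm = 7.2      # example sensor width (assumption)
image_width_px = 2448      # example horizontal resolution (assumption)
focal_length_mm = 8.0      # nominal lens focal length (assumption)

focal_length_px = focal_length_mm * image_width_px / sensor_width_mm
for z_m in (2.0, 2.5):     # working distances from the question above
    worst_case_mm = 0.5 * z_m * 1000.0 / focal_length_px
    print(f"{z_m} m: 1/2 px corresponds to ~{worst_case_mm:.2f} mm in the target plane")
```

From there you can propagate that corner error into the pose of the board at various tilts and positions, and grow the board until the resulting worst-case pose error is acceptable.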