Face pose estimation in OpenCV without 3D modelling - C++

I need to estimate the face pose of a person using OpenCV, but I can't find any specific function that does this. I don't want to use 3D modelling, as it is outside the scope of my project.
Is there a specific way of doing this, e.g. obtaining the yaw, roll, and pitch values of a person's head?

You can locate some landmarks of the face in your image using Haar cascades or an AAM, then use a predefined 3D estimate of where those points should lie in world coordinates, and leave it to OpenCV's solvePnP to find the pose matrix.
Check out this example
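
A minimal sketch of that approach, assuming the 2D landmarks have already been detected; the image points, 3D model coordinates, and camera matrix below are rough placeholders rather than values from the linked example:

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
    // Hypothetical 2D landmark positions detected in the image (nose tip, chin,
    // eye corners, mouth corners) -- in practice these come from your detector.
    std::vector<cv::Point2f> imagePoints = {
        {359, 391}, {399, 561}, {337, 297}, {513, 301}, {345, 465}, {453, 469}
    };

    // Rough generic 3D model of the same landmarks (millimetres, nose tip at origin).
    std::vector<cv::Point3f> modelPoints = {
        {0.0f, 0.0f, 0.0f},          // nose tip
        {0.0f, -330.0f, -65.0f},     // chin
        {-225.0f, 170.0f, -135.0f},  // left eye outer corner
        {225.0f, 170.0f, -135.0f},   // right eye outer corner
        {-150.0f, -150.0f, -125.0f}, // left mouth corner
        {150.0f, -150.0f, -125.0f}   // right mouth corner
    };

    // Approximate camera matrix: focal length ~ image width, principal point at centre.
    double focal = 640.0;
    cv::Point2d center(320.0, 240.0);
    cv::Mat K = (cv::Mat_<double>(3, 3) << focal, 0, center.x,
                                           0, focal, center.y,
                                           0, 0, 1);
    cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64F); // assume no lens distortion

    cv::Mat rvec, tvec;
    cv::solvePnP(modelPoints, imagePoints, K, distCoeffs, rvec, tvec);

    // Convert the rotation vector to a matrix and read out Euler angles
    // (which one is called yaw/pitch/roll depends on your convention).
    cv::Mat R;
    cv::Rodrigues(rvec, R);
    double pitch = std::atan2(R.at<double>(2, 1), R.at<double>(2, 2));
    double yaw   = std::atan2(-R.at<double>(2, 0),
                              std::sqrt(R.at<double>(2, 1) * R.at<double>(2, 1) +
                                        R.at<double>(2, 2) * R.at<double>(2, 2)));
    double roll  = std::atan2(R.at<double>(1, 0), R.at<double>(0, 0));
    std::printf("yaw %.2f  pitch %.2f  roll %.2f (rad)\n", yaw, pitch, roll);
    return 0;
}
```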

Related

OpenCv Blob tracking of point relative to plane

I'm doing an installation that tracks blobs using OpenCV and projects graphics over the blobs. The problem is that my camera is offset and away from the projector.
I'm thinking that to get a point's position in relation to the projection plane, I would need to calibrate by marking out the plane's corners as seen in the camera view.
My problem is: how do I use those 4 corner points to convert a tracked blob from the camera view to the projection plane, so that the projected graphic lines up with the tracked blob? I'm not sure what I should be searching for.
After you detect the 4 corner points, you can calculate the transformation to the projector plane by using getPerspectiveTransform.
Once you have this transformation, you can use warpPerspective to go from one coordinate system to the other (or perspectiveTransform if you only need to map individual points).
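
A minimal sketch of that idea, with placeholder corner coordinates and projector resolution:

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    // The 4 corners of the projection surface as seen by the camera (assumed values).
    std::vector<cv::Point2f> cameraCorners    = {{102, 54}, {598, 70}, {610, 442}, {95, 430}};
    // Where those corners live in projector coordinates (e.g. a 1280x800 projector).
    std::vector<cv::Point2f> projectorCorners = {{0, 0}, {1280, 0}, {1280, 800}, {0, 800}};

    // 3x3 homography from camera view to projector plane.
    cv::Mat H = cv::getPerspectiveTransform(cameraCorners, projectorCorners);

    // Map a tracked blob centroid (camera pixels) into projector coordinates.
    std::vector<cv::Point2f> blobInCamera = {{320.0f, 240.0f}};
    std::vector<cv::Point2f> blobInProjector;
    cv::perspectiveTransform(blobInCamera, blobInProjector, H);

    // warpPerspective would instead warp a whole image with the same H:
    // cv::warpPerspective(cameraFrame, warped, H, cv::Size(1280, 800));

    std::cout << "blob in projector coords: " << blobInProjector[0] << std::endl;
    return 0;
}
```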
Unfortunately I'm unable to help with a minimal code example at the moment, but I recommend having a look at ofxCv and its examples. There is a camera-based undistort example, but the wrapper also provides utilities for warping/unwarping perspective via warpPerspective and unwarpPerspective.
Bear in mind that ofxCv has handy functions to convert between ofImage and cv::Mat, like toCv() and toOf().
ofxCv may make it easier to use the OpenCV functions Elad Joseph recommends (which sound like exactly what you need)

How to calibrate the Kinect camera?

Camera calibration is the process of estimating intrinsic and/or extrinsic parameters. Intrinsic parameters deal with the camera's internal characteristics, such as its focal length, skew, distortion, and image center. Extrinsic parameters describe its position and orientation in the world. Knowing the intrinsic parameters is an essential first step for 3D computer vision, as it allows you to estimate the scene's structure in Euclidean space and removes lens distortion, which degrades accuracy.
I'm using the Kinect for computer vision, but I need to calibrate it. I've already read some articles about Kinect calibration, but I didn't understand them very clearly.
I want to start from scratch, because I need to know how the calibration is done.
How do I do this?
Thanks.
The Kinect is slightly different from a standard camera. There is a customized toolbox here: http://www.ee.oulu.fi/~dherrera/kinect/
I'd suggest you read the paper and try to understand what calibration is.
In very simplistic terms you need calibration so that other geometric algorithms can work. The vast majority of geometry-based algorithms in vision assume the pinhole camera model. That is, the center of the camera is a tiny pinhole and rays reflecting off of objects travel in straight lines.
However, a pinhole camera is not practical to manufacture. You can make a pinhole camera at home, but the image quality won't be good.
People use lenses to deal with this. But, lenses are imperfect and they have distortion.
Distortion means that pixel coordinates do not correspond to straight lines anymore. So, many of the algorithms fail to compute the right thing.
Camera intrinsic calibration corrects the distortion in the lens so that the projection is as close to a pinhole as possible.
The Kinect has two cameras: the RGB camera and the IR camera. Both are factory calibrated, but you can get better results, tailored to your particular sensor, using the toolbox above.
HTH
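
To make the "correct the distortion" step concrete, here is a minimal sketch, assuming you already have the intrinsic matrix K and distortion coefficients from a calibration; the file name and numeric values below are placeholders:

```cpp
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat img = cv::imread("kinect_rgb.png");

    // Example intrinsics (fx, fy, cx, cy) and distortion -- substitute your calibrated values.
    cv::Mat K = (cv::Mat_<double>(3, 3) << 525.0, 0, 319.5,
                                           0, 525.0, 239.5,
                                           0, 0, 1);
    cv::Mat dist = (cv::Mat_<double>(5, 1) << 0.18, -0.35, 0.0, 0.0, 0.0);

    // Remove lens distortion so the image behaves like a pinhole projection.
    cv::Mat undistorted;
    cv::undistort(img, undistorted, K, dist);
    cv::imwrite("kinect_rgb_undistorted.png", undistorted);
    return 0;
}
```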
Thulio, for calibrating the color camera of the Kinect, have a look at:
camera calibration toolbox. Basically you need to print a chessboard pattern on paper, glue it onto a planar surface, take a lot of Kinect color pictures, and load them into the toolbox to get your camera parameters. I suspect that somebody else may have done that before you (most Kinects will have the same intrinsic parameters, I guess).
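
The answer above points to a MATLAB-style toolbox; if you prefer to stay in OpenCV, here is a hedged sketch of the same chessboard workflow (the board size, square size, and file names are assumptions):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    cv::Size boardSize(9, 6);   // inner corners of the printed chessboard
    float squareSize = 0.025f;  // 25 mm squares, in metres

    // Ideal 3D corner positions of one board view (the board lies in the Z = 0 plane).
    std::vector<cv::Point3f> boardModel;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            boardModel.emplace_back(x * squareSize, y * squareSize, 0.0f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    cv::Size imageSize;

    for (int i = 0; i < 20; ++i) {  // ~20 Kinect colour pictures of the board
        cv::Mat img = cv::imread("kinect_color_" + std::to_string(i) + ".png",
                                 cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, boardSize, corners)) {
            cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                             cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.001));
            imagePoints.push_back(corners);
            objectPoints.push_back(boardModel);
        }
    }

    // Solve for intrinsics K, distortion, and a pose (rvec/tvec) per board view.
    cv::Mat K, dist;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     K, dist, rvecs, tvecs);
    std::cout << "RMS reprojection error: " << rms << "\nK =\n" << K
              << "\ndist = " << dist << std::endl;
    return 0;
}
```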

How to verify that the camera calibration is correct? (or how to estimate the error of reprojection)

The quality of calibration is measured by the reprojection error (is there an alternative?), which requires knowledge of the world coordinates of some 3D point(s).
Is there a simple way to produce such known points? Is there some other way to verify the calibration (for example, Zhang's calibration method only requires that the calibration object be planar, and the geometry of the system need not be known)?
You can verify the accuracy of the estimated nonlinear lens distortion parameters independently of pose. Capture images of straight edges (e.g. a plumb line, or a laser stripe on a flat surface) spanning the field of view (an easy way to span the FOV is to rotate the camera keeping the plumb line fixed, then add all the images). Pick points on said line images, undistort their coordinates, fit mathematical lines, compute error.
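
A rough sketch of that plumb-line check, assuming you already have K and the distortion coefficients; the picked points and parameter values below are placeholders:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

int main()
{
    cv::Mat K = (cv::Mat_<double>(3, 3) << 800, 0, 640, 0, 800, 360, 0, 0, 1);
    cv::Mat dist = (cv::Mat_<double>(5, 1) << -0.21, 0.07, 0.0, 0.0, 0.0);

    // Pixel coordinates picked along one image of a physically straight edge (assumed).
    std::vector<cv::Point2f> picked = {{101, 52}, {303, 150}, {505, 251},
                                       {707, 353}, {909, 458}};

    // Undistort the points; passing P = K maps them back to pixel coordinates.
    std::vector<cv::Point2f> undist;
    cv::undistortPoints(picked, undist, K, dist, cv::noArray(), K);

    // Fit a 2D line to the undistorted points.
    cv::Vec4f line; // (vx, vy, x0, y0), unit direction plus a point on the line
    cv::fitLine(undist, line, cv::DIST_L2, 0, 0.01, 0.01);

    // Error = perpendicular distance of each undistorted point to the fitted line.
    double maxErr = 0.0;
    for (const auto& p : undist) {
        double d = std::abs((p.x - line[2]) * line[1] - (p.y - line[3]) * line[0]);
        maxErr = std::max(maxErr, d);
    }
    std::cout << "max deviation from straightness: " << maxErr << " px" << std::endl;
    return 0;
}
```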
For the linear part, you can also capture images of multiple planar rigs at a known relative pose, either moving one planar target with a repeatable/accurate rig (e.g. a turntable), or mounting multiple planar targets at known angles from each other (e.g. three planes at 90 deg from each other).
As always, a compromise is in order between accuracy requirements and budget. With enough money and a friendly machine shop nearby you can let your fantasy run wild with rig geometry. I had once a dodecahedron about the size of a grapefruit, machined out of white plastic to 1/20 mm spec. Used it to calibrate the pose of a camera on the end effector of a robotic arm, moving it on a sphere around a fixed point. The dodecahedron has really nice properties in regard to occlusion angles. Needless to say, it's all patented.
The images used in generating the intrinsic calibration can also be used to verify it. A good example of this is the camera-calib tool from the Mobile Robot Programming Toolkit (MRPT).
Per Zhang's method, the MRPT calibration proceeds as follows:
1. Process the input images:
   1a. Locate the calibration target (extract the chessboard corners).
   1b. Estimate the camera's pose relative to the target, assuming that the target is a planar chessboard with a known number of intersections.
   1c. Assign points on the image to a model of the calibration target in relative 3D coordinates.
2. Find an intrinsic calibration that best explains all of the models generated in 1b/1c.
3. Once the intrinsic calibration is generated, go back to the source images.
4. For each image, multiply the estimated camera pose with the intrinsic calibration, then apply that to each of the points derived in 1c. This maps the relative 3D points from the target model back into the 2D calibration source image. The difference between the original image feature (chessboard corner) and the reprojected point is the calibration error.
MRPT performs this test on all input images and will give you an aggregate reprojection error.
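
A hedged OpenCV sketch of the reprojection check in steps 3-4 above (not MRPT's actual code); it assumes you kept, for each image, the detected chessboard corners, the board's model points, and the per-view rvec/tvec returned by cv::calibrateCamera:

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

double reprojectionErrorForImage(const std::vector<cv::Point3f>& boardModel,
                                 const std::vector<cv::Point2f>& detectedCorners,
                                 const cv::Mat& rvec, const cv::Mat& tvec,
                                 const cv::Mat& K, const cv::Mat& dist)
{
    // Apply the estimated pose and intrinsics to the 3D model points,
    // landing back in the 2D image (steps 3-4 above).
    std::vector<cv::Point2f> reprojected;
    cv::projectPoints(boardModel, rvec, tvec, K, dist, reprojected);

    // RMS distance between detected and reprojected corners is the per-image error.
    double sumSq = 0.0;
    for (size_t i = 0; i < detectedCorners.size(); ++i) {
        cv::Point2f d = detectedCorners[i] - reprojected[i];
        sumSq += d.x * d.x + d.y * d.y;
    }
    return std::sqrt(sumSq / detectedCorners.size());
}
```

Aggregating this value over all input images gives the kind of overall reprojection error that MRPT reports.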
If you want to verify a full system, including both the camera intrinsics and the camera-to-world transform, you will probably need to build a jig that places the camera and target in a known configuration, then test calculated 3D points against real-world measurements.
On Engine's question: the pose matrix is a [R|t] matrix, where R is a pure 3D rotation and t a translation vector. If you have computed a homography from the image, section 3.1 of Zhang's Microsoft Technical Report (http://research.microsoft.com/en-us/um/people/zhang/Papers/TR98-71.pdf) gives a closed-form method to obtain both R and t using the known homography and the intrinsic camera matrix K. (I can't comment, so I added this as a new answer.)
Given enough variability in calibration-rig poses, what remains should just be variance and bias in the calibration (pixel reprojection) errors. It is better to visualize these errors than to just look at the values. For example, error vectors pointing towards the center would be indicative of a wrong focal length, and observing curved lines can give intuition about the distortion coefficients.
To calibrate the camera one has to solve jointly for the extrinsic and intrinsic parameters. The latter may be known from the manufacturer; solving for the extrinsics (rotation and translation) involves decomposing the calculated homography: Decompose Homography matrix in opencv python
Calculate a Homography with only Translation, Rotation and Scale in Opencv
The homography is used here since most calibration targets are flat.
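
A hedged sketch of that homography route, with placeholder intrinsics and correspondences; note that decomposeHomographyMat returns several candidate solutions, so extra constraints (e.g. points in front of the camera, a known plane normal) are needed to pick the physical one:

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    // Known intrinsics (e.g. from the manufacturer's datasheet) -- placeholder values.
    cv::Mat K = (cv::Mat_<double>(3, 3) << 700, 0, 320, 0, 700, 240, 0, 0, 1);

    // Correspondences between points on the flat target (expressed in the target's
    // own plane) and their detected positions in the image -- placeholder values.
    std::vector<cv::Point2f> planePts = {{0, 0}, {1, 0}, {1, 1}, {0, 1}, {0.5f, 0.5f}};
    std::vector<cv::Point2f> imagePts = {{210, 130}, {450, 140}, {440, 380}, {205, 370}, {330, 255}};

    cv::Mat H = cv::findHomography(planePts, imagePts);

    // Decompose H given K: yields up to 4 candidate {R, t, n} solutions.
    std::vector<cv::Mat> Rs, ts, normals;
    int n = cv::decomposeHomographyMat(H, K, Rs, ts, normals);
    std::cout << n << " candidate solutions" << std::endl;
    for (int i = 0; i < n; ++i)
        std::cout << "R" << i << " =\n" << Rs[i] << "\nt" << i << " = " << ts[i].t() << std::endl;
    return 0;
}
```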

Reconstructing 3D from some images without calibration?

I want to make a 3D reconstruction from multiple images without using chessboard calibration. I'm using OpenCV and studying how to obtain a 3D model from 30 images without calibrating the camera with a chessboard pattern.
Is this possible? Where can I get the extrinsic parameters?
Can I make the 3D reconstruction without calibrating?
The calibration grid (chessboard in the typical OpenCV example) is simply an object of known dimensions that lets you estimate the camera's intrinsic parameters, i.e. the mapping from camera coordinates to the image coordinates of a point. This includes focal length, centre of projection, radial distortion parameters et cetera.
If you do away with the calibration object, you will need to find these parameters from the image observations themselves. This approach is called "self-calibration" or "auto-calibration" and can be fairly involved. Basically, you are trying to get a good starting point for the follow-up non-linear optimisation (i.e. bundle adjustment). For a start, you might want to refer to the PhD thesis of Marc Pollefeys, who came up with a simple linear algorithm for this problem:
http://www.cs.unc.edu/~marc/pubs/PollefeysIJCV04.pdf

Rigid motion estimation

What I have now is the 3D point sets as well as the projection parameters of the camera. Given two 2D point sets projected from the 3D points using the camera and the transformed camera (by rotation and translation), there should be an intuitive way to estimate the camera motion... I have read some parts of Zisserman's book "Multiple View Geometry in Computer Vision", but I still did not get the solution.
Are there any hints, how can the rigid motion be estimated in this case?
THANKS!!
What you are looking for is a solution to the PnP problem. OpenCV has a function called solvePnP which should work. Just to be clear, for this to work you need the point locations in world space, a camera matrix, and the points' projections onto the image plane. It will then tell you the rotation and translation of the camera or of the points, depending on how you choose to think of it.
Adding to the previous answer, Eigen has an implementation of Umeyama's method for estimating the rigid transformation between two sets of 3D points. You can use it to get an initial estimate, and then refine it using an optimization algorithm that also considers the projections of the 3D points onto the images. For example, you could try to minimize the reprojection error between the 2D points on the first image and the projections of the 3D points after bringing them from the reference frame of one camera to the reference frame of the other using the previously estimated transformation. You can do this in both directions, using the transformation and its inverse, and try to minimize the bidirectional reprojection error. I'd recommend the paper "Stereo visual odometry for autonomous ground robots" by Andrew Howard, as well as some of its references, for a better explanation, especially if you are considering an outlier removal/inlier detection step before the actual motion estimation.
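
A small sketch of the Eigen step mentioned above; the point sets here are toy values, and a real pipeline would refine the estimate by minimizing reprojection error as described:

```cpp
#include <Eigen/Geometry>
#include <iostream>

int main()
{
    // 3xN matrices: corresponding 3D points before and after the camera motion.
    Eigen::Matrix<double, 3, 4> src, dst;
    src << 0, 1, 0, 1,
           0, 0, 1, 1,
           5, 5, 6, 6;

    // Build a toy target set by applying a known rotation about Z and a translation.
    Eigen::Matrix3d R = Eigen::AngleAxisd(0.1, Eigen::Vector3d::UnitZ()).toRotationMatrix();
    Eigen::Vector3d t(0.2, -0.1, 0.05);
    dst = (R * src).colwise() + t;

    // with_scaling = false -> rigid (rotation + translation only) transform.
    Eigen::Matrix4d T = Eigen::umeyama(src, dst, false);
    std::cout << "estimated [R|t]:\n" << T << std::endl;
    return 0;
}
```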