How do I re-project points in a camera-projector system (after calibration)? - C++

I have seen many blog entries, videos, and source code on the internet about how to carry out camera + projector calibration using OpenCV, in order to produce the camera.yml, projector.yml and projectorExtrinsics.yml files.
I have yet to see anyone discussing what to do with these files afterwards. I have done a calibration myself, but I don't know what the next step is in my own application.
Say I write an application that now uses the calibrated camera-projector system to track objects and project something on them. I will use findContours() to grab some points of interest from the moving objects, and now I want to project these points (from the projector!) onto the objects!
What I want to do is (for example) track the centre of mass (COM) of an object and show a point on the camera view of the tracked object (at its COM). Then a point should be projected onto the COM of the object in real time.
It seems that projectPoints() is the OpenCV function I should use after loading the yml files, but I am not sure how I will account for all the intrinsic & extrinsic calibration values of both camera and projector. Namely, projectPoints() requires as parameters:
vector of points to re-project (duh!)
rotation + translation matrices. I think I can use the projectorExtrinsics here, or I can use the composeRT() function to generate a final rotation & translation matrix from the projectorExtrinsics (which I have in the yml file) and the cameraExtrinsics (which I don't have; side question: should I not save them in a file too?).
intrinsics matrix. This is tricky: should I use the camera or the projector intrinsics matrix here?
distortion coefficients. Again, should I use the projector or the camera coefficients here?
other params...
So if I use either the projector or the camera (which one??) intrinsics + coefficients in projectPoints(), then I will only be 'correcting' for one of the two instruments. Where/how will I use the other instrument's intrinsics?
What else do I need apart from loading the yml files and calling projectPoints()? (Perhaps undistortion?)
Any help on the matter is greatly appreciated.
If there is a tutorial or a book (no, O'Reilly's "Learning OpenCV" does not talk about how to use the calibration yml files either, only about how to do the actual calibration), please point me in that direction. I don't necessarily need an exact answer!

First, you seem to be confused about the general role of a camera/projector model: its role is to map 3D world points to 2D image points. This sounds obvious, but it means that given extrinsics R, t (for orientation and position), a distortion function D(.) and intrinsics K, you can infer for this particular device the 2D projection m of a 3D point M as follows: m = K.D(R.M + t). The projectPoints function does exactly that (i.e. 3D-to-2D projection) for each input 3D point, hence you need to give it the parameters associated with the device in which you want your 3D points projected (projector K & D if you want projector 2D coordinates, camera K & D if you want camera 2D coordinates).
Second, when you jointly calibrate your camera and projector, you do not estimate a set of extrinsics R,t for the camera and another for the projector, but only one R and one t, which represent the rotation and translation between the camera's and projector's coordinate systems. For instance, this means that your camera is assumed to have rotation = identity and translation = zero, and the projector has rotation = R and translation = t (or the other way around, depending on how you did the calibration).
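Putting those two points together, a minimal C++ sketch of the projection step could look like the following. It assumes the camera frame is the reference (identity pose), that your 3D points are already expressed in that frame, and that projectorExtrinsics.yml holds the camera-to-projector rotation and translation. The YAML node names ("camera_matrix", "distortion_coefficients", "R", "T") are guesses; use whatever keys your calibration code actually wrote.

    // Minimal sketch: project 3D points (given in camera coordinates) into projector pixels.
    #include <opencv2/core/core.hpp>
    #include <opencv2/calib3d/calib3d.hpp>
    #include <vector>

    int main()
    {
        cv::Mat projK, projD, R, T;

        cv::FileStorage fsP("projector.yml", cv::FileStorage::READ);
        fsP["camera_matrix"] >> projK;            // projector intrinsics K (node name assumed)
        fsP["distortion_coefficients"] >> projD;  // projector distortion D (node name assumed)
        fsP.release();

        cv::FileStorage fsE("projectorExtrinsics.yml", cv::FileStorage::READ);
        fsE["R"] >> R;                            // camera -> projector rotation (assumed)
        fsE["T"] >> T;                            // camera -> projector translation (assumed)
        fsE.release();

        // A 3D point expressed in the camera coordinate frame (camera pose = identity)
        std::vector<cv::Point3f> objectPoints(1, cv::Point3f(0.0f, 0.0f, 1.0f));

        cv::Mat rvec;
        cv::Rodrigues(R, rvec);                   // projectPoints expects a rotation vector

        std::vector<cv::Point2f> projectorPixels;
        cv::projectPoints(objectPoints, rvec, T, projK, projD, projectorPixels);

        // projectorPixels[0] is where to draw in the projector image
        return 0;
    }

If your extrinsics file stores the transform in the opposite direction (projector to camera), invert it first (R' = R^T, t' = -R^T t) before feeding it to projectPoints.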
Now, concerning the application you mentioned, the real problem is: how do you estimate the 3D coordinates of a given point?
Using two cameras and one projector, this would be easy: you could track the objects of interest in the two camera images, triangulate their 3D positions from the two 2D projections with the triangulatePoints function, and finally project this 3D point into projector 2D coordinates using projectPoints in order to know where to display things with your projector.
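As a rough illustration of that two-camera pipeline (every parameter name here is an assumption: cam1K/cam1D and cam2K/cam2D are the two cameras' intrinsics and distortions, R12/t12 the camera-1-to-camera-2 pose from a stereo calibration, rvecProj/tvecProj the camera-1-to-projector pose, and all matrices are assumed to be CV_64F):

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <opencv2/calib3d/calib3d.hpp>
    #include <vector>

    cv::Point2f projectTrackedPoint(const cv::Point2f& ptCam1, const cv::Point2f& ptCam2,
                                    const cv::Mat& cam1K, const cv::Mat& cam1D,
                                    const cv::Mat& cam2K, const cv::Mat& cam2D,
                                    const cv::Mat& R12,   const cv::Mat& t12,
                                    const cv::Mat& rvecProj, const cv::Mat& tvecProj,
                                    const cv::Mat& projK, const cv::Mat& projD)
    {
        // Undistort to ideal pinhole pixels (pass K as P so the projection matrices below include it)
        std::vector<cv::Point2f> p1(1, ptCam1), p2(1, ptCam2), u1, u2;
        cv::undistortPoints(p1, u1, cam1K, cam1D, cv::noArray(), cam1K);
        cv::undistortPoints(p2, u2, cam2K, cam2D, cv::noArray(), cam2K);

        // Projection matrices, with camera 1 taken as the world frame
        cv::Mat P1 = cam1K * cv::Mat::eye(3, 4, CV_64F);
        cv::Mat Rt = cv::Mat::zeros(3, 4, CV_64F);
        R12.copyTo(Rt(cv::Rect(0, 0, 3, 3)));
        t12.copyTo(Rt(cv::Rect(3, 0, 1, 3)));
        cv::Mat P2 = cam2K * Rt;

        // Triangulate (4x1 homogeneous result), then de-homogenise
        cv::Mat X4;
        cv::triangulatePoints(P1, P2, u1, u2, X4);
        X4.convertTo(X4, CV_32F);
        cv::Point3f X(X4.at<float>(0) / X4.at<float>(3),
                      X4.at<float>(1) / X4.at<float>(3),
                      X4.at<float>(2) / X4.at<float>(3));

        // Re-project the 3D point into projector pixels
        std::vector<cv::Point2f> out;
        cv::projectPoints(std::vector<cv::Point3f>(1, X), rvecProj, tvecProj, projK, projD, out);
        return out[0];
    }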
With only one camera and one projector, this is still possible but more difficult, because you cannot triangulate the tracked points from only one observation. The basic idea is to approach the problem like a sparse stereo disparity estimation problem. A possible method is as follows (a rough code sketch of the last step is given after the list):
project a non-ambiguous image (e.g. black and white noise) using the projector, in order to texture the scene observed by the camera.
as before, track the objects of interest in the camera image
for each object of interest, correlate a small window around its location in the camera image with the projector image, in order to find where it projects in the projector 2D coordinates
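A very rough sketch of that last correlation step, using the normalised correlation coefficient via cv::matchTemplate. The variable names (cameraFrame, noisePattern, comCam, winSize) are assumptions, both images are assumed to be single-channel 8-bit, and this ignores the perspective warp between the camera view and the projector image, so it is only an approximation:

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>

    cv::Point2f findProjectorPixel(const cv::Mat& cameraFrame,   // grayscale camera image
                                   const cv::Mat& noisePattern,  // grayscale image sent to the projector
                                   const cv::Point& comCam,      // tracked point, camera pixels
                                   int winSize = 21)
    {
        // Cut a small window around the tracked point (border handling omitted here)
        cv::Rect win(comCam.x - winSize / 2, comCam.y - winSize / 2, winSize, winSize);
        cv::Mat patch = cameraFrame(win);

        // Slide the patch over the projector image and score each position
        cv::Mat score;
        cv::matchTemplate(noisePattern, patch, score, cv::TM_CCOEFF_NORMED);

        double minVal, maxVal;
        cv::Point minLoc, maxLoc;
        cv::minMaxLoc(score, &minVal, &maxVal, &minLoc, &maxLoc);

        // Best match; add half the window so we return the patch centre, not its corner
        return cv::Point2f(maxLoc.x + winSize / 2.0f, maxLoc.y + winSize / 2.0f);
    }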
Another approach, which unlike the one above would use the calibration parameters, could be to do a dense 3D reconstruction using stereoRectify and StereoBM::operator() (or gpu::StereoBM_GPU::operator() for the GPU implementation), map the tracked 2D positions to 3D using the estimated scene depth, and finally project into the projector using projectPoints.
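For completeness, a sketchy outline of that dense alternative, written against the OpenCV 2.x-style API named above (cv::StereoBM with operator()). It treats the camera/projector pair as a stereo rig, assumes for simplicity that both images are single-channel 8-bit and of the same size, and all variable names are assumptions; in practice the block matcher may struggle because the camera sees the projected pattern modulated by the scene:

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <opencv2/calib3d/calib3d.hpp>

    void denseDepthSketch(const cv::Mat& camImg, const cv::Mat& projPattern,
                          const cv::Mat& camK, const cv::Mat& camD,
                          const cv::Mat& projK, const cv::Mat& projD,
                          const cv::Mat& R, const cv::Mat& T, cv::Size imgSize)
    {
        // 1. Rectify the camera/projector pair
        cv::Mat R1, R2, P1, P2, Q;
        cv::stereoRectify(camK, camD, projK, projD, imgSize, R, T, R1, R2, P1, P2, Q);

        cv::Mat map1x, map1y, map2x, map2y, rectCam, rectProj;
        cv::initUndistortRectifyMap(camK, camD, R1, P1, imgSize, CV_32FC1, map1x, map1y);
        cv::initUndistortRectifyMap(projK, projD, R2, P2, imgSize, CV_32FC1, map2x, map2y);
        cv::remap(camImg, rectCam, map1x, map1y, cv::INTER_LINEAR);
        cv::remap(projPattern, rectProj, map2x, map2y, cv::INTER_LINEAR);

        // 2. Block matching on the rectified pair (disparity comes back as CV_16S, scaled by 16)
        cv::StereoBM bm(cv::StereoBM::BASIC_PRESET, 64, 21);
        cv::Mat disparity, disp32;
        bm(rectCam, rectProj, disparity);
        disparity.convertTo(disp32, CV_32F, 1.0 / 16.0);

        // 3. Back-project to 3D; xyz(y, x) is the 3D point seen at rectified pixel (x, y)
        cv::Mat xyz;
        cv::reprojectImageTo3D(disp32, xyz, Q, true);
        // ... look up the tracked 2D positions in xyz, then cv::projectPoints them into the projector
    }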
Anyhow, this is easier, and more accurate, using two cameras.
Hope this helps.

Related

Augmented Reality OpenGL+OpenCV

I am very new to OpenCV and have limited experience with OpenGL. I want to overlay a 3D object on a calibrated image of a checkerboard. Any tips or guidance?
The basic idea is that you have two cameras: the physical one (the one you retrieve images from with OpenCV) and the OpenGL one. You have to align those two, i.e. build OpenGL matrices that match the physical camera.
To do that, you need to calibrate the physical camera.
First, you need the distortion parameters (because every lens has more or less some optical distortion) and the so-called intrinsic parameters. You get these by printing a chessboard on paper, using it to capture some images, and calibrating the camera. The internet is full of nice tutorials about that, and from your question it seems you already have them. That's nice.
Then, you have to calibrate the position of the camera. This is done with the so-called extrinsic parameters, which encode the position and rotation of the camera in the 3D world.
The extrinsic parameters are obtained with the OpenCV functions cv::solvePnP (which needs the intrinsic parameters as input) and cv::Rodrigues. solvePnP takes two sets of corresponding points: some known 3D points and their 2D projections. That's why all augmented reality applications need some markers: the markers are usually square, so after detecting one you know the 2D projections of the points P1(0,0,0), P2(0,1,0), P3(1,1,0), P4(1,0,0) that form a square, and you can find the plane they lie on.
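A minimal sketch of that step, assuming you have already detected the four marker corners in the image (in the same order as the 3D points below) and have K and the distortion coefficients from the calibration:

    #include <opencv2/core/core.hpp>
    #include <opencv2/calib3d/calib3d.hpp>
    #include <vector>

    void markerPose(const std::vector<cv::Point2f>& corners2D,  // detected marker corners, in order
                    const cv::Mat& K, const cv::Mat& distCoeffs,
                    cv::Mat& R, cv::Mat& t)
    {
        // Known 3D layout of the square marker (marker units, lying in the Z = 0 plane)
        std::vector<cv::Point3f> corners3D;
        corners3D.push_back(cv::Point3f(0, 0, 0));
        corners3D.push_back(cv::Point3f(0, 1, 0));
        corners3D.push_back(cv::Point3f(1, 1, 0));
        corners3D.push_back(cv::Point3f(1, 0, 0));

        cv::Mat rvec;                      // axis-angle rotation, as returned by solvePnP
        cv::solvePnP(corners3D, corners2D, K, distCoeffs, rvec, t);
        cv::Rodrigues(rvec, R);            // expand to a 3x3 rotation matrix
    }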
Once you have the extrinsic parameters the game is essentially won: you just have to build a perspective projection in OpenGL from the camera's intrinsic parameters (focal lengths and principal point, which determine the field of view) and place the OpenGL camera at the pose given by the extrinsic parameters.
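Here is a hedged sketch of that construction: it builds column-major OpenGL projection and modelview matrices from K and from the R, t returned by solvePnP/Rodrigues. It assumes the usual OpenCV conventions (image origin at the top left, y pointing down, camera looking down +Z, matrices stored as CV_64F) and folds the OpenCV-to-OpenGL axis flip into the modelview; if your conventions differ you may need to flip a sign or two.

    #include <opencv2/core/core.hpp>

    void buildGLMatrices(const cv::Mat& K, const cv::Mat& R, const cv::Mat& t,
                         int w, int h, double n, double f,          // image size, near/far planes
                         double proj[16], double modelview[16])     // column-major, ready for OpenGL
    {
        double fx = K.at<double>(0, 0), fy = K.at<double>(1, 1);
        double cx = K.at<double>(0, 2), cy = K.at<double>(1, 2);

        // Projection from the intrinsics (entry at row r, column c is stored at proj[c * 4 + r])
        for (int i = 0; i < 16; ++i) proj[i] = 0.0;
        proj[0]  = 2.0 * fx / w;
        proj[5]  = 2.0 * fy / h;
        proj[8]  = (w - 2.0 * cx) / w;
        proj[9]  = (2.0 * cy - h) / h;
        proj[10] = -(f + n) / (f - n);
        proj[11] = -1.0;
        proj[14] = -2.0 * f * n / (f - n);

        // Modelview from the extrinsics, with rows 2 and 3 negated to go from
        // OpenCV's (y down, z forward) camera to OpenGL's (y up, z backward) camera
        for (int i = 0; i < 16; ++i) modelview[i] = 0.0;
        for (int r = 0; r < 3; ++r)
        {
            double sign = (r == 0) ? 1.0 : -1.0;
            for (int c = 0; c < 3; ++c)
                modelview[c * 4 + r] = sign * R.at<double>(r, c);
            modelview[12 + r] = sign * t.at<double>(r, 0);
        }
        modelview[15] = 1.0;
    }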
Of course, if you want to (and you should) understand and handle each step of this process correctly, there is a lot of math: matrices, angles, quaternions, matrices again, and... matrices again. You can find a reference in the famous Multiple View Geometry in Computer Vision by R. Hartley and A. Zisserman.
Moreover, to handle the OpenGL part correctly you have to deal with so-called "modern OpenGL" (remember that glLoadMatrix is deprecated) and a bit of shader code for loading the camera matrices (for me this was a problem because I didn't know anything about it).
I dealt with this some time ago and I have some code, so feel free to ask about any problems you have. Here are some links I found interesting:
http://ksimek.github.io/2012/08/14/decompose/ (really good explanation)
Camera position in world coordinate from cv::solvePnP (a question I asked about that)
http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/ (fabulous blog about computer vision)
http://spottrlabs.blogspot.it/2012/07/opencv-and-opengl-not-always-friends.html (nice tricks)
http://strawlab.org/2011/11/05/augmented-reality-with-OpenGL/
http://www.songho.ca/opengl/gl_projectionmatrix.html (very good explanation on opengl camera settings basics)
Some other random useful stuff:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html (documentation, always look at the docs!!!)
Determine extrinsic camera with opencv to opengl with world space object
Rodrigues into Eulerangles and vice versa
Python Opencv SolvePnP yields wrong translation vector
http://answers.opencv.org/question/23089/opencv-opengl-proper-camera-pose-using-solvepnp/
Please read them before anything else. As usual, once you get the concept it becomes easy, though you may need to bang your head against the wall a bit first. Just don't be scared of all that math :)

How to do the 2D-3D point correspondence

I'm working with the OpenCV API on an augmented reality project using one camera. I have:
The 3D points of my 3D object (I get 4 points from MeshLab)
The 2D points which I want to follow (I have 4 points); these points are not the projections of the 3D points.
Intrinsic camera parameters.
Using these parameters, I get the extrinsic parameters (rotation and translation, via the cvFindExtrinsicCameraParams2 function), which I have used to render my model and set the modelview matrix.
My problem is that the 3D model is not shown at the expected position: it appears at a different location in my image. How can I fix the model location, and hence the modelview matrix?
On other forums I was told that I should establish the 2D-3D correspondence to get the extrinsic parameters, but I don't know how to match my 2D points with the 3D points.
Typically you would design the points you want to track in such a fashion that the 2D-3D correspondence is immediately clear. The easiest way to do this is to have points with different colors. You could also go with some sort of pattern (google augmented reality cards) which you would then have to analyze in order to find out how it is rotated in the image. The pattern of course cannot be rotationally symmetric.
If you can't do that, you can try out all the different permutations of the points, plug each one into OpenCV to get a pose, project your 3D points to 2D with each of those poses, and then see which one fits best.
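A naive sketch of that brute-force idea for four points (24 permutations), keeping the ordering with the smallest reprojection error. The function and variable names are made up for illustration; K and distCoeffs are your calibrated intrinsics and distortion coefficients:

    #include <opencv2/core/core.hpp>
    #include <opencv2/calib3d/calib3d.hpp>
    #include <algorithm>
    #include <cmath>
    #include <limits>
    #include <vector>

    static bool lexLess(const cv::Point2f& a, const cv::Point2f& b)
    {
        return (a.x != b.x) ? (a.x < b.x) : (a.y < b.y);
    }

    double bestPose(const std::vector<cv::Point3f>& pts3D,   // the 4 model points
                    std::vector<cv::Point2f> pts2D,          // the 4 tracked image points (copied)
                    const cv::Mat& K, const cv::Mat& distCoeffs,
                    cv::Mat& bestRvec, cv::Mat& bestTvec)
    {
        std::sort(pts2D.begin(), pts2D.end(), lexLess);      // start from the first permutation
        double bestErr = std::numeric_limits<double>::max();

        do
        {
            cv::Mat rvec, tvec;
            cv::solvePnP(pts3D, pts2D, K, distCoeffs, rvec, tvec);

            // Reprojection error of this permutation
            std::vector<cv::Point2f> reproj;
            cv::projectPoints(pts3D, rvec, tvec, K, distCoeffs, reproj);
            double err = 0.0;
            for (size_t i = 0; i < pts2D.size(); ++i)
            {
                double dx = pts2D[i].x - reproj[i].x, dy = pts2D[i].y - reproj[i].y;
                err += std::sqrt(dx * dx + dy * dy);
            }

            if (err < bestErr) { bestErr = err; bestRvec = rvec; bestTvec = tvec; }
        } while (std::next_permutation(pts2D.begin(), pts2D.end(), lexLess));

        return bestErr;   // in pixels; bestRvec/bestTvec hold the winning pose
    }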

Reconstructing 3D from some images without calibration?

I want to make a 3D reconstruction from multiple images without using a chessboard calibration. I'm using OpenCV and studying how to obtain a 3D model from 30 images without calibrating the camera with a chessboard pattern.
Is this possible? Where can I get the extrinsic parameters?
Can I make the 3D reconstruction without calibrating?
The calibration grid (chessboard in the typical OpenCV example) is simply an object of known dimensions that lets you estimate the camera's intrinsic parameters, i.e. the mapping from camera coordinates to the image coordinates of a point. This includes focal length, centre of projection, radial distortion parameters et cetera.
If you do away with the calibration object, you will need to find these parameters from the image observations themselves. This approach is called "self-calibration" or "auto-calibration" and can be fairly involved. Basically, you are trying to get a good starting point for the follow-up non-linear optimisation (i.e. bundle adjustment). For a start, you might want to refer to the work of Marc Pollefeys, who came up with a simple linear algorithm for this problem:
http://www.cs.unc.edu/~marc/pubs/PollefeysIJCV04.pdf

Rigid motion estimation

What I have now is the 3D point set as well as the projection parameters of the camera. Given two 2D point sets, projected from the 3D points by the original camera and by the transformed camera (rotated and translated), there should be an intuitive way to estimate the camera motion... I read some parts of Zisserman's book "Multiple View Geometry in Computer Vision", but I still did not get the solution.
Are there any hints on how the rigid motion can be estimated in this case?
THANKS!!
What you are looking for is a solution to the PnP problem. OpenCV has a function for this called solvePnP. Just to be clear, for this to work you need point locations in world space, a camera matrix, and the points' projections onto the image plane. It will then tell you the rotation and translation of the camera, or of the points, depending on how you choose to think of it.
Adding to the previous answer, Eigen has an implementation of Umeyama's method for estimating the rigid transformation between two sets of 3D points. You can use it to get an initial estimate, and then refine it using an optimization algorithm that also considers the projections of the 3D points onto the images. For example, you could try to minimize the reprojection error between the 2D points in the first image and the projections of the 3D points after you bring them from the reference frame of one camera to the reference frame of the other using the previously estimated transformation. You can do this both ways, using the transformation and its inverse, and try to minimize the bidirectional reprojection error. I'd recommend the paper "Stereo visual odometry for autonomous ground robots" by Andrew Howard, as well as some of its references, for a better explanation, especially if you are considering an outlier removal/inlier detection step before the actual motion estimation.
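For the Umeyama step, a small sketch with Eigen might look like this. The 3xN column-per-point layout is what Eigen::umeyama expects; everything else (names, the decision to disable scale estimation) is an assumption:

    #include <Eigen/Core>
    #include <Eigen/Geometry>   // Eigen::umeyama lives in the Geometry module

    // Estimate the rigid transform T (4x4 homogeneous) such that ptsB is approximately T * ptsA
    Eigen::Matrix4d estimateRigid(const Eigen::Matrix3Xd& ptsA,   // 3xN points in frame A
                                  const Eigen::Matrix3Xd& ptsB)   // the same points in frame B
    {
        // Last argument 'false' disables scale estimation, so the result is a pure rotation + translation
        return Eigen::umeyama(ptsA, ptsB, false);
    }

The rotation and translation can then be read off the top-left 3x3 block and the last column of T, and fed into the reprojection-error refinement described above.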

OpenCV translational/rotational displacement between frames?

I am currently researching the use of a low-resolution camera pointing vertically at the ground (at a fixed height) to measure the speed of the camera passing over the surface. I am using OpenCV 2.1 with C++.
Since the entire background will be constantly moving, translating and/or rotating between consecutive frames, what would be the most suitable method for determining the displacement between frames in a usable form (a function that returns the frame displacement)? Then, based on the height of the camera and the area captured in the frame (its real-world dimensions), I would be able to convert the frame displacement into a real-world displacement, and then calculate the speed over a measured time interval.
I am trying to determine my method of approach, and whether any example code is available, for converting a frame displacement (or the displacement of a set of pixels) into a distance on the ground based on the height of the camera.
Thanks,
Josh.
It depends on your knowledge of computer vision. For a start, I would use what OpenCV has to offer; please have a look at the feature2d module.
What you need is to first extract feature points (e.g. SIFT or SURF), then use the built-in matching algorithms to match points extracted from two frames. Each match will give you some constraints, and you will end up solving an over-determined system Ax = b.
Of course, do your experiments offline, i.e. shoot a video first and then operate on the individual images.
UPDATE:
In multi-camera calibration, your goal is to determine the 3D location of each camera, which is exactly your situation. Imagine that instead of moving your single camera around, you have as many cameras as there are images in the video, and you want to know the 3D location of each of those cameras; each one represents the location from which one image was taken by your single moving camera.
There is a matrix with which you can map any 3D point in the world to a 2D point on your image (see the Wikipedia article on the camera matrix). The camera matrix consists of two parts: intrinsic and extrinsic parameters. I (perhaps inexactly) referred to the intrinsic parameters as the internal matrix. The intrinsic parameters are the static parameters of a single camera (e.g. focal length), while the extrinsic ones are the location and rotation of the camera.
Now, once you have the intrinsic parameters of your camera and the matched points, you can stack many of those projection equations on top of each other and solve the system for both the actual 3D locations of all your matched points and all the extrinsic parameters.
Given interest points as described above, you can find the transformation between two frames with OpenCV's findHomography.
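A rough sketch of that interest-point route is below. It tracks Shi-Tomasi corners with pyramidal Lucas-Kanade optical flow instead of matching SIFT/SURF descriptors, but the final step (feeding the correspondences to findHomography with RANSAC) is the same idea; both frames are assumed to be single-channel 8-bit. The translation part of the returned 3x3 matrix (entries (0,2) and (1,2)) is roughly the pixel displacement between the two frames:

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <opencv2/video/tracking.hpp>
    #include <opencv2/calib3d/calib3d.hpp>
    #include <vector>

    cv::Mat frameDisplacement(const cv::Mat& prevGray, const cv::Mat& currGray)
    {
        // 1. Pick corners in the previous frame
        std::vector<cv::Point2f> prevPts, currPts;
        cv::goodFeaturesToTrack(prevGray, prevPts, 400, 0.01, 8);

        // 2. Track them into the current frame
        std::vector<unsigned char> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);

        // Keep only the successfully tracked pairs
        std::vector<cv::Point2f> src, dst;
        for (size_t i = 0; i < status.size(); ++i)
            if (status[i]) { src.push_back(prevPts[i]); dst.push_back(currPts[i]); }

        // 3. Robustly fit the inter-frame transformation (3x3 homography, RANSAC)
        return cv::findHomography(src, dst, cv::RANSAC, 3.0);
    }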
Also, if you can assume that the transformations will be somewhat small and near-linear, you can just compare image pixels of two consecutive frames to find the best match. With enough downsampling, this doesn't take too long, and in my experience it works rather well.
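A small sketch of that direct pixel-comparison variant: downsample both frames, slide the centre of the previous frame over the current one with cv::matchTemplate, and read the translation off the best match (rotation is ignored here, and all names are illustrative):

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>

    cv::Point2f pixelShift(const cv::Mat& prevGray, const cv::Mat& currGray, int downscale = 4)
    {
        cv::Mat prevSmall, currSmall;
        cv::resize(prevGray, prevSmall, cv::Size(), 1.0 / downscale, 1.0 / downscale);
        cv::resize(currGray, currSmall, cv::Size(), 1.0 / downscale, 1.0 / downscale);

        // Use the central half of the previous frame as the template
        cv::Rect centre(prevSmall.cols / 4, prevSmall.rows / 4,
                        prevSmall.cols / 2, prevSmall.rows / 2);
        cv::Mat patch = prevSmall(centre);

        cv::Mat score;
        cv::matchTemplate(currSmall, patch, score, cv::TM_CCOEFF_NORMED);
        cv::Point best;
        cv::minMaxLoc(score, 0, 0, 0, &best);

        // Shift in full-resolution pixels (scale the downsampled offset back up)
        return cv::Point2f(float((best.x - centre.x) * downscale),
                           float((best.y - centre.y) * downscale));
    }

With the camera at a fixed, known height Z above a flat surface and a focal length of fx pixels, a shift of d pixels corresponds to roughly d * Z / fx on the ground; dividing by the time between frames gives the speed.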
Good luck!