Gaze tracking with a webcam (OpenCV and C++)

I'm trying to build a tool that returns the 2D coordinates of the point the user is looking at. To do that I'm using OpenCV, C++ and a low-cost webcam. I have the 2D coordinates of the centers of the two pupils (leftPupil, rightPupil), but I don't know how to find the user's gaze.
I suppose some information is missing, but I don't know the right formula to estimate the gaze.
Is it mandatory to add a laser to get the distance of the user from the webcam? Must I analyze the geometric shape of the pupils (whether they are circles or ellipses)? If so, how can I detect whether they are elliptical or round?
Thank you for your ideas.

Related

Finding regions of higher numbers in a matrix

I am working on a project to detect certain objects in an aerial image, and as part of this I am trying to utilize elevation data for the image. I am working with Digital Elevation Models (DEMs), basically a matrix of elevation values. When I am trying to detect trees, for example, I want to search for tree-shaped regions that are higher than their surrounding terrain. Here is an example of a tree in a DEM heatmap:
https://i.stack.imgur.com/pIvlv.png
I want to be able to find small regions like that that are higher than their surroundings.
I am using OpenCV and GDAL for my actual image processing. Does either of those already contain techniques for what I'm trying to accomplish? If not, can you point me in the right direction? One idea I've had is to go through each pixel and calculate the rate of change relative to its surrounding pixels, so that pixels with high rates of change (steep slopes) would hopefully indicate the edge of a raised area.
Note that the elevations will change from image to image, and this needs to work with any elevation. So the ground might be around 10 meters in one image but 20 meters in another.

Supposing you can put the DEM information into a 2D Mat where each "pixel" holds the elevation value, you can find local maxima by applying dilate and then subtracting the result from the original image: pixels where the difference is zero are equal to the maximum of their neighborhood.
There's a related post with code examples in: http://answers.opencv.org/question/28035/find-local-maximum-in-1d-2d-mat/
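The dilate-and-subtract idea can be sketched in plain C++ so that it runs without OpenCV. This is only an illustration under some assumptions: the DEM is a rectangular grid of floats, the neighborhood is 3x3 (what cv::dilate uses with its default kernel), and the names `Grid`, `filter3x3` and `localMaxima` are made up for this example. Because a perfectly flat area also survives dilation unchanged, the sketch additionally requires the pixel to be higher than the minimum of its neighborhood.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<float>>;

// 3x3 neighbourhood maximum (useMax = true) or minimum (useMax = false).
// Border pixels only look at the neighbours that exist.
Grid filter3x3(const Grid& dem, bool useMax) {
    assert(!dem.empty() && !dem[0].empty());
    const int rows = static_cast<int>(dem.size());
    const int cols = static_cast<int>(dem[0].size());
    Grid out(rows, std::vector<float>(cols));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            float best = dem[r][c];
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
                    best = useMax ? std::max(best, dem[rr][cc])
                                  : std::min(best, dem[rr][cc]);
                }
            }
            out[r][c] = best;
        }
    }
    return out;
}

// A pixel is reported as a local maximum when dilation leaves it unchanged
// (dilated - original == 0) and its neighbourhood is not completely flat.
std::vector<std::vector<bool>> localMaxima(const Grid& dem) {
    const Grid dilated = filter3x3(dem, true);
    const Grid eroded = filter3x3(dem, false);
    std::vector<std::vector<bool>> mask(
        dem.size(), std::vector<bool>(dem[0].size(), false));
    for (std::size_t r = 0; r < dem.size(); ++r)
        for (std::size_t c = 0; c < dem[r].size(); ++c)
            mask[r][c] = (dilated[r][c] - dem[r][c] == 0.0f) &&
                         (dem[r][c] > eroded[r][c]);
    return mask;
}
```

Since only relative heights are compared, the result does not depend on whether the ground sits at 10 m or 20 m. For tree-sized regions rather than single peaks, a larger neighborhood would probably be needed.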

OpenCv Blob tracking of point relative to plane

I am doing an installation that tracks blobs using OpenCV and projects graphics over the blobs. The problem is that my camera is off to the side and away from the projector.
I'm thinking that to get the point's position in relation to the projection plane, I would need to calibrate by marking out the plane's corners as seen in the camera view.
My problem is: how do I use those 4 points to convert the tracked blob from the camera view to the projection plane, so that the projected graphic lines up with the tracked blob? I'm not sure what I should be searching for.

After you detect the 4 corner points, you can calculate the transformation to the projector plane by using getPerspectiveTransform.
Once you have this transformation, you can use perspectiveTransform to map individual points (such as the tracked blob) from one coordinate system to the other, or warpPerspective to warp a whole image.
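To make the corner-based mapping concrete, here is a sketch in plain C++ of what the OpenCV calls do for you: it solves for the 3x3 perspective matrix from 4 point pairs (the kind of matrix cv::getPerspectiveTransform returns) and applies it to a single point (what cv::perspectiveTransform does). The names `Pt`, `Homography`, `homographyFrom4Points` and `applyHomography` are invented for this illustration, and the last matrix entry is assumed to be 1, which holds for ordinary camera/projector setups.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

struct Pt { double x, y; };
using Homography = std::array<double, 9>;  // row-major 3x3, h[8] fixed to 1

// Solve for the perspective transform that maps src[i] to dst[i].
Homography homographyFrom4Points(const std::array<Pt, 4>& src,
                                 const std::array<Pt, 4>& dst) {
    // Each pair gives two linear equations in h[0..7]; a is the 8x9 augmented system.
    double a[8][9] = {};
    for (int i = 0; i < 4; ++i) {
        const double x = src[i].x, y = src[i].y, u = dst[i].x, v = dst[i].y;
        const double rowU[9] = {x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y, u};
        const double rowV[9] = {0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y, v};
        for (int j = 0; j < 9; ++j) {
            a[2 * i][j] = rowU[j];
            a[2 * i + 1][j] = rowV[j];
        }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 8; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        assert(std::fabs(a[pivot][col]) > 1e-12);  // fails for degenerate corners
        for (int j = 0; j < 9; ++j) std::swap(a[col][j], a[pivot][j]);
        for (int r = 0; r < 8; ++r) {
            if (r == col) continue;
            const double f = a[r][col] / a[col][col];
            for (int j = col; j < 9; ++j) a[r][j] -= f * a[col][j];
        }
    }
    Homography h{};
    for (int i = 0; i < 8; ++i) h[i] = a[i][8] / a[i][i];
    h[8] = 1.0;
    return h;
}

// Map one camera-space point into projector space.
Pt applyHomography(const Homography& h, Pt p) {
    const double w = h[6] * p.x + h[7] * p.y + h[8];
    return {(h[0] * p.x + h[1] * p.y + h[2]) / w,
            (h[3] * p.x + h[4] * p.y + h[5]) / w};
}
```

In use, `src` would hold the four plane corners as seen by the camera and `dst` the corresponding projector corners, for example (0,0), (1280,0), (1280,720), (0,720) for a hypothetical 1280x720 projector. The matrix only needs to be computed once per calibration; each tracked blob centre is then passed through `applyHomography`.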

Unfortunately I'm unable to help with a minimal code example at the moment, but I recommend having a look at ofxCv and its examples. There is a camera-based undistort example, and the wrapper also provides utilities for warping/unwarping perspective via warpPerspective and unwarpPerspective.
Bear in mind that ofxCv has handy functions for converting between ofImage and cv::Mat, like toCv() and toOf().
ofxCv may make it easier to use the OpenCV functions Elad Joseph recommends (which sound like exactly what you need).

OpenCV triangulatePoints varying distance

I am using OpenCV's triangulatePoints function to determine 3D coordinates of a point imaged by a stereo camera.
I find that this function gives me a different distance to the same point depending on the angle of the camera to that point.
Here is a video:
https://www.youtube.com/watch?v=FrYBhLJGiE4
In this video, we are tracking the 'X' mark. Information about the tracked point is displayed in the upper left corner. (YouTube reduced the quality; the original video is much sharper, at (2x1280) x 720.)
In the video, the left camera is the origin of the 3D coordinate system and it looks in the positive Z direction. The left camera undergoes some translation, but not nearly as much as the triangulatePoints output would lead you to believe. (More info is in the video description.)
Metric unit is mm, so the point is initially triangulated at ~1.94m distance from the left camera.
I am aware that an insufficiently precise calibration can cause this behaviour. I have run three independent calibrations using a chessboard pattern. The resulting parameters vary too much for my taste (approx. +-10% for the focal length estimate).
As you can see, the video is not highly distorted. Straight lines appear pretty straight everywhere. So the optimum camera parameters must be close to the ones I am already using.
My question is, is there anything else that can cause this?
Can a convergence angle between the two stereo cameras have this effect? Or a wrong baseline length?
Of course, there is always the matter of errors in feature detection. Since I am using optical flow to track the 'X' mark, I get subpixel precision, which may be off by... I don't know... +-0.2 px?
I am using the Stereolabs ZED stereo camera. I am not accessing the video frames directly using OpenCV. Instead, I have to use the special SDK I acquired when purchasing the camera. It has occurred to me that this SDK might be doing some undistortion of its own.
So, now I wonder... If the SDK undistorts an image using incorrect distortion coefficients, can that create an image that is neither barrel-distorted nor pincushion-distorted but something different altogether?

The SDK provided with the ZED camera performs undistortion and rectification of the images. Its geometry model is the same as OpenCV's:
- intrinsic parameters and distortion parameters for both the left and right cameras.
- extrinsic parameters for the rotation/translation between right and left.
Through one of the ZED tools (the ZED Settings App), you can enter your own intrinsic matrix and distortion coefficients for left/right, as well as the baseline and convergence.
To get a precise 3D triangulation, you may need to adjust those parameters, since they have a high impact on the disparity you estimate before converting to depth.
OpenCV provides a good module for calibrating 3D cameras. It does:
- Mono calibration (cv::calibrateCamera) for left and right, followed by a stereo calibration (cv::stereoCalibrate). It outputs the intrinsic parameters (focal length, optical center (very important)) and the extrinsic ones (Baseline = T[0], Convergence = R[1] if R is a 3x1 rotation vector). The RMS error (the return value of stereoCalibrate()) is a good way to check whether the calibration was done correctly.
The important thing is that you need to do this calibration on raw images, not on the images provided by the ZED SDK. Since the ZED is a standard UVC camera, you can use OpenCV to get the side-by-side raw images (cv::VideoCapture with the correct device number) and extract the left and right native images.
You can then enter those calibration parameters in the tool. The ZED SDK will then perform the undistortion/rectification and provide the corrected images. The new camera matrix is provided by getParameters(). You need to use those values when you triangulate, since the images are corrected as if they were taken by this "ideal" camera.
Hope this helps.
/OB/

There are 3 points I can think of that may help you.
Probably the least important, but from your description you have calibrated the cameras separately and then the stereo system. Running an overall optimization should improve the reconstruction accuracy, as some "less accurate" parameters compensate for other "less accurate" parameters.
If the accuracy of the reconstruction is important to you, you need a systematic approach to reducing the error. Building an uncertainty model is easy thanks to the mathematical model, and you can write a few lines of code to do it. Say you want to see what happens when the 3D point is 2 meters away, at a particular angle to the camera system, and you have a specific uncertainty on the 2D projections of the 3D point; it's easy to back-project that uncertainty into the 3D space around your 3D point. By adding uncertainty to the other parameters of the system, you can then see which ones matter most and need to have lower uncertainty.
This inaccuracy is inherent in the problem and the method you're using.
First, if you model the uncertainty you will see that reconstructed 3D points further away from the camera centers have a much higher uncertainty. The reason is that the angle <left-camera, 3d-point, right-camera> is narrower. I remember the MVG (Multiple View Geometry) book had a good description of this with a figure.
Second, if you look at the implementation of triangulatePoints, you will see that the pseudo-inverse method is implemented using SVD to reconstruct the 3D point. That can lead to many issues, which you probably remember from linear algebra.
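The growth of uncertainty with distance can be illustrated with the standard rectified-stereo relation Z = f * B / d (depth from focal length, baseline and disparity). Differentiating gives a first-order depth error of roughly Z^2 / (f * B) times the disparity error. The sketch below is only an approximation for an ideal rectified pair; the function names and the numbers in the note after it are hypothetical, not taken from the camera in question.

```cpp
#include <cassert>
#include <cmath>

// Depth of a point seen by an ideal rectified stereo pair: Z = f * B / d.
// focalPx is the focal length in pixels, baselineMm the baseline in mm.
double depthFromDisparity(double focalPx, double baselineMm, double disparityPx) {
    assert(disparityPx > 0.0);
    return focalPx * baselineMm / disparityPx;
}

// First-order depth error caused by a disparity error of dispErrPx pixels:
// |dZ| ~= Z^2 / (f * B) * |dd|. It grows with the square of the distance.
double depthError(double focalPx, double baselineMm, double depthMm,
                  double dispErrPx) {
    assert(focalPx > 0.0 && baselineMm > 0.0);
    return depthMm * depthMm / (focalPx * baselineMm) * std::fabs(dispErrPx);
}
```

With an assumed focal length of 700 px and a 120 mm baseline, a 0.2 px matching error corresponds to about 9.5 mm of depth error at 2 m and about 38 mm at 4 m. A model like this only covers the matching error; errors in the calibration parameters themselves would have to be propagated separately.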
Update:
"But I consistently get larger distance near edges and several times the magnitude of the uncertainty caused by the angle."
That's the result of using the pseudo-inverse, a numerical method. You can replace it with a geometrical method. One easy method is to back-project the 2D projections to get 2 rays in 3D space. Then you want to find where they intersect, which doesn't happen exactly because of the inaccuracies. Instead, you want to find the point where the 2 rays are closest to each other. Without considering the uncertainty, you will consistently favor one point from the set of feasible solutions. That's why with the pseudo-inverse you don't see any fluctuation but a gross error.
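The ray-based method can be sketched as follows. The function takes the two camera centers and the two back-projected ray directions, finds the closest point on each ray, and returns the midpoint between them. The names `Vec3`, `dot` and `midpointTriangulate` are invented for this example, and obtaining the ray directions from pixel coordinates (using the intrinsics and the stereo rotation) is assumed to be done elsewhere.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Rays are o1 + s*d1 and o2 + t*d2. The parameters s and t that minimise the
// distance between the rays come from a 2x2 linear system; the function
// returns the midpoint of the two closest points.
Vec3 midpointTriangulate(const Vec3& o1, const Vec3& d1,
                         const Vec3& o2, const Vec3& d2) {
    const Vec3 w = {o1[0] - o2[0], o1[1] - o2[1], o1[2] - o2[2]};
    const double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    const double d = dot(d1, w), e = dot(d2, w);
    const double denom = a * c - b * b;
    assert(std::fabs(denom) > 1e-12);  // rays must not be parallel
    const double s = (b * e - c * d) / denom;
    const double t = (a * e - b * d) / denom;
    Vec3 mid{};
    for (int i = 0; i < 3; ++i)
        mid[i] = 0.5 * ((o1[i] + s * d1[i]) + (o2[i] + t * d2[i]));
    return mid;
}
```

The remaining gap between the two closest points is a handy sanity check: if it is large compared to the expected pixel noise, the calibration or the match is suspect. Note that the midpoint is a simple geometric choice and does not weight the two rays by their uncertainty.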
Regarding the general optimization, yes, you can run an iterative LM (Levenberg-Marquardt) optimization on all the parameters. This is the method used in applications like SLAM for autonomous vehicles, where accuracy is very important. You can find some papers by googling "bundle adjustment SLAM".

3d to 2d image transformation - PointCloud to OpenCV Image - C++

I'm capturing a 3D image of a hand from my Kinect, and I want to generate a 2D image using only the X and Y values so that I can do image processing with OpenCV. The size of the 3D matrix is variable and depends on the output from the Kinect, and the X and Y values are not in a proper scale to generate a 2D image. My 3D points and my 3D image are: http://postimg.org/image/g0hm3y06n/
I really don't know how to generate my 2D image to perform my image processing.
Can someone help me, or does anyone have a good example I can use to create my image and do the proper scaling for this problem? I want the HAND CONTOURS as output.

I think you should apply Delaunay triangulation to the 2D coordinates of the point cloud (depth ignored), then remove the triangles whose edges are too long. You can estimate the length threshold from the point density: count the points in some area and take the square root of the resulting value (the square root of the area per point gives the typical point spacing). Once you have the triangulation, you can draw filled triangles and find contours.
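Whichever way the gaps between points are filled, the X and Y values first have to be mapped to pixel coordinates. A minimal sketch of that scaling step, assuming the cloud is available as a list of points and that Y points up (both are assumptions, as are the names `Point3`, `Image` and `rasterizeXY`), could look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Point3 { float x, y, z; };
using Image = std::vector<std::vector<std::uint8_t>>;

// Drop Z and scale X/Y into a width x height binary image.
// One common scale is used for both axes so the shape is not stretched.
Image rasterizeXY(const std::vector<Point3>& cloud, int width, int height) {
    assert(!cloud.empty() && width > 1 && height > 1);
    float minX = cloud[0].x, maxX = cloud[0].x;
    float minY = cloud[0].y, maxY = cloud[0].y;
    for (const Point3& p : cloud) {
        minX = std::min(minX, p.x); maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y); maxY = std::max(maxY, p.y);
    }
    const float spanX = std::max(maxX - minX, 1e-6f);  // avoid division by zero
    const float spanY = std::max(maxY - minY, 1e-6f);
    const float scale = std::min((width - 1) / spanX, (height - 1) / spanY);

    Image img(height, std::vector<std::uint8_t>(width, 0));
    for (const Point3& p : cloud) {
        int col = static_cast<int>(std::lround((p.x - minX) * scale));
        int row = static_cast<int>(std::lround((maxY - p.y) * scale));  // flip Y
        col = std::min(std::max(col, 0), width - 1);
        row = std::min(std::max(row, 0), height - 1);
        img[row][col] = 255;
    }
    return img;
}
```

The result is a sparse set of white pixels. The holes between them still need to be closed, for instance with the filled triangles described above, before contours can be extracted.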

I think what you are looking for is the OKPCL package. Also, make sure you check out this PCL post about the topic. There is also an OpenCVPCL Bridge class, but apparently the website is down.
And lastly, there has been official word that OpenCV and PCL are joining forces on a development platform that integrates GPU computing with 2D/3D perception and processing.
HTH

You could use PCL's RangeImage or RangeImagePlanar class with its createFromPointCloud method (I think you do have a point cloud, right? Or what do you mean by 3D image?).
Then you can create an OpenCV Mat using the getImagePoint functions.

detecting motion on opencv c++ (moving camera)

I'm doing a project for the university and I'm working with OpenCV (that is really awesome).
Now my problem is:
I have a video (.avi) and I have detected all the information I want to know about the blobs that suddenly appear in the RGB range between red and yellow. I then built a matrix that stores all the information about the pixel values, and finally I create a red-scale image that represents the median pixel values.
The real problem is that the video is not static: the camera moves (not too much, but it moves).
Can I calculate the x and y coordinates of the camera motion so that I can shift the values of the matrix accordingly?

Who cares about your English, as long as we understand your problem :) What you could do is give KLT motion detection, which is implemented in OpenCV, a shot. Here is a link to KLT, also known as optical flow. If you can filter the motion vectors down to the blobs, you can certainly get hold of the object you want to track. It is even better to give KLT the object's initial coordinates/area to track. Have you checked the OpenCV blobs library to get hold of the blobs? Here is the link.
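For the camera-shift part of the question, one possible approach (an assumption on my side, not something the answer above spells out) is to track background points between consecutive frames, for example with cv::calcOpticalFlowPyrLK, and take the median of their displacements as the camera motion. The median ignores a minority of points that sit on moving blobs. The sketch below only covers that last step and assumes the motion is close to a pure translation; `Pt2`, `median` and `estimateCameraShift` are names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt2 { float x, y; };

// Median of a list of values (upper median for even sizes).
float median(std::vector<float> v) {
    assert(!v.empty());
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(v.size() / 2);
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    return v[static_cast<std::size_t>(mid)];
}

// prev[i] and next[i] are the positions of the same tracked point in two
// consecutive frames. Returns the estimated (dx, dy) of the camera-induced shift.
Pt2 estimateCameraShift(const std::vector<Pt2>& prev,
                        const std::vector<Pt2>& next) {
    assert(prev.size() == next.size() && !prev.empty());
    std::vector<float> dx, dy;
    dx.reserve(prev.size());
    dy.reserve(prev.size());
    for (std::size_t i = 0; i < prev.size(); ++i) {
        dx.push_back(next[i].x - prev[i].x);
        dy.push_back(next[i].y - prev[i].y);
    }
    return {median(dx), median(dy)};
}
```

Subtracting the returned shift from the pixel coordinates of the new frame (or accumulating it over time) would align the per-pixel matrix with the first frame. If the camera also rotates or zooms, a full homography would be a better model than a single shift.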
