I'm currently trying to implement face tracking using optical flow with OpenCV.
To achieve this, I detect faces with the OpenCV face detector, I determine features to track on the detected areas by calling goodFeaturesToTrack, and I track them by calling calcOpticalFlowPyrLK.
It gives good results.
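For reference, a simplified sketch of that detect-then-track pipeline (the parameter values below are placeholders, not the exact ones I use):

#include <opencv2/opencv.hpp>
#include <vector>

// Pick features inside the first detected face (parameters are placeholders).
std::vector<cv::Point2f> initFaceFeatures(const cv::Mat& gray,
                                          cv::CascadeClassifier& faceCascade)
{
    std::vector<cv::Rect> faces;
    faceCascade.detectMultiScale(gray, faces, 1.1, 3);

    std::vector<cv::Point2f> pts;
    if (faces.empty()) return pts;

    // Restrict goodFeaturesToTrack to the detected face region via a mask.
    cv::Mat mask = cv::Mat::zeros(gray.size(), CV_8UC1);
    mask(faces[0]).setTo(255);
    cv::goodFeaturesToTrack(gray, pts, 100, 0.01, 5, mask);
    return pts;
}

// Track the features from the previous frame to the current one.
void trackFeatures(const cv::Mat& prevGray, const cv::Mat& gray,
                   std::vector<cv::Point2f>& pts)
{
    std::vector<cv::Point2f> nextPts;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, gray, pts, nextPts, status, err);

    // Keep only the points flagged as successfully tracked.
    std::vector<cv::Point2f> kept;
    for (size_t i = 0; i < nextPts.size(); ++i)
        if (status[i]) kept.push_back(nextPts[i]);
    pts = kept;
}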
However, I'd like to know when the face I'm currently tracking is not visible anymore (the person leaves the room, is hidden behind an object or another person, ...) but calcOpticalFlowPyrLK tells me nothing about it.
The status parameter of calcOpticalFlowPyrLK rarely reports errors for a tracked feature (so, if the person disappears, I will still have a good number of features reported as validly tracked).
I've tried to calculate the displacement vector of each feature between the previous and the current frame (for example, determining that some point of the face has moved to the left between the two frames) and to compute the variance of these vectors (if the vectors are mostly different, the variance is high, otherwise it is low), but it did not give the expected results (good in some situations, bad in others).
What could be a good condition to determine whether the optical flow tracking has to be stopped or not?
I've thought of some possible solutions like these ones:
Variance of the displacement distances of the tracked features (if the motion is rigid, the distances should be nearly the same, but if something happened, they will differ).
Comparing the shape and size of the area containing the original positions of the tracked features with the area containing the current ones. At the beginning we have a square containing the features of the face, but if the person leaves the room, the tracked area can become strongly deformed.
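A rough sketch of what I have in mind for both checks (all thresholds below are guesses that would need tuning):

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Heuristic: returns true if the track looks lost (thresholds are guesses).
bool trackLooksLost(const std::vector<cv::Point2f>& prevPts,
                    const std::vector<cv::Point2f>& nextPts)
{
    if (prevPts.empty() || prevPts.size() != nextPts.size()) return true;

    // 1. Variance of the per-feature displacement lengths.
    std::vector<float> lengths;
    for (size_t i = 0; i < prevPts.size(); ++i)
    {
        float dx = nextPts[i].x - prevPts[i].x;
        float dy = nextPts[i].y - prevPts[i].y;
        lengths.push_back(std::sqrt(dx * dx + dy * dy));
    }
    cv::Scalar mean, stddev;
    cv::meanStdDev(lengths, mean, stddev);
    bool inconsistentMotion = stddev[0] > 3.0;          // pixels

    // 2. Deformation of the bounding box of the tracked points.
    cv::Rect before = cv::boundingRect(prevPts);
    cv::Rect now    = cv::boundingRect(nextPts);
    double areaRatio    = (double)now.area() / std::max(before.area(), 1);
    double aspectBefore = (double)before.width / std::max(before.height, 1);
    double aspectNow    = (double)now.width / std::max(now.height, 1);
    bool deformed = areaRatio < 0.5 || areaRatio > 2.0 ||
                    std::abs(aspectNow - aspectBefore) > 0.5;

    return inconsistentMotion || deformed;
}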
You can try a bidirectional confidence measure on your tracked points.
That is, estimate the feature positions from img0 to img1, then track those positions backwards from img1 to img0. If the back-tracked features end up near their original positions (the distance should be less than 1 or even 0.5 pixel), they were tracked successfully. This is a bit more reliable than the SSD used by the status flag of OpenCV's pyramidal LK. If a certain number of features cannot be tracked this way, raise the "face lost" event.
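A minimal sketch of such a forward-backward check (grayscale frames img0/img1; the 0.5 pixel threshold is the value mentioned above):

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Forward-backward confidence check for pyramidal LK: a point is accepted
// only if tracking it back from img1 to img0 lands close to where it started.
std::vector<uchar> forwardBackwardCheck(const cv::Mat& img0, const cv::Mat& img1,
                                        const std::vector<cv::Point2f>& pts0,
                                        std::vector<cv::Point2f>& pts1,
                                        double maxFbError = 0.5)
{
    std::vector<cv::Point2f> ptsBack;
    std::vector<uchar> statusFwd, statusBwd;
    std::vector<float> err;

    cv::calcOpticalFlowPyrLK(img0, img1, pts0, pts1, statusFwd, err);    // forward
    cv::calcOpticalFlowPyrLK(img1, img0, pts1, ptsBack, statusBwd, err); // backward

    std::vector<uchar> ok(pts0.size(), 0);
    for (size_t i = 0; i < pts0.size(); ++i)
    {
        float dx = ptsBack[i].x - pts0[i].x;
        float dy = ptsBack[i].y - pts0[i].y;
        ok[i] = statusFwd[i] && statusBwd[i] &&
                std::sqrt(dx * dx + dy * dy) < maxFbError;
    }
    return ok;   // if too few entries are 1, raise the "face lost" event
}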
Related
The shape of an object is detected on a black-and-white image. The object is a black continuous shape, the background is white.
We use PCA (http://docs.opencv.org/3.1.0/d1/dee/tutorial_introduction_to_pca.html) to get the object direction and align the object. Currently the shape itself (the points on the contour) is the input to the OpenCV PCA implementation. This usually works very well. But from time to time there is small dirt on the object border, causing the shape to pass around the dirt. This adds more points and more weight on one side, slightly turning the object.
Idea: instead of the contour, we use the area of the object as input for our PCA analysis. The issue there: checking every point for whether it is inside the contour and then using it for PCA slows the application down. This part would be about 52352 times slower.
New Approach: We take random points in the image, check if they are inside the shape and if so, use them for our PCA. We have to see if we can get the consistent quality needed from this approach.
Is there already a similar implementation in opencv which is using the area instead of the shape?
Another approach would be to put a mesh over the object and use the mesh points inside the object for PCA.
Is there already something similar available that one can just use, or would one need to quickly implement something like this?
Going for straight lines around the object isn't an option.
Given that we have received very limited information about your problem (posting images would help a lot) and you do not seem to know the probability density function of the noise, your best bet is to consider the noise to be Gaussian.
As such, and following your intuition, my suggested approach is to take a few (by a few I mean statistically relevant but not raising the computation time that much) random points that lie inside the object and compute the PCA.
Repeat this procedure in an iterative loop and store somewhere the resulting rotation angles you get from the application of the PCA to the object shape.
Stop once you have enough points and compute the mean of the rotation angles: this is a decent estimate of the true angle. Compute also the standard deviation to get a measure of the quality of your estimate. As for "enough points", ~30 points is usually considered enough to be representative of the underlying population according to the central limit theorem.
If you want, you can improve on this approach in many ways, for example doing robust estimation of the true angle once you have collected enough points. It all depends on the data you have at hand...take my suggestion just as a starting point.
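As an illustration only, here is a rough sketch of the sampling-plus-PCA step; using pointPolygonTest for the inside test and a sample count of 500 are my own assumptions, not part of the question:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Estimate the object orientation from randomly sampled interior points.
double orientationFromSampledArea(const std::vector<cv::Point>& contour,
                                  int nSamples = 500)
{
    cv::Rect bbox = cv::boundingRect(contour);
    cv::Mat samples(0, 2, CV_64F);
    cv::RNG rng;

    int attempts = 0;
    while (samples.rows < nSamples && attempts++ < nSamples * 20)
    {
        cv::Point2f p((float)(bbox.x + rng.uniform(0, bbox.width)),
                      (float)(bbox.y + rng.uniform(0, bbox.height)));
        // Keep the point only if it falls inside the shape.
        if (cv::pointPolygonTest(contour, p, false) > 0)
        {
            cv::Mat row = (cv::Mat_<double>(1, 2) << p.x, p.y);
            samples.push_back(row);
        }
    }

    // PCA on the interior points: the first eigenvector gives the main axis.
    cv::PCA pca(samples, cv::Mat(), cv::PCA::DATA_AS_ROW);
    double ex = pca.eigenvectors.at<double>(0, 0);
    double ey = pca.eigenvectors.at<double>(0, 1);
    return std::atan2(ey, ex);   // orientation angle in radians
}

Running this in a loop and averaging the returned angles (while also looking at their standard deviation) gives the estimate and the quality measure described above.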
There are a few parameters that you could change which may improve your system.
First is the threshold you use to binarize your image. I don't know what your application is about, but you could use other color spaces, or normalize your image by chromaticity, and then apply a new threshold.
Another option is to exclude shapes (contours) whose area is larger or smaller than what you are expecting.
In addition, you may apply a blur filter before detecting contours.
I don't know how the noise looks, but when you say "small dirt" I think it might be only a few pixels, a lot smaller than the object itself, although it may be attached to the object. To reduce this kind of noise you can perform an opening (morphological operation) on the binary image.
http://docs.opencv.org/2.4/doc/tutorials/imgproc/opening_closing_hats/opening_closing_hats.html
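A minimal sketch of such an opening, assuming the object has been thresholded to white foreground on a black background (invert the image or use MORPH_CLOSE if your object is black on white); the 5x5 kernel size is just a starting point:

#include <opencv2/opencv.hpp>

// Remove small dirt with a morphological opening before extracting the contour.
// Keep the kernel smaller than any detail of the shape you want to preserve.
cv::Mat cleanBinary(const cv::Mat& binary)
{
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::Mat opened;
    cv::morphologyEx(binary, opened, cv::MORPH_OPEN, kernel);
    return opened;
}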
I am using OpenCV C++ and am a new user. I am interested in object detection problems. So far I have studied and implemented sparse optical flow (the Lucas-Kanade method) on video from a stationary camera. After trying k-means and background subtraction, I have decided to move on to a more difficult problem: the moving camera.
So far I have studied some documentation and found out that I could use cv::findHomography to find the inliers and outliers over the sequence of frames in my video, and then understand from the returned values which motion is caused by camera movement and which by object movement. In addition, I could use SURF features to track some objects and then decide which of them are good points.
However, I am wondering how to implement this in practice. For example, should I use the first frame as ground truth, detect some features using SURF, and then call findHomography for each subsequent frame? Any ideas/help is welcome!
Detecting moving objects from a moving camera is quite a challenging task that requires a solid understanding of multiple view geometry, and there is less information available on this topic (than, for example, on structure from motion), so be warned!
Anyway, homography matrix will not be a good choice for detection of moving objects (unless you are 100% sure that your background can be represented by a flat surface accurately enough). You should probably use a fundamental matrix or trifocal tensor.
The fundamental matrix is computed from point correspondences between 2 frames. It associates points in one image with lines in the other image (so-called epipolar lines), and in this way it is independent of scene structure. After you have obtained the F matrix using some robust estimation method, like RANSAC or LMedS (RANSAC seems to be better for this kind of task), you can calculate the reprojection error for each point. Objects that move independently of the scene will not be accurately described by the F matrix and will have a bigger error. So, outliers of the F matrix calculated from image matches over two frames can be considered moving objects. One note though - objects that move along epipolar lines will not be detected by this approach, since their parallax can also be explained by some depth level.
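A rough sketch of that idea (not a complete solution): estimate F with RANSAC, then measure each point's distance to its epipolar line in the second image; the 3-pixel thresholds are assumptions to tune.

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Flag likely moving points: RANSAC outliers of F, or points lying far from
// their epipolar lines (distance threshold in pixels is an assumption).
std::vector<uchar> movingPointMask(const std::vector<cv::Point2f>& pts1,
                                   const std::vector<cv::Point2f>& pts2)
{
    std::vector<uchar> inlierMask;
    cv::Mat F = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC, 3.0, 0.99, inlierMask);

    std::vector<cv::Vec3f> lines;                  // epipolar lines in image 2
    cv::computeCorrespondEpilines(pts1, 1, F, lines);

    std::vector<uchar> moving(pts1.size(), 0);
    for (size_t i = 0; i < pts2.size(); ++i)
    {
        const cv::Vec3f& l = lines[i];             // line a*x + b*y + c = 0
        double d = std::abs(l[0] * pts2[i].x + l[1] * pts2[i].y + l[2]) /
                   std::sqrt(l[0] * l[0] + l[1] * l[1]);
        moving[i] = (!inlierMask[i] || d > 3.0);   // outlier or large epipolar error
    }
    return moving;
}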
The trifocal tensor does not have the depth/motion ambiguity for objects that move along epipolar lines, but it is harder to estimate and it is not included in OpenCV. It can be calculated from correspondences over 3 frames, and its usage can be conceptually described as triangulating a point from 2 views and then calculating the reprojection error on the third view.
As for the matching - I still think that LK tracking will be better than SURF matching if you work with video sequences, since in that case you don't need to consider very distant points as matches, and tracking is usually faster than detection + matching.
I'm currently working on contour detection for the side of a head. As the pictures are taken in front of a white wall, I decided to run a snake (active contour model algorithm) on the picture after processing it with a threshold.
The problem is that the snake won't fit well around the nose, the mouth, and below the mouth (as you can see in the pictures below).
// note: in OpenCV 2.4.x, cvSnakeImage is declared in the legacy module
#include <opencv2/opencv.hpp>
#include <opencv2/legacy/legacy.hpp>

// load file from disk and apply threshold
IplImage* img = cvLoadImage(file.c_str(), 0);   // 0 = load as grayscale
cvThreshold(img, img, 170, 255, CV_THRESH_BINARY);

float alpha = 0.1f;   // weight of continuity energy
float beta  = 0.5f;   // weight of curvature energy
float gamma = 0.4f;   // weight of image energy

CvSize size;          // size of the neighborhood searched for the minimum, must be odd
size.width  = 5;
size.height = 5;

CvTermCriteria criteria;
criteria.type     = CV_TERMCRIT_ITER;   // terminate processing after max_iter iterations
criteria.max_iter = 10000;
criteria.epsilon  = 0.1;

// snake is an array of cpt = 40 points, read from a file, set by hand
CvPoint snake[40];
int cpt = 40;
// ... fill snake[] with the hand-picked initial contour points ...

cvSnakeImage(img, snake, cpt, &alpha, &beta, &gamma, CV_VALUE, size, criteria, 0);
I tried changing the alpha/beta/gamma parameters and the number of iterations, but I didn't get a better result than the output shown below. I cannot understand why the nose is cut off and why the contour does not fit around the mouth. I think I have enough points for the curvature, but there are still some segments made up of several (>2) points lying on a straight line.
Input Image :
Output Snake :
blue : points set by hand
green : output snake
Any help or ideas would be very appreciated.
Thanks !
A typical snake or active contour algorithm converges through a trade-off between three kinds of cost functions: edge strength/distance (the data term), and spacing and smoothness (the prior terms). You may immediately notice a connection to your "nose problem" - the nose has high curvature. Your snake also has trouble getting into concave regions, since this certainly increases its curvature compared to a convex hull.
SOLUTIONS:
A. Since your snake's performance isn't better than that of a convex hull, as one of the remedies I would proceed with a simpler convex hull algorithm and then rerun it on its inverted residuals. It will get the nose right, and concavities will turn into convexities in the residuals. Or you can use the convexity defect function of OpenCV instead of working with convexHull (see the sketch after option C below).
B. Another fix can be to reduce the snake curvature parameter to allow it to curve sharply around the nose. Since you have little noise and you can actually clean it up a bit, I see no problem with enforcing some constraints instead of making "softer" trade-offs. Perhaps a prior head-silhouette model could help here too.
Below I tried to write my own snake algorithm using various distance transforms and weights of a distance parameter. The conclusion: the parameter matters more than the distance metric, and it does have some effect (the left picture uses a smaller parameter than the right and thus cuts the nose more). The distance from the contour (red) is shown in grey; the snake is green.
C. Since your background is almost a solid color, invest a bit into cleaning up residual noise (use morphological operations or connected components) and just run findContours() on the clean silhouette. I implemented this last solution below: the first image has the noise deleted, and the second is just the contour function from OpenCV.
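Here is a rough sketch combining options A and C; the threshold value, kernel size and the choice of the largest external contour are assumptions based on the white-wall setup described in the question:

#include <opencv2/opencv.hpp>
#include <vector>

// Clean the thresholded image, extract the largest external contour as the
// head silhouette, then inspect its convexity defects.
void silhouetteAndDefects(const cv::Mat& gray)
{
    cv::Mat bin;
    cv::threshold(gray, bin, 170, 255, cv::THRESH_BINARY_INV);   // head = white

    // remove residual noise with a morphological opening
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(7, 7));
    cv::morphologyEx(bin, bin, cv::MORPH_OPEN, kernel);

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(bin, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty()) return;

    // keep the contour with the largest area (the head silhouette)
    size_t best = 0;
    for (size_t i = 1; i < contours.size(); ++i)
        if (cv::contourArea(contours[i]) > cv::contourArea(contours[best]))
            best = i;
    const std::vector<cv::Point>& head = contours[best];

    // option A: convex hull + convexity defects (nose/chin hollows show up here)
    std::vector<int> hullIdx;
    cv::convexHull(head, hullIdx, false, false);
    std::vector<cv::Vec4i> defects;                 // start, end, farthest, depth*256
    cv::convexityDefects(head, hullIdx, defects);
    for (size_t i = 0; i < defects.size(); ++i)
    {
        float depth = defects[i][3] / 256.0f;       // concavity depth in pixels
        // defects deeper than a few pixels mark the concave regions a hull misses
        (void)depth;
    }
}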
If you want to implement it yourself, I recommend the paper "Everything You Always Wanted to Know About Snakes (But Were Afraid to Ask)" by Jim Ivins and John Porrill.
About the OpenCV implementation, I don't know it very well, but I would suggest you to:
Reduce beta, so that the curvature can be stronger.
Check the image energy. Maybe the last parameter of the function (the scheme) is wrong. There are two possible values: _CV_SNAKE_IMAGE and _CV_SNAKE_GRAD. You set it to 0 and, if I'm not mistaken, 0 means _CV_SNAKE_IMAGE, so the function will assume the input image is the energy image. Again, I'm not sure how OpenCV implements this function, but I think that with _CV_SNAKE_IMAGE it treats the input image as if it were a gradient magnitude image. In your case, that could make the snake avoid black regions (interpreted as low gradient magnitude) and seek bright regions. So, try using _CV_SNAKE_GRAD as your last parameter.
I hope it can help you. Good luck!
Active contours are just bad - period. It looks like max-flow/min-cut could easily solve this image segmentation problem.
I know this was asked some time ago, but I'm that incensed with active contours in general. This page is one of the top hits on Google, and I think many people will read this post in the hope that someone can do something useful with contour evolution via PDEs.
The truth is that active contours require substantial human intervention, and even then they only work if you have unnatural edge strengths or very high contrast.
If you're a PhD student or postdoc with an interest - I beg you to find something else. I guarantee a hard viva with shocking results. Although there are seemingly good contour models out there, the source code is never made available - generalised GVF within a level set, for example.
All (binary) segmentation problems can be decomposed into a directed graph - your future employer and examiner will thank me. I urge you not to waste time on active contours.
It's been a while since I looked into the OpenCV implementation of active contours, but if I recall correctly, it used a greedy algorithm for energy minimization (Williams et al.?).

Furthermore, there are several improvements to the external force, typically the edge information, that improve snake convergence, e.g. the gradient vector flow (GVF) field snake. The GVF external force is modeled as a liquid diffusion process that allows the snaxels (snake elements) to flow towards the image edges in areas of high curvature and into concavities.

When active contouring, I would recommend a coarse-to-fine approach: typically a high-level process (a human or another segmentation process) acts as a seed for the initial snaxel positions, and the snake-deformation process then acts as a fine way to delineate the ROI boundary. In applications like medical image analysis, this kind of approach is acceptable, and even desirable. Another good snake algorithm akin to level sets is the Chan-Vese "active contours without edges" model, definitely worth checking out; there are several examples of it in Matlab floating around the internet.
I am at the start of developing software using OpenCV in Microsoft Visual 2010 Express. What I need to know before I get into coding is the procedure I have to follow.
Overview:
I want to develop software that detects simple boxing moves, such as a left punch or a right punch, and outputs the results.
Now, where I am struggling is what approach I should take and how I should tackle this development, i.e.:
Capture video footage and be able to extract, let's say, every 5th frame for processing.
Do I have to extract and store this frame, and perhaps have a REFERENCE image to subtract the captured frame from?
Once I capture a frame, what would be the best way to process it?
* Threshold it, then
* Detect the edges, then
* Smooth the edges using some filter, then
* Draw some BOUNDING boxes....?
What is your view on this, guys? Am I missing something, or are there better, simpler ways? Any suggestions?
Any answer will be much appreciated.
PS... it's not my homework :)
I'm not sure if analyzing only every 5th frame will be enough, because usually punches are so fast that they could be overlooked.
I assume what you actually want to find is fast forward (towards camera) movements of fists.
In the case of OpenCV I would first start off with such movements of faces, since some examples of how to do that are already provided with the software package.
To detect and track faces you can use CvHaarClassifierCascade, but since this won't be fast enough for real-time detection on every frame, continue tracking the found face with Lucas-Kanade. Just pick some good-to-track points inside the previously found face, remember their distance from an arbitrary face middle, and update it at each frame. See this video http://www.youtube.com/watch?v=zNqCNMefyV8 - an example of some random points tracked with Lucas-Kanade. Note that unlike faces, fists may not be so easy to track since their surface is rather uniform; better check the Lucas-Kanade demo in OpenCV.
Of course, with each frame the tracked face will drift away, so once in a while re-run CvHaarClassifierCascade and interpolate your currently held face position towards the new detection.
You should be able to do the above for fists as well, but that will require training a classifier with pictures of fists (a classifier trained on faces is already provided with OpenCV).
Now, having the fists/face tracked, you may try observing what happens to the points - when someone punches, they move rapidly in some direction, while on a fist that remains still they don't move too much. So, when you calculate the average movement of the points over recent frames, the higher the value, the bigger the chance that there was a punch. Alternatively, if you've somehow managed to track them accurately, an increase in the distance between the points means the object is closer to the camera - and so a likely punch.
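A tiny sketch of that average-movement measure (the punch threshold itself is something you would have to find experimentally):

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Average displacement of the tracked points between two frames; a spike
// above some tuned threshold is treated as a punch candidate.
double averageDisplacement(const std::vector<cv::Point2f>& prevPts,
                           const std::vector<cv::Point2f>& currPts)
{
    if (prevPts.empty() || prevPts.size() != currPts.size()) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < prevPts.size(); ++i)
    {
        double dx = currPts[i].x - prevPts[i].x;
        double dy = currPts[i].y - prevPts[i].y;
        sum += std::sqrt(dx * dx + dy * dy);
    }
    return sum / prevPts.size();   // compare against a threshold in pixels/frame
}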
Note that without at least knowing the change in the size of the fist in the picture, it might be hard to distinguish whether the movement of the hand was forward or backward, or whether the user was faking it by moving the fists left or right. You may have to come up with some specialized algorithm (maybe by trial and error) to detect that, like, say, counting the increase in skin-colored pixels in the location where the fist was previously found.
What you are looking for is the research field of action recognition, e.g. www.nada.kth.se/cvap/actions/, and a possible solution is, e.g., the STIP (space-time interest points) method, www.di.ens.fr/~laptev/actions/. But ultimately this is a tough job if you have to deal with occlusion or different points of view.
Given a video with a fixed background containing a lot of variation in light I am trying to detect pulses of light that occur for relatively short spans of time. When the video is played it is pretty easy for a person to distinguish the light pulses but if only shown a still frame it would be impossible to distinguish a pulse from background light.
I would like to know if there is specific terminology in machine vision that I can use to search for algorithms used to solve this problem. Also if you have any references for papers or open source software that solves this problem that would be great.
Edit: More context
The video itself is of a biological process that occurs at the sub-cellular level and while the background is fixed there is also a significant amount of random signal noise at the pixel level (there doesn't appear to be significant correlation in the noise between neighboring pixels). Note that the variation I refer to in the first paragraph is true variation and not signal noise. Since I mentioned that the process is biological it's probably also worth saying that there is no movement going on; these are just pulses of light. Also, the pulses themselves occupy enough pixels so that it is easy to discern their relative sizes.
From statistics, you could look into change point detection. The essential idea is that most of the time each (x,y) point, or each region if you define some granularity of regions, has an intensity I(x,y), where I(x,y) is random but either bounded or stochastic with some assumed distribution (e.g. normal with a given mean and standard deviation), and a pulse is then observed as an intensity that is anomalous for that distribution. Anomaly detection would also apply, but the time-series nature of the data makes change point detection more appropriate.
(If you want to go more into the statistical methodologies, it would be far more appropriate to discuss this on the statistics Stack Exchange site.)
If you look into astronomical applications, you can find papers on supernova and pulsar detection.
Update 1. Just to clarify the astronomical analogies, if the pulse is repeating, then papers on pulsars or satellites may be most appropriate. If the pulse is one-time, then papers on supernova detection would be better. If the pulse is bursty, and spatially clustered, then meteor strike detection would be better. Although spatial time series analysis, especially change point or anomaly detection, is useful, it's best to have an understanding of the stochastic phenomena of interest in order to narrow down the detection methodology.
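As an illustration of the change point / anomaly detection idea (a sketch only, not a prescribed method): keep a running per-pixel mean and variance of the intensity and flag pixels whose current value deviates by more than k standard deviations. The frames are assumed to be single-channel CV_32F, and alpha and k are assumptions to tune.

#include <opencv2/opencv.hpp>

// Per-pixel running mean/variance with a z-score test for candidate pulse pixels.
struct PulseDetector
{
    cv::Mat mean, var;      // running estimates (CV_32F)
    double alpha = 0.05;    // update rate of the exponential moving estimates
    double k = 4.0;         // z-score threshold

    cv::Mat update(const cv::Mat& frame)
    {
        if (mean.empty())
        {
            frame.copyTo(mean);
            var = cv::Mat::ones(frame.size(), CV_32F) * 1e-4;
        }

        cv::Mat diff = frame - mean;
        cv::Mat stddev;
        cv::sqrt(var, stddev);
        cv::Mat absDiff = cv::abs(diff);
        cv::Mat thresh  = k * stddev;
        cv::Mat pulseMask = absDiff > thresh;      // 255 where the deviation is anomalous

        // Update the running statistics.
        mean = mean + alpha * diff;
        var  = (1.0 - alpha) * (var + alpha * diff.mul(diff));
        return pulseMask;
    }
};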
To continue the notion of applying statistics: you might consider gridding each image frame into rectangular neighborhoods. At each time t, compute the variance (or standard deviation) of the neighborhood. Presumably, the unexcited neighborhoods will exhibit some common distribution of intensity (i.e. uniform, but most likely some form of gaussian). The presence of pulse pixels will bias that distribution in some way. When comparing a neighborhood at time t and t-1, a significant change in mean intensity (or a change in the variance, etc.) would indicate an excited neighborhood.
You might also consider looking at other measures, such as skewness and kurtosis. Assuming the initial, unexcited distribution is gaussian, the "shape" parameters could also identify differences in the pixel populations.
*Note that I'm assuming a grayscale image for simplicity, but the same principles may be applied to an RGB image.
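A small sketch of that gridded comparison, assuming grayscale frames; the cell size and the jump threshold are placeholders:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Compare per-cell mean/stddev between two frames; a large jump marks an
// "excited" neighborhood (cell size and threshold are placeholders).
std::vector<cv::Rect> excitedCells(const cv::Mat& prev, const cv::Mat& curr,
                                   int cell = 16, double jump = 10.0)
{
    std::vector<cv::Rect> excited;
    for (int y = 0; y + cell <= curr.rows; y += cell)
    {
        for (int x = 0; x + cell <= curr.cols; x += cell)
        {
            cv::Rect r(x, y, cell, cell);
            cv::Scalar mPrev, sPrev, mCurr, sCurr;
            cv::meanStdDev(prev(r), mPrev, sPrev);
            cv::meanStdDev(curr(r), mCurr, sCurr);
            if (std::abs(mCurr[0] - mPrev[0]) > jump ||
                std::abs(sCurr[0] - sPrev[0]) > jump)
                excited.push_back(r);
        }
    }
    return excited;
}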
Assuming a completely static scene with no object or camera motion, any color deviation would be due to lighting changes.
If you detect an abrupt color/intensity change at particular pixels (i.e. a brightness change above a certain allowable threshold), then it should be due to the light source turning on/off.
If you are only interested in point light sources, then any change in a region larger than the maximum apparent light source should be considered as coming from something else (e.g. the sun suddenly revealed from behind clouds).