I'm currently working on contour detection for a side view of a head. Since the pictures are taken in front of a white wall, I decided to run a snake (active contour model algorithm) on the thresholded picture.
The problem is that the snake won't fit well around the nose, the mouth, and below the mouth (as you can see in the pictures below).
//load file from disk and apply threshold
IplImage* img = cvLoadImage (file.c_str (), 0);
cvThreshold(img, img, 170, 255, CV_THRESH_BINARY);
float alpha = 0.1; // Weight of continuity energy
float beta = 0.5; // Weight of curvature energy
float gamma = 0.4; // Weight of image energy
CvSize size; // Size of the neighborhood searched around each point for the minimum; both dimensions must be odd
size.width = 5;
size.height = 5;
CvTermCriteria criteria;
criteria.type = CV_TERMCRIT_ITER; // terminate processing after X iteration
criteria.max_iter = 10000;
criteria.epsilon = 0.1;
// snake is an array of cpt=40 points, read from a file, set by hand
cvSnakeImage(img, snake, cpt, &alpha, &beta, &gamma, CV_VALUE, size, criteria, 0);
I tried changing the alpha/beta/gamma parameters and the number of iterations, but I couldn't get a better result than the output shown below. I cannot understand why the nose is cut off and why the contour does not fit around the mouth. I think I have enough points for the curvature, but there are still some straight segments made up of several (>2) points.
Input Image :
Output Snake :
blue : points set by hand
green : output snake
Any help or ideas would be very appreciated.
Thanks !
A typical snake or active contour algorithm converges to a trade-off between three kinds of cost functions: edge strength/distance (data term), and spacing and smoothness (prior terms). You may immediately notice a connection to your "nose problem": the nose has high curvature. Your snake also has trouble getting into concave regions, since doing so increases its curvature compared to a convex hull.
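To make the trade-off concrete, here is a minimal Python sketch of the two prior terms as they are commonly discretized in greedy snakes. The function names and the exact discretization are assumptions for illustration; this is not OpenCV's implementation.

```python
import math

def curvature_energy(prev_pt, pt, next_pt):
    """Squared second difference |v[i-1] - 2*v[i] + v[i+1]|^2 at a snake point."""
    dx = prev_pt[0] - 2 * pt[0] + next_pt[0]
    dy = prev_pt[1] - 2 * pt[1] + next_pt[1]
    return dx * dx + dy * dy

def continuity_energy(prev_pt, pt, mean_spacing):
    """Squared deviation of the segment length from the mean point spacing."""
    d = math.hypot(pt[0] - prev_pt[0], pt[1] - prev_pt[1])
    return (mean_spacing - d) ** 2

# Three points on a straight line: no curvature penalty.
flat = curvature_energy((0, 0), (1, 0), (2, 0))
# A sharp tip (such as a nose): large curvature penalty.
tip = curvature_energy((0, 0), (1, 3), (2, 0))
print(flat, tip)  # prints "0 36"
```

With a large curvature weight, the sharp tip is expensive, so the minimization tends to flatten it, which is consistent with the cut-off nose.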
SOLUTIONS:
A. Since your snake's performance isn't better than that of a convex hull, one remedy would be to run a simpler convex hull algorithm and then rerun it on its inverted residuals. It will get the nose right, and the concavities will turn into convexities in the residuals. Alternatively, you can use OpenCV's convexity defects function (convexityDefects) instead of working with convexHull.
B. Another fix is to reduce the snake's curvature parameter so that it can curve sharply around the nose. Since you have little noise, and you can actually clean it up a bit, I see no problem with enforcing some constraints instead of making "softer" trade-offs. Perhaps a head silhouette prior model can help here too.
Below I tried to write my own snake algorithm using various distance transforms and weights of a distance parameter. The conclusion: the parameter matters more than the distance metric and does have some effect (the left picture uses a smaller parameter than the right one and thus cuts the nose more). The distance from the contour (red) is shown in grey; the snake is green.
C. Since your background is almost a solid color, invest a bit in cleaning up the residual noise (use morphological operations or connected components) and just call findContours() on the clean silhouette. I implemented this last solution below: the first image has the noise removed, and the second is just the output of OpenCV's contour function.
If you want to implement it yourself, I recommend the paper "Everything you always wanted to know about snakes (but were afraid to ask)" by Jim Ivins and John Porrill.
I don't know the OpenCV implementation very well, but I would suggest that you:
Reduce beta, so that the curvature may be stronger
Check the image energy. Maybe the last parameter of the function (scheme) is wrong. There are two possible values: _CV_SNAKE_IMAGE and _CV_SNAKE_GRAD. You set it to 0, if I'm not wrong, and I think 0 means _CV_SNAKE_IMAGE. So, the function will assume the input image is the energy image. Again, I'm not sure how OpenCV implements this function, but I think that when you use _CV_SNAKE_IMAGE the function assumes the input image is a gradient module image. In your case, it could make the snake avoid black regions (interpreted as low gradient module) and seek bright regions. So, try to use _CV_SNAKE_GRAD as your last parameter.
I hope it can help you. Good luck!
Active contours are just bad - period. It looks like max flow min cut could easily solve this image segmentation problem.
I know this was asked some time ago, but I'm that incensed with active contours in general. This page is one of the top hits on Google, and I think many people will read this post in the hope that someone can do something useful with contour evolution via PDEs.
The truth is that active contours require substantial human intervention, and even then they only work if you have unnatural edge strengths or very high contrast.
If you're a PhD student or postdoc with an interest, I beg you to find something else. I guarantee a hard viva with shocking results. Although there are seemingly good contour models out there, the source code is never made available (generalised GVF within a level set, for example).
All (binary) segmentation problems can be decomposed into a directed graph - your future employer and examiner will thank me. I urge you not to waste time on active contours.
It's been a while since I looked into the OpenCV implementation of active contours, but if I recall correctly, it uses a greedy algorithm for energy minimization (Williams et al.?). Furthermore, there are several improvements to the external force (typically the edge information) that improve snake convergence, e.g. the gradient vector flow (GVF) snake. The GVF external force is modeled as a liquid diffusion process that allows the snaxels (snake elements) to flow towards the image edges in areas of higher curvature and into concavities. When using active contours, I would recommend a coarse-to-fine approach: typically a high-level process (a human or another segmentation process) provides a seed for the initial snaxel positions, and the snake deformation process then acts as a fine way to delineate the ROI boundary. In applications like medical image analysis, this kind of approach is acceptable, and even desirable. Another good snake algorithm, akin to level sets, is the Chan-Vese "active contours without edges" model. It is definitely worth checking out, and there are several MATLAB examples of it floating around the internet.
I have a physics engine that uses AABB testing to detect object collisions and an animation system that does not use linear interpolation. Because of this, my collisions act erratically at times, especially at high speeds. Here is a glaringly obvious problem in my system...
For the sake of demonstration, assume a frame in our animation system lasts 1 second and we are given the following scenario at frame 0.
At frame 1, the collision of the objects will not be detected, because c1 will have traveled past c2 by the next draw.
Although I'm not using it, I have a bit of a grasp on how linear interpolation works because I have used linear extrapolation in this project in a different context. I'm wondering if linear interpolation will solve the problems I'm experiencing, or if I will need other methods as well.
There is a part of me that is confused about how linear interpolation is used in the context of animation. The idea is that we can achieve smooth animation at low frame rates. In the above scenario, we cannot simply just set c1 to be centered at x=3 in frame 1. In reality, they would have collided somewhere between frame 0 and frame 1. Does linear interpolation automatically take care of this and allow for precise AABB testing? If not, what will it solve and what other methods should I look into to achieve smooth and precise collision detection and animation?
The phenomenon you are experiencing is called tunnelling, and is a problem inherent to discrete collision detection architectures. You are correct in feeling that linear interpolation may have something to do with the solution as it can allow you to, within a margin of error (usually), predict the path of an object between frames, but this is just one piece of a much larger solution. The terminology I've seen associated with these types of solutions is "Continuous Collision Detection". The topic is large and gets quite complex, and there are books that discuss it, such as Real Time Collision Detection and other online resources.
So to answer your question: no, linear interpolation on its own won't solve your problems*. Unless you're only dealing with circles or spheres.
What to Start Thinking About
The way the solutions look and behave depends on your design decisions, and the solutions are generally large. So, just to point you in the right direction: the fundamental idea of continuous collision detection is to figure out how far between the earlier frame and the later frame the collision happens, and in what position and rotation the two objects are at that point. Then you must calculate the configuration the objects will be in at the later frame time in response. Things get very interesting when addressing these problems for anything other than circles in two dimensions.
I haven't implemented this but I've seen described a solution where you march the two candidates forward between the frames, advancing their position with linear interpolation and their orientation with spherical linear interpolation and checking with discrete algorithms whether they're intersecting (Gilbert-Johnson-Keerthi Algorithm). From here you continue to apply discrete algorithms to get the smallest penetration depth (Expanding Polytope Algorithm) and pass that and the remaining time between the frames, along to a solver to get how the objects look at your later frame time. This doesn't give an analytic answer but I don't have knowledge of an analytic answer for generalized 2 or 3D cases.
If you don't want to go down this path, your best weapon in the fight against complexity is assumptions: If you can assume your high velocity objects can be represented as a point things get easier, if you can assume the orientation of the objects doesn't matter (circles, spheres) things get easier, and it keeps going and going. The topic is beyond interesting and I'm still on the path of learning it, but it has provided some of the most satisfying moments in my programming period. I hope these ideas get you on that path as well.
Edit: Since you specified you're working on a billiard game.
First we'll check whether discrete or continuous is needed
Is any amount of tunnelling acceptable in this game? Not in billiards, no.
What is the speed at which we will see tunnelling? Using a 0.0285m radius for the ball (standard American) and a 0.01s physics step, we get 2.85m/s as the minimum speed that collisions start giving bad response. I'm not familiar with the speed of billiard balls but that number feels too low.
So just checking on every frame if two of the balls are intersecting is not enough, but we don't need to go completely continuous. If we use interpolation to subdivide each frame we can increase the velocity needed to create incorrect behaviour: With 2 subdivisions we get 5.7m/s, which is still low; 3 subdivisions gives us 8.55m/s, which seems reasonable; and 4 gives us 11.4m/s which feels higher than I imagine billiard balls are moving. So how do we accomplish this?
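As a quick check of those figures: the threshold used above is the speed at which a ball travels one radius per sub-step, so it scales linearly with the number of subdivisions. A minimal Python sketch (the function name is made up for illustration):

```python
def max_safe_speed(radius, dt, subdivisions=1):
    """Speed above which a ball moves more than one radius per sub-step."""
    return radius * subdivisions / dt

for n in (1, 2, 3, 4):
    print(n, round(max_safe_speed(0.0285, 0.01, n), 2))
```

This reproduces the 2.85, 5.7, 8.55 and 11.4 m/s values quoted above.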
Discrete Collisions with Frame Subdivisions using Linear Interpolation
Using subdivisions is expensive so it's worth putting time into candidate detection to use it only where needed. This is another problem with a bunch of fun solutions, and unfortunately out of scope of the question.
So you have two candidate circles which will very probably collide between the current frame and the next frame. So in pseudo code the algorithm looks like:
dt = 0.01
subdivisions = 4

circle1.next_position = circle1.position + (circle1.velocity * dt)
circle2.next_position = circle2.position + (circle2.velocity * dt)

for i from 0 to subdivisions:
    temp_c1.position = interpolate(circle1.position, circle1.next_position, (i + 1) / subdivisions)
    temp_c2.position = interpolate(circle2.position, circle2.next_position, (i + 1) / subdivisions)
    if intersecting(temp_c1, temp_c2):
        intersection confirmed

no intersection
Where the interpolate signature is interpolate(start, end, alpha)
So here you have interpolation being used to "move" the circles along the path they would take between the current and the next frame. On a confirmed intersection you can get the penetration depth and pass the delta time (dt / subdivisions), the two circles, the penetration depth and the collision points along to a resolution step that determines how they should respond to the collision.
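The pseudocode above could be turned into runnable Python roughly as follows. This is only a sketch under simple assumptions: circles move at constant velocity within a frame, and the `Circle` class and the `first_hit` helper are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Circle:
    x: float
    y: float
    vx: float
    vy: float
    radius: float

def interpolate(start, end, alpha):
    """Linear interpolation between two scalars."""
    return start + (end - start) * alpha

def first_hit(c1, c2, dt, subdivisions):
    """Return (sub-step index, penetration depth) of the first overlap, or None."""
    for i in range(subdivisions):
        alpha = (i + 1) / subdivisions
        # Move both circles along their straight path for this sub-step.
        x1 = interpolate(c1.x, c1.x + c1.vx * dt, alpha)
        y1 = interpolate(c1.y, c1.y + c1.vy * dt, alpha)
        x2 = interpolate(c2.x, c2.x + c2.vx * dt, alpha)
        y2 = interpolate(c2.y, c2.y + c2.vy * dt, alpha)
        depth = c1.radius + c2.radius - math.hypot(x2 - x1, y2 - y1)
        if depth > 0:
            return i, depth
    return None

# A fast circle that jumps over a stationary one within a single frame.
fast = Circle(0.0, 0.0, 4.0, 0.0, 0.5)
still = Circle(2.0, 0.0, 0.0, 0.0, 0.5)
print(first_hit(fast, still, 1.0, 1))  # no subdivision: the hit is missed
print(first_hit(fast, still, 1.0, 4))  # four sub-steps: the hit is found
```

The returned sub-step index tells the resolution step how much of the frame (dt / subdivisions per sub-step) had elapsed when the overlap was found.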
I have a binary image/contour containing four human beings, and I want to detect and count all of them. Since there are occlusions, I think it is best to find the heads, i.e. the maxima in the contour, of all the humans. The humans can then be counted.
I am able to get the global maximum, i.e. the topmost point, but I want to get all the local maxima.
The code for finding the topmost point is the one suggested by Adrian in his blog post, i.e.:
topmost = tuple(biggest_contour[biggest_contour[:,:,1].argmin()][0])
Can anyone please suggest how to get all the local maxima, instead of just the topmost location?
Here is a sample of my image:
The definition of "local maximum" can be tricky to pin down, but if you start with a simple method you'll develop an intuition to look further. Even if there are methods available on the web to do this work for you, it's worth implementing a few basic techniques yourself before you go googling.
One simple method I've used in the past goes something like this:
Find the contours as arrays/lists/containers of (x,y) coordinates.
At each element N (a pixel) in the list, get the pixels at N - D and N + D; that is, the pixels D behind and D ahead of the current pixel
Calculate the point-to-point distance
Calculate the distance along the contour from N-D to N+D
Calculate (distanceAlongContour)/(point-to-point distance)
...
There are numerous other ways to do this, but this is quick to implement from scratch, and I think a reasonable starting point: Compare the "geodesic" distance and the Euclidean distance.
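The listed steps (up to the ratio) can be sketched in Python as below. The function name is hypothetical, the contour is assumed to be a closed, ordered list of (x, y) points, and what to do with the ratio afterwards is left open, as in the steps above.

```python
import math

def path_ratio(contour, n, d):
    """Ratio of along-contour distance to straight-line distance between
    contour[n - d] and contour[n + d] on a closed contour."""
    size = len(contour)
    idx = [(n + k) % size for k in range(-d, d + 1)]
    along = sum(math.dist(contour[a], contour[b]) for a, b in zip(idx, idx[1:]))
    chord = math.dist(contour[idx[0]], contour[idx[-1]])
    return along / chord if chord > 0 else math.inf

# A 4x4 square traced point by point; index 2 is mid-edge, index 4 is a corner.
square = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)] +
          [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
print(path_ratio(square, 2, 2), path_ratio(square, 4, 2))
```

A ratio close to 1 means the contour is locally straight around N; the ratio grows as the contour bends more sharply there.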
A few other possibilities:
Do a bunch of curve fits to chunks of pixels from the contour. (Lots of details to investigate here.)
Use Ramer-Douglas-Peucker to render the outlines as polygons, then choose parameters to ensure those polygons are appropriately simplified. (Second time I've mentioned R-D-P today; it's handy.) Check for vertices with angles that deviate much from 180 degrees.
Try a corner detector. Crude, but easy to implement.
Implement an edge follower that moves from one pixel to the next in the contour list, and calculate some kind of "inertia" as the pixel shifts direction. This wouldn't be useful on a pixel-by-pixel basis, but you could compare, say, pixels N-1,N,N+1 to pixels N+1,N+2,N+3. Or just calculate the angle between them.
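For the polygon-based option, the vertex-angle check might look like the following Python sketch. It assumes the polygon has already been simplified (for example by a Ramer-Douglas-Peucker implementation) and is given as a closed list of (x, y) vertices; the function name is made up for illustration.

```python
import math

def vertex_angles(polygon):
    """Angle in degrees (0..180) between the two edges meeting at each vertex."""
    angles = []
    n = len(polygon)
    for i in range(n):
        px, py = polygon[i - 1]
        cx, cy = polygon[i]
        nx, ny = polygon[(i + 1) % n]
        a1 = math.atan2(py - cy, px - cx)
        a2 = math.atan2(ny - cy, nx - cx)
        ang = abs(math.degrees(a1 - a2)) % 360
        angles.append(min(ang, 360 - ang))
    return angles

# A square with one redundant vertex in the middle of its bottom edge.
poly = [(0, 0), (2, 0), (4, 0), (4, 4), (0, 4)]
print([round(a) for a in vertex_angles(poly)])
```

Vertices whose angle stays near 180 degrees lie on a straight run; the others are candidate corners or peaks.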
I'm currently trying to implement face tracking using optical flow with OpenCV.
To achieve this, I detect faces with the OpenCV face detector, determine features to track in the detected areas by calling goodFeaturesToTrack, and perform the tracking by calling calcOpticalFlowPyrLK.
It gives good results.
However, I'd like to know when the face I'm currently tracking is not visible anymore (the person leaves the room, is hidden behind an object or another person, ...) but calcOpticalFlowPyrLK tells me nothing about it.
The status parameter of the calcOpticalFlowPyrLK function rarely reports errors for a tracked feature (so, if the person disappears, I will still have a good number of valid features to track).
I've tried calculating the direction vector of each feature to determine its movement between the previous and the current frame (for example, determining that some point of the face has moved to the left between the two frames) and calculating the variance of these vectors (if the vectors are mostly different, the variance is high; otherwise it is not), but it did not give the expected results (good in some situations, but bad in others).
What could be a good condition to determine whether the optical flow tracking has to be stopped or not?
I've thought of some possible solutions like these ones:
variance of the distance for the vectors of each tracked feature (if the move is linear, distances should be nearly the same, but if something happened, distances will be different).
Comparing the shape and size of the area containing the original position of the tracked features with the area containing the current one. At the beginning we have a square containing the features of the face. But if the person leaves the room, it can lead to a deformation of the shape.
You can try a bidirectional confidence measure for your tracked points.
To do so, estimate the feature positions from img0 to img1, and then track the resulting positions backwards from img1 to img0. If the double-tracked features land near their original positions (the distance should be less than 1 or 0.5 pixels), then they have been tracked successfully. This is a little more reliable than the SSD used by the status flag of OpenCV's PLK. If a certain number of features could not be tracked, raise the event.
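The check itself is simple once both tracking passes have been run. Below is a minimal Python sketch of just the comparison step; the function names and the `min_fraction` threshold are assumptions for illustration, and the two point lists are expected to come from your own forward and backward tracking calls.

```python
import math

def forward_backward_ok(points0, points0_back, max_error=1.0):
    """Flag each feature whose back-tracked position lands within max_error
    pixels of where it started."""
    return [math.dist(p, q) <= max_error for p, q in zip(points0, points0_back)]

def track_lost(points0, points0_back, max_error=1.0, min_fraction=0.5):
    """Declare the target lost when too few features pass the check."""
    ok = forward_backward_ok(points0, points0_back, max_error)
    if not ok:
        return True  # nothing left to track
    return sum(ok) < min_fraction * len(ok)

start = [(10, 10), (20, 20), (30, 30)]
back = [(10.2, 10.1), (25, 20), (30, 30.4)]
print(forward_backward_ok(start, back), track_lost(start, back))
```

With OpenCV's Python bindings, `points0_back` would typically be obtained by calling `cv2.calcOpticalFlowPyrLK(img0, img1, p0, None)` and then `cv2.calcOpticalFlowPyrLK(img1, img0, p1, None)` on the result of the first call.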
I am currently trying to track human heads from a CCTV. I am currently using colour histogram and LBP histogram comparison to check the affinity between bounding boxes. However sometimes these are not enough.
I was reading through a paper in the following link : paper where dispersion metric is described. However I still cannot clearly get it. For example I cannot understand what pi,j is referring to in the equation. Can someone kindly & clearly explain how I can find dispersion between bounding boxes in separate frames please?
Your assistance is much appreciated :)
This paper tackles the tracking problem using a background model, as most CCTV tracking methods do. The BG model produces a foreground mask, and the aforementioned p_ij relates to this mask after some morphology. Specifically, they try to separate foreground blobs into components, based on thresholds on allowed 'gaps' in FG mask holes. The end result of this procedure is a set of binary masks, one for each hypothesized object. These masks are then used for tracking using spatial and temporal consistency. In my opinion, this is an old fashioned way of processing video sequences, only relevant if you're limited in processing power and the scenes are not crowded.
To answer your question: if O is the mask related to one of the hypothesized objects, then p_ij is the binary pixel at location (i,j) within the mask. Thus, c_x and c_y are the center of mass of the binary shape, and the dispersion is simply the average distance from the center of mass for the shape (it is larger for larger objects). This enforces scale consistency in tracking, but only very weakly. You can do much better if you have a calibrated camera.
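Following that description, the computation could be sketched in Python like this. The mask is assumed to be a list of rows of 0/1 values, the function name is hypothetical, and the paper's exact normalization may differ from the plain average used here.

```python
import math

def dispersion(mask):
    """Centre of mass and mean distance to it for a binary mask (list of rows)."""
    pixels = [(j, i) for i, row in enumerate(mask) for j, p in enumerate(row) if p]
    if not pixels:
        raise ValueError("mask has no foreground pixels")
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    disp = sum(math.hypot(x - cx, y - cy) for x, y in pixels) / len(pixels)
    return (cx, cy), disp

small = [[1, 0, 1],
         [0, 0, 0],
         [1, 0, 1]]
print(dispersion(small))
```

Comparing the dispersion of a candidate mask in the current frame with that of the tracked object in the previous frame then gives the weak scale-consistency cue mentioned above.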
PROBLEM
I have a picture that is taken from a swinging vehicle. For simplicity I have converted it into a black and white image. An example is shown below:
The image shows the high-intensity returns and contains a pattern, circled in red, that is found in all of the valid images. This image can be taken from multiple angles depending on the rotation of the vehicle. Another example is here:
The intention here is to attempt to identify the picture cells in which this pattern exists.
CURRENT APPROACHES
I have tried a couple of methods so far. I am using MATLAB to test, but will eventually be implementing in C++. It is desirable for the algorithm to be time efficient; however, I am interested in any suggestions.
SURF (Speeded Up Robust Features) Feature Recognition
I tried the default MATLAB implementation of SURF to attempt to find features. MATLAB SURF is able to identify features in 2 examples (not the same as above); however, it is not able to identify common ones:
I know that the points are different but the pattern is still somewhat identifiable. I have tried on multiple sets of pictures and there are almost never common points. From reading about SURF it seems like it is not robust to skewed images anyway.
Perhaps some recommendations on pre-processing here?
Template Matching
So template matching was tried but is definitely not ideal for the application because it is not robust to scale or skew change. I am open to pre-processing ideas to fix the skew. This could be quite easy, some discussion on extra information on the picture is provided further down.
For now, let's investigate template matching. Say we have the following two images as the template and the current image:
The template is chosen from one of the most forward facing images. And using it on a very similar image we can match the position:
But then (and somewhat obviously) if we change the picture to a different angle it won't work. Of course we expect this because the template no-longer looks like the pattern in the image:
So we obviously need some pre-processing work here as well.
Hough Lines and RANSAC
Hough lines and RANSAC might be able to identify the lines for us but then how do we get the pattern position?
Others that I don't know about yet
I am pretty new to the image processing scene, so I would love to hear about any other techniques that would suit this simple yet difficult image rec problem.
The sensor and how it will help pre-processing
The sensor is a 3d laser, it has been turned into an image for this experiment but still retains its distance information. If we plot with distance scaled from 0 - 255 we get the following image:
Where lighter is further away. This could definitely help us to align the image; any thoughts on the best way? So far I have thought of things like calculating the normal of the cells that are not 0. We could also do some sort of gradient descent or least squares fitting such that the difference in the distance is 0, which could align the image so that it is always straight. The problem with that is that the solid white stripe is further away. Maybe we could segment that out? We are then sort of building algorithms on top of our algorithms, so we need to be careful that this doesn't become a monster.
Any help or ideas would be great, I am happy to look into any serious answer!
I came up with the following program to segment the regions and hopefully locate the pattern of interest using template matching. I've added some comments and figure titles to explain the flow and some resulting images. Hope it helps.
im = imread('sample.png');
gr = rgb2gray(im);
bw = im2bw(gr, graythresh(gr));
bwsm = imresize(bw, .5);
dism = bwdist(bwsm);
dismnorm = dism/max(dism(:));
figure, imshow(dismnorm, []), title('distance transformed')
eq = histeq(dismnorm);
eqcl = imclose(eq, ones(5));
figure, imshow(eqcl, []), title('histogram equalized and closed')
eqclbw = eqcl < .2; % .2 worked for samples given
eqclbwcl = imclose(eqclbw, ones(5));
figure, imshow(eqclbwcl, []), title('binarized and closed')
filled = imfill(eqclbwcl, 'holes');
figure, imshow(filled, []), title('holes filled')
% -------------------------------------------------
% template
tmpl = zeros(16);
tmpl(3:4, 2:6) = 1;tmpl(11:15, 13:14) = 1;
tmpl(3:10, 7:14) = 1;
st = regionprops(tmpl, 'orientation');
tmplAngle = st.Orientation;
% -------------------------------------------------
lbl = bwlabel(filled);
stats = regionprops(lbl, 'BoundingBox', 'Area', 'Orientation');
figure, imshow(label2rgb(lbl), []), title('labeled')
% here I just take the largest contour for convenience. should consider aspect ratio and any
% other features that can be used to uniquely identify the shape
[mx, id] = max([stats.Area]);
mxbb = stats(id).BoundingBox;
% resize and rotate the template
tmplre = imresize(tmpl, [mxbb(4) mxbb(3)]);
tmplrerot = imrotate(tmplre, stats(id).Orientation-tmplAngle);
xcr = xcorr2(double(filled), double(tmplrerot));
figure, imshow(xcr, []), title('template matching')
Resized image:
Segmented:
Template matching:
Given the poor image quality (low resolution + binarization), I would prefer template matching because it is based on a simple global measure of similarity and does not attempt to do any feature extraction (there are no reliable features in your samples).
But you will need to apply template matching with rotation. One way is to precompute rotated instances of the template, perform matchings for every angle and keep the best.
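A minimal, dependency-free Python sketch of that idea follows. It assumes square binary templates stored as lists of rows, uses nearest-neighbour rotation and a simple count of agreeing pixels as the similarity measure, and all function names are made up for illustration. A real implementation would use an image library and a finer set of angles.

```python
import math

def rotate(template, degrees):
    """Nearest-neighbour rotation of a square binary template about its centre."""
    n = len(template)
    c = (n - 1) / 2
    cos_a = math.cos(math.radians(degrees))
    sin_a = math.sin(math.radians(degrees))
    out = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Inverse mapping: find the source pixel for each output pixel.
            sx = round(c + (x - c) * cos_a + (y - c) * sin_a)
            sy = round(c - (x - c) * sin_a + (y - c) * cos_a)
            if 0 <= sx < n and 0 <= sy < n:
                out[y][x] = template[sy][sx]
    return out

def score(image, template, top, left):
    """Number of agreeing pixels with the template placed at (top, left)."""
    return sum(image[top + y][left + x] == v
               for y, row in enumerate(template) for x, v in enumerate(row))

def best_match(image, template, angles):
    """Try every angle and position; return (score, angle, top, left) of the best."""
    n = len(template)
    best = None
    for angle in angles:
        rotated = rotate(template, angle)
        for top in range(len(image) - n + 1):
            for left in range(len(image[0]) - n + 1):
                cand = (score(image, rotated, top, left), angle, top, left)
                if best is None or cand[0] > best[0]:
                    best = cand
    return best

template = [[1, 1, 0],
            [0, 0, 0],
            [0, 0, 0]]
image = [[0] * 5 for _ in range(5)]
image[1][4] = image[2][4] = 1  # the same bar, turned by a quarter turn
print(best_match(image, template, (0, 90, 180, 270)))
```

The exhaustive search over angles and positions is expensive, so in practice the angle step and the search region would be restricted as far as the application allows.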
It is possible to integrate depth information in the comparison (if that helps).
This is quite similar to the problem of recognising hand-sketched characters that we tackle in our lab, in the sense that the target pattern is binary, low resolution, and liable to moderate deformation.
Based on our experience, I don't think SURF is the right way to go: as pointed out elsewhere, it assumes a continuous 2D image, not a binary one, and will break in your case. Template matching is not good for this kind of binary image either: your pixels need only be slightly misaligned to return a low match score, as there is no local spatial coherence in the pixel values to mitigate minor misalignments of the window.
Our approach in this scenario is to try to "convert" the binary image into a continuous or "greyscale" image. For example, see below:
These conversions are made by running a 1st derivative edge detector, e.g. convolving the 3x3 template [0 0 0 ; 1 0 -1 ; 0 0 0] and its transpose over image I to get dI/dx and dI/dy.
At any pixel we can get the edge orientation atan2(dI/dy,dI/dx) from these two fields. We treat this information as known at the sketched pixels (the white pixels in your problem) and unknown at the black pixels. We then use a Laplacian smoothness assumption to extrapolate values for the black pixels from the white ones. Details are in this paper:
http://personal.ee.surrey.ac.uk/Personal/J.Collomosse/pubs/Hu-CVIU-2013.pdf
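The derivative and orientation step described above can be sketched in plain Python as follows. This covers only the first stage (central differences and atan2), not the Laplacian extrapolation from the paper; the function name is hypothetical, and pixels with zero gradient are simply left as None.

```python
import math

def edge_orientation(image):
    """atan2(dI/dy, dI/dx) at interior pixels using central differences.
    Border pixels and pixels with zero gradient are left as None."""
    h, w = len(image), len(image[0])
    theta = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Same result as convolving with [1 0 -1] and its transpose.
            dx = image[y][x + 1] - image[y][x - 1]
            dy = image[y + 1][x] - image[y - 1][x]
            if dx or dy:
                theta[y][x] = math.atan2(dy, dx)
    return theta

vertical_step = [[0, 0, 1, 1] for _ in range(4)]
print(edge_orientation(vertical_step)[1][1])
```

The None entries mark the pixels where the orientation is unknown and would have to be filled in by the extrapolation step.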
If this is a major hassle you could try using a distance transform instead, convenient in Matlab using bwdist, but it won't give as accurate results.
Now we have the "continuous" image (as per right hand column of images above). The greyscale patterns encode the local structure in the image, and are much more amenable to gradient based descriptors like SURF and template matching.
My hunch would be to try template matching first, but since this is affine sensitive I would go the whole way and use a HOG/Bag of Visual Words approach, just as in our paper above, to match those patterns.
We have found this pipeline to give state-of-the-art results in sketch-based shape recognition, and my PhD student has successfully used it in subsequent work for matching hieroglyphs, so I think it could have a good shot at working on the kind of pattern you pose in your example images.
I do not think SURF is the right approach to use here. SURF is designed to work on regular 2D intensity images, but what you have here is a 3D point cloud. There is an algorithm for point cloud registration called Iterative Closest Point (ICP). There are several implementations on MATLAB File Exchange, such as this one.
Edit
The Computer Vision System Toolbox now (as of the R2015b release) includes point cloud processing functionality. See this example for point cloud registration and stitching.
I would:
segment image
by Z coordinates (distance from camera/LASER). Where the Z coordinate jumps by more than a threshold, there is a border between the object and the background (if the neighboring Z value is big or out of range), another object (if the neighboring Z value is different), or the object itself (if the neighboring Z value is different but can be connected to itself). This will give you a set of objects.
align to viewer
compute the boundary points of each object (outermost edges), compute the direction via atan2, and rotate the object back so that it faces the camera perpendicularly.
Your image looks like a flag marker, so in that case rotation around the Y axis should suffice. You can also scale the object to a predefined distance (if the target is always the same size).
You will need to know the FOV of your camera system and have a calibrated Z axis for this.
now try to identify object
here use what you have so far; you can also add filters, such as skipping objects whose size or aspect ratio does not match. You can use DFT/DCT, compare histograms of the normalized/equalized image, etc.
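The first step (segmenting by jumps in Z) could be sketched in Python as a flood fill that only joins neighbouring cells with similar depth. The depth map is assumed to be a list of rows with None for cells without a return; the function name and the 4-connectivity choice are assumptions for illustration.

```python
from collections import deque

def segment_by_depth(depth, max_jump):
    """Label 4-connected regions whose neighbouring depths differ by at most
    max_jump. Cells with depth None (no return) stay unlabelled (0)."""
    h, w = len(depth), len(depth[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if depth[sy][sx] is None or labels[sy][sx]:
                continue
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and depth[ny][nx] is not None
                            and abs(depth[ny][nx] - depth[y][x]) <= max_jump):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

depth = [[1.0, 1.1, 5.0],
         [1.0, 1.2, 5.1],
         [None, 1.1, 5.0]]
print(segment_by_depth(depth, 0.5))
```

Each label then corresponds to one candidate object whose boundary points, size and aspect ratio can be used in the later steps.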
[PS]
For features, it is not a good idea to use a 1-bit black-and-white image because you lose too much information. Use grayscale or color instead (grayscale is usually enough). I usually add a few simplified histograms of small areas (with a few different radii) around the point of interest, which are invariant to rotation.
Have a look at log-polar template matching; it is rotation and scale invariant:
http://etd.lsu.edu/docs/available/etd-07072005-113808/unrestricted/Thunuguntla_thesis.pdf