KLT-Tracker: How to avoid losing detected person on recalibration - computer-vision

Reading from the ground truth, I have an initial bounding box. I then calculate a foreground mask and use cv2.goodFeaturesToTrack to get points I can follow using cv2.calcOpticalFlowPyrLK. I calculate the bounding box as the axis-aligned rectangle through the outermost points (roughly as described in "How to efficiently find the bounding box of a collection of points?").
However, every now and then I need to recompute goodFeaturesToTrack, because otherwise the person slowly "loses" all the points over time.
Whenever I recalculate, points may land on other people if they happen to stand within the bounding box of the person being tracked, and those points will then be followed too. After such a recalculation my bounding box is no longer of any use. What are some methods to avoid this behavior?
I am looking for resources and general explanations and not specific implementations.
Ideas I had
Take the ratio of the previous bounding box size to the current bounding box size into account, and ignore the update if the size changes too much.
Take the ratio of foreground (white) pixels in the previous foreground mask to that in the current foreground mask into account. Do not recalculate the bounding box if the foreground mask is too full; other people are probably crossing the box.
Calculate a general movement vector for the bounding box from the median of all points computed via optical flow, and only allow bounding box changes that are consistent with that vector, to avoid rapid jumps of the bounding box.
Filter the found good features to track points using some additional metric.
In general, I guess I am looking for a method that bases the new goodFeaturesToTrack more strongly on the previous goodFeaturesToTrack, or on the points derived from them via optical flow (see the sketch below).
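To make that last idea concrete, here is a minimal sketch (not a full solution; the function name redetect_features and the radius parameter are made up for illustration): restrict goodFeaturesToTrack to a mask built from the foreground mask intersected with small neighbourhoods around the points already being tracked, so new features cannot land on a bystander who merely overlaps the bounding box.

```python
import cv2
import numpy as np

# Hypothetical helper: re-detect features only near the points that are
# already being tracked, and only inside the foreground mask.
def redetect_features(gray, fg_mask, tracked_pts, radius=15, max_corners=100):
    support = np.zeros_like(fg_mask)
    for x, y in tracked_pts.reshape(-1, 2):
        cv2.circle(support, (int(x), int(y)), radius, 255, -1)  # keep regions near old points
    mask = cv2.bitwise_and(fg_mask, support)                    # ...and inside the foreground
    return cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                   qualityLevel=0.01, minDistance=5, mask=mask)
```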

Related

Conceptual Question Regarding the Yolo Object Detection Algorithm

My understanding is that the motivation for Anchor Boxes (in the Yolo v2 algorithm) is that in the first version of Yolo (Yolo v1) it is not possible to detect multiple objects in the same grid box. I don't understand why this is the case.
Also, the original paper by the authors (Yolo v1) has the following quote:
"Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and also how accurate it thinks the box is that it predicts."
Doesn't this indicate that a grid cell can recognize more than one object? In their paper, they take B as 2. Why not take B as some arbitrarily higher number, say 10?
Second question: how are the Anchor Box dimensions tied to the Bounding Box dimensions, for detecting a particular object? Some websites say that the Anchor Box defines a shape only, and others say that it defines a shape and a size. In either case, how is the Anchor Box tied to the Bounding Box?
Thanks,
Sandeep
You're right that YOLOv1 predicts multiple (B) bounding boxes per grid cell, but these are not assigned to ground truths in an effective or systematic way, and therefore the inferred bounding boxes are not accurate enough.
As you can read in blog posts across the internet, an Anchor/Default Box is a box in the original image that corresponds to a specific cell in a specific feature map and is assigned a specific aspect ratio and scale.
The scale is usually dictated by the feature map (deeper feature map -> larger anchor scale), and the aspect ratios vary, e.g. {1:1, 1:2, 2:1} or {1:1, 1:2, 2:1, 1:3, 3:1}.
The scale and aspect ratio together dictate a specific shape, and this shape, positioned according to the current cell's location in the feature map, is compared to the ground truth bounding boxes in the original image.
Different papers have different assignment schemes, but it usually goes like this: (1) if the IoU of the anchor (on the original image) with a ground truth is over some threshold (e.g. 0.5), this is a positive assignment for the anchor; (2) if it is under some threshold (e.g. 0.1), it is a negative assignment; and (3) if there is a gap between these two thresholds, the anchors that fall in between are ignored (in the loss computation).
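As a rough sketch of that assignment logic (not taken from any particular paper; the 0.5/0.1 thresholds are just the example values above):

```python
import numpy as np

# Boxes are (x1, y1, x2, y2) in image coordinates.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def assign(anchor, gts, pos_thr=0.5, neg_thr=0.1):
    best = max((iou(anchor, gt) for gt in gts), default=0.0)
    if best >= pos_thr:
        return "positive"     # anchor is responsible for a ground truth
    if best < neg_thr:
        return "negative"     # pure background
    return "ignore"           # in the gap: excluded from the loss
```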
This way, an anchor is in fact like a "detector head" responsible for the specific cases that are most similar to it shape-wise. It is therefore responsible for detecting objects whose shape is similar to its own, and it predicts both a confidence per class and bounding box parameters relative to itself, i.e. how much to modify the anchor's height, width, and center (along both axes) to obtain the correct bounding box.
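For concreteness, the YOLOv2-style decoding of the predicted offsets relative to an anchor looks roughly like this (a sketch; cx, cy are the cell's coordinates in grid units, pw, ph the anchor's width and height):

```python
import math

# Sketch of decoding predicted offsets (tx, ty, tw, th) against an anchor
# of size (pw, ph) located at grid cell (cx, cy).
def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode(tx, ty, tw, th, cx, cy, pw, ph):
    bx = cx + sigmoid(tx)     # predicted center stays inside the cell
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)    # scale the anchor's width
    bh = ph * math.exp(th)    # scale the anchor's height
    return bx, by, bw, bh
```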
Because of this assignment scheme, which distributes the responsibility effectively between the different anchors, the bounding box prediction is more accurate.
Another downside of YOLOv1's scheme is that it decouples the bounding boxes from the classification. On one hand this saves computation, but on the other hand the classification is done at the level of the grid cell, so the B bounding box options all share the same class prediction. This means, for example, that if there are multiple objects of different classes with the same center (e.g. a person holding a cat), the classification of at least all but one of them will be wrong. Note that it is theoretically possible that predictions from adjacent grid cells will compensate for this wrong classification, but this is not guaranteed, in particular since in YOLOv1's scheme the object's center is the assignment criterion.

Motion detection by eliminating constant movements

I am trying to implement motion detection in OpenCV C++. I tried various methods like MOG and optical flow, which work fine, but is there a way to eliminate constant movements in the scene, like a constantly spinning fan? I have OpenCV's accumulateWeighted() in mind but I'm not sure if it works. Is there a better way to do it?
I don't have a fully robust solution yet, and I don't have any experience with video processing, but here is the idea I have so far for this problem:
First, consider a few pairs of consecutive frames from the video and convert them to grayscale for a more robust comparison.
Raster-scan each image pair and compute the difference image by comparing corresponding pixels.
The resulting image gives the pixel locations where the two frames of a pair differ; cluster these pixel locations and put a bounding box around each cluster, so that each bounding box marks an object that is translating or rotating.
Since we apply this differencing over several pairs, each pair difference gives us bounding boxes around the rotating/translating objects.
Now compare the bounding boxes across the different difference images.
If a bounding box's center stays almost constant across all difference images, the object it contains has a repetitive, in-place motion (a fan, leaves), while the remaining bounding boxes represent the objects that actually translate through the video.
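A minimal sketch of the differencing-and-boxing step (in Python for brevity, even though the question is about C++; the OpenCV calls map directly, and all names and thresholds are placeholders):

```python
import cv2

# For each consecutive pair of grayscale frames, threshold the absolute
# difference, cluster the changed pixels with findContours, and keep one
# bounding box per cluster (OpenCV 4.x findContours signature).
def diff_boxes(prev_gray, curr_gray, diff_thr=25, min_area=50):
    diff = cv2.absdiff(prev_gray, curr_gray)
    _, mask = cv2.threshold(diff, diff_thr, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

# Boxes whose centers barely move across many pair differences (a fan, swaying
# leaves) can then be tagged as "constant motion" and suppressed, while boxes
# whose centers drift correspond to genuinely translating objects.
```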

How to compare two edge images in OpenCV (not matchShapes)

A little introduction on what I'm doing ...
For academic purposes I am creating an application in c++ using opencv for the detection of static objects in a scene.
The application is based on a combined approach of background subtraction and tracking, and the detection of events related to the abandonment of the objects works fine.
But at the moment I have a problem that I can't solve; I have to implement a finite state machine to detect the event of object removal, both before and after the object has been absorbed into the background.
To do this I was ordered by my superiors to use the edges of objects.
And now the problem.
After detecting a vehicle illegally parked along a road, I need to compare the edges of various images (the background captured at the time of the alarm, the current background, the current frame) to understand what the vehicle does (starts moving again, remains parked, or starts moving again after having been absorbed into the background).
I run these comparisons on the region of the scene containing the vehicle (vehicles typically have different sizes); I extract the edges with the Canny algorithm, obtaining a binarized CV_8UC1 cv::Mat.
At this point I have to compare them.
I tried to detect the contours with findContours and compare them with matchShapes, but it does not seem to be the right way: I would have to compare each contour of the first image with every contour of the second, and the two images to compare typically have a different number of contours anyway (for example the original background versus the current background, because the edges in the current background increase once the vehicle enters the background).
I also tried to create a new image in which each pixel is the absolute difference of the corresponding pixels of the other two images, then I counted the white pixels of the difference image (wPx) and used this number for the comparison as follows: I set two thresholds (thr1 and thr2) and used the perimeter of the bounding rect of the vehicle (perim); if wPx < thr1*perim the images are considered equal, and if wPx > thr2*perim the images are considered different.
(I set the thresholds as percentages and multiply them by the perimeter of the bounding box to adapt them to the vehicle's dimensions.)
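In code, this comparison is roughly the following (a sketch with placeholder threshold values, using the same thr1/thr2/perim names as above):

```python
import cv2

# edges_a and edges_b are binary (0/255) Canny outputs of the same region.
def compare_edges(edges_a, edges_b, rect_w, rect_h, thr1=0.05, thr2=0.15):
    diff = cv2.absdiff(edges_a, edges_b)
    w_px = cv2.countNonZero(diff)          # white pixels of the difference image (wPx)
    perim = 2 * (rect_w + rect_h)          # perimeter of the vehicle's bounding rect
    if w_px < thr1 * perim:
        return "equal"
    if w_px > thr2 * perim:
        return "different"
    return "uncertain"
```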
This solution, however, does not seem very robust.
Do you have something simple to suggest?
Thank you very much in any case, more than once you StackOverflow users have helped me!
PS: here is an example of the images that I have to compare:
The first is the background without the stationary vehicle; it contains the edges of the street;
the second is the original background, the one captured when the stationary vehicle is detected;
the third is the current background (which in this case is equal to the original since it is the same frame, but it will change later);
the fourth is the current frame of the video.
You may want to take a look at this paper: A Novel SIFT-Like-Based Approach
for FIR-VS Images Registration. Aguilera et al. propose an Edge Oriented Histogram descriptor (EOH-SIFT).
The paper deals with registering multispectral images (visible and infrared) to each other. Because of the different characteristics of the two modalities, the authors first extract edges/contours in both images, which results in images similar to yours.
So, you can describe your image patches using this descriptor (illustrated in a figure in the paper), which is computed as follows:
Subdivide your image patch into 4x4 zones
For each of the 16 subregions, compose a histogram of the contour orientations (5 bins)
Put the histograms together into one descriptor vector of size 16x5=80 bins
Normalize the feature vector
So, every image you want to compare (in your case 4) is described by its 80-dimensional feature vector. You can compare them to each other by calculating and evaluating the Euclidean distance between them.
Note: Here a patch of size 80x80 or 100x100 (NxN) pixels is suggested. You may have to adjust the sizes to your image sizes.
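A minimal sketch of such a descriptor in Python (the paper's exact binning and normalization may differ, so treat this only as an illustration of the 16 zones x 5 orientation bins = 80-dimensional idea):

```python
import cv2
import numpy as np

# edge_patch: single-channel edge image (e.g. Canny output) of one region.
def eoh_descriptor(edge_patch, zones=4, bins=5):
    gx = cv2.Sobel(edge_patch, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(edge_patch, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)   # ang in [0, 360)
    h, w = edge_patch.shape
    desc = []
    for i in range(zones):
        for j in range(zones):
            m = mag[i*h//zones:(i+1)*h//zones, j*w//zones:(j+1)*w//zones]
            a = ang[i*h//zones:(i+1)*h//zones, j*w//zones:(j+1)*w//zones]
            hist, _ = np.histogram(a % 180, bins=bins, range=(0, 180), weights=m)
            desc.extend(hist)                                  # 16 zones x 5 bins = 80 values
    desc = np.asarray(desc, dtype=np.float32)
    return desc / (np.linalg.norm(desc) + 1e-6)                # normalize the feature vector

# Two patches are then compared with the Euclidean distance:
# np.linalg.norm(eoh_descriptor(p1) - eoh_descriptor(p2))
```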

Detect ball/circle in OpenCV (C++)

I am trying to detect a ball in a filtered image.
In this image I've already removed the stuff that can't be part of the object.
Of course I tried the HoughCircle function, but I did not get the expected output.
Either it didn't find the ball or there were too many circles detected.
The problem is that the ball isn't completely round.
I had the idea that it could work if I identify individual objects, calculate their centers and check whether the radius is about the same in different directions.
But it would be nice if it also detected the ball when it isn't completely visible.
And with that method I can't detect semicircles or similar shapes.
EDIT: These images are from a video stream (real time).
What other method could I try?
Looks like you've used difference imaging or something similar to obtain the images you have..? Instead of looking for circles, look for a more generic loop. Suggestions:
Separate all connected components.
For every connected component -
Walk around the contour and collect all contour pixels in a list
Suggestion 1: Use least squares to fit an ellipse to the contour points
Suggestion 2: Study the curvature of every contour pixel and check if it fits a circle or ellipse. This check may be done by computing a histogram of edge orientations for the contour pixels, or by checking the gradients of orientations from contour pixel to contour pixel. In the second case, for a circle or ellipse, the gradients should be almost uniform (ask me if this isn't very clear).
Apply constraints on perimeter, area, lengths of major and minor axes, etc. of the ellipse or loop. Collect these properties as features.
You can either use hard-coded heuristics/thresholds to classify a set of features as ball/non-ball, or use a machine learning algorithm. I would first keep it simple and simply use thresholds obtained after studying some images.
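For concreteness, here is a rough sketch of suggestion 1 combined with the constraints step (Python/OpenCV; all thresholds are placeholders to tune on your images):

```python
import cv2

# mask: binary image of the filtered scene (OpenCV 4.x findContours signature).
def find_ball_candidates(mask, min_points=20, max_axis_ratio=1.3):
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    candidates = []
    for c in contours:
        if len(c) < min_points:              # fitEllipse needs at least 5 points
            continue
        (cx, cy), axes, angle = cv2.fitEllipse(c)   # least-squares ellipse fit
        a, b = sorted(axes)                  # minor, major axis lengths
        if a > 0 and b / a <= max_axis_ratio:
            candidates.append(((cx, cy), (a, b), angle))
    return candidates
```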
Hope this helps.

Using OpenCV, how to detect a box in an image while eliminating objects printed inside the box?

I am trying to develop a box sorting application in Qt using OpenCV. I want to measure the width and length of a box.
As shown in the image above, I want to detect only the outermost lines (i.e. the box edges), which will give me the width and length of the box, regardless of whatever is printed inside it.
What I tried:
First I tried using findContours() and selected the contour with the maximum area, but the contour of the outer edge is often not closed (broken somewhere in the Canny output) and hence does not get detected as a contour.
The Hough line transform gives me too many lines; I don't know how to get only the four lines I am interested in out of those.
The algorithm I tried:
Convert the image to grayscale.
Take one column of the image and compare every pixel with the next pixel in that column; if the difference in their values is greater than some threshold (say 100), that pixel belongs to an edge, so store it in an array. Do this for all columns, and it gives the upper line of the box, parallel to the x axis.
Follow the same procedure, but from the last column and last row (i.e. from bottom to top); this gives the lower line parallel to the x axis.
Likewise find the lines parallel to the y axis. Now I have four arrays of points, one for each side.
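In code, the column scan looks roughly like this (a sketch of the top-edge pass only, with a placeholder threshold):

```python
import numpy as np

# gray: single-channel image. Returns one (x, y) point per column where the
# first strong intensity jump occurs, i.e. the "upper line" of the box.
def top_edge_points(gray, thr=100):
    points = []
    g = gray.astype(np.int32)                 # avoid uint8 wrap-around on subtraction
    for x in range(g.shape[1]):
        jumps = np.abs(np.diff(g[:, x])) > thr
        ys = np.nonzero(jumps)[0]
        if ys.size:
            points.append((x, int(ys[0])))    # first strong transition in this column
    return points
```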
Now this gives me good results if the box is placed so that its sides are exactly parallel to the X and Y axes. If the box is placed even slightly rotated, it gives me diagonal lines, which is to be expected, as shown in the image below.
As shown in the image below, I removed the first 10 and last 10 points from all four arrays of points (the ones responsible for drawing the diagonal lines) and drew the lines, but this is not going to work when the box is tilted more, and the measurements will also go wrong.
Now my question is,
Is there any simpler way in OpenCV to get only the outer edges (rectangle) of the box and their dimensions, ignoring anything printed on the box, whatever its orientation?
I am not necessarily asking you to correct/improve my algorithm, but any suggestions on that are also welcome. Sorry for such a long post.
I would suggest the following steps:
1: Make a mask image by using cv::inRange() to select the background color, then use cv::bitwise_not() to invert this mask. This will give you only the box.
2: If you're not concerned about shadows or depth effects making your measurement inaccurate, you can proceed right away with cv::findContours(). Select the biggest contour and store its cv::RotatedRect (obtained via cv::minAreaRect()).
3: This cv::RotatedRect has a size member that gives you the width and the height of your box in pixels.
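A minimal Python sketch of these three steps (the inRange bounds are placeholders for your background color; the C++ calls are the same):

```python
import cv2

def box_size_px(bgr):
    # 1: mask the background color, then invert so only the box remains
    background = cv2.inRange(bgr, (200, 200, 200), (255, 255, 255))
    box_mask = cv2.bitwise_not(background)
    # 2: biggest contour of the box mask (OpenCV 4.x signature)
    contours, _ = cv2.findContours(box_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    biggest = max(contours, key=cv2.contourArea)
    # 3: rotated rect gives width and height in pixels
    (cx, cy), (w, h), angle = cv2.minAreaRect(biggest)
    return w, h
```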
Since the box is placed on a contrasting background, you should be able to use Otsu thresholding:
threshold the image (use Otsu method)
filter out any stray pixels that are outside the box region (let's hope you don't get many such pixels and can easily remove them with a median or a morphological filter)
find contours
combine all contour points and get their convex hull (idea here is to find the convex region that bounds all these contours in the box region regardless of their connectivity)
apply a polygon approximation (approxPolyDP) to this convex hull and check if you get a quadrangle
if there are no perspective distortions, you should get a rectangle, otherwise you will have to correct it
if you get a rectangle, you have its dimensions. You can also find the minimum area rectangle (minAreaRect) of the convex hull, which should directly give you a RotatedRect
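A minimal sketch of this pipeline in Python (the median kernel size and the approxPolyDP epsilon are placeholders; depending on whether the box is darker or lighter than the background you may need THRESH_BINARY_INV instead):

```python
import cv2
import numpy as np

def outer_box(gray):
    # Otsu threshold, then remove stray pixels with a median filter
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    bw = cv2.medianBlur(bw, 5)
    # combine all contour points and take their convex hull (OpenCV 4.x signature)
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    points = np.vstack(contours)
    hull = cv2.convexHull(points)
    # polygon approximation: check whether you get a quadrangle
    approx = cv2.approxPolyDP(hull, 0.02 * cv2.arcLength(hull, True), True)
    # minimum area rectangle of the hull: ((cx, cy), (w, h), angle)
    rect = cv2.minAreaRect(hull)
    return approx, rect
```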