What is the fastest bounding box prediction algorithm? - computer-vision

What is the fastest bounding box prediction algorithm without the classification?
For example, I have a flat surface with objects on top of it. I don't to need to know the type of the objects, I need only their bounding boxes. Something like pixel wise segmentation for two types of objects: ground and item.

I think that what you are looking for are models for "salient object detection" ("dealing with locating and segmenting the most salient object or region in a scene").
The output of such a model is a map of the same size as the input image, where each pixel's value is the probability that it is part of a salient object. Using some threshold value you can locate the boundaries of the object.
You can probably find information regarding the processing requirements of different models in an article named "Salient Object Detection: A Survey" (It was just recently updated).

Related

Fittest polygon bounding objects in an image

Is there any method to create a polygon(not a rectangle) around an object in an image for object recognition.
Please refer the following images:
the result I am looking for
and
the original image
.
I am not looking for bounding rectangles like this.I know the concepts of transfer learning, using pre-trained models for object recognition and other object detection concepts.
The main aim is the object detection but not giving results using bounding box but a fitter polygon instead.Link to some resources or papers will be helpful.
Here is a very simple (and a bit hacky) idea, but it might help: take a per-pixel scene labeling algorithm, e.g. SegNet, and then turn the resulting segmented image into a binary image, where the white pixels are the class of interest (in your example, white for cars and black for the rest). Now compute edges. You can add those edges to the original image to obtain a result similar to what you want.
What you want is called image segmentation, which is different to object detection. The best performing methods for common object classes (e.g. cars, bikes, people, dogs,...) do this using trained CNNs, and are usually called semantic segmentation networks awesome links. This will, in theory, give you regions in your image corresponding to the object you want. After that you can fit an enclosing polygon using what is called the convex hull.

Motion detection by eliminating constant movements

I am trying to implement a motion detection in OpenCV C++. I tried various methods like MOG, Optical flow which work fine but is there a way we can eliminate constant movements in the scene like a constant fan motion etc ? I have opencv accumuateWeighted() in mind but not sure if it works. Is there any better way we can do it ?
I have not got full robust solution and also i don't have any experience with video processing but i would put my idea whatever till now i have got in to this problem:
First consider a few pairs of consecutive image frames from the video and convert them to gray scale for more robust comparison.
Raster scan the image pairs and find the difference of image pairs by comparing corresponding pairs.
The resultant image will give the pixel location where there is a change in image to image in a pair, cluster these pixels locations and make a bounding box over them. So that this bounding box region will mark an object which is translating/rotation.
Now as we have applied the above image difference operation over several pairs. We will have rotating/translating bounding box in each image pair difference.
Now check in each resultant image difference with pixels having bounding box over them.
Compare bounding box central location in a difference image with other difference images. If bounding box with a very slight variation in its central location exists across all difference images then object contained in that bounding box will be having rotational motion like Fan,leaves and remaining bounding boxes will represent the actual translating objects in the video.

2D object detection with only a single training image

The vision system is given a single training image (e.g. a piece of 2D artwork ) and it is asked whether the piece of artwork is present in the newly captured photos. The newly captured photos can contain a lot of any other object and when the artwork is presented, it must face up but may be occluded.
The pose space is x,y,rotation and scale. The artwork may be highly symmetric or not.
What is the latest state of the art handling this kind of problem?
I have tried/considered the following options but there are some problems in all of them. If my argument is invalid please correct me.
deep learning(rcnn/yolo): a lot of labeled data are needed means a lot of human labor is needed for each new pieces of artwork.
traditional machine learning(SVM,Random forest): same as above
sift/surf/orb + ransac or voting: when the artwork is symmetric, the features matched are mostly incorrect. A lot of time is needed in the ransac/voting stage.
generalized hough transform: the state space is too large for the voting table. Pyramid can be applied but it is difficult to choose some universal thresholds for different kinds of artwork to proceed down the pyramid.
chamfer matching: the state space is too large. Too much time is needed in searching across the state space.
Object detection requires a lot of labeled data of the same class to generalize well, and in your setting it would be impossible to train a network with only single instance.
I assume that in your case online object trackers can work, at least give it a try. There are some convolutional object trackers that work great like Siamese CNNs. The code is open source at github, and you can watch this video to see its performance.
Online object tracking: Given the initialized state (e.g., position
and size) of a target object in a frame of a video, the goal
of tracking is to estimate the states of the target in the subsequent
frames.-source-
You can try using traditional feature based image processing algorithm which might give true positive template matches up to a descent accuracy.
Given template image as in the question:
First dilate the image to join all very closely spaced connected
components.
Find the convex hull of the connected object obtained above,This will give you a polygon.
Use above polygon edge length information like (max-length/min-length) ratio as feature of the template.
Also find the pixel density in the polygon as second feature.
We have 2 features now.
Scene image feature vector:
Similarly Again in the scene image use dilation followed by connected components identification, define convex hull(polygon) around each connected objects and define feature vector for each object(edge info, pixel density).
Now as usual search for template feature vector in the scene image feature vectors data with minimum feature distance(also use certain upper level distance threshold value to avoid false positive matches).
This should give the true positive matches if available in the scene image.
Exception: This method would not work for occluded objects.

OpenCV - PCA analysis on BW image - area vs shape - is there already an implementation?

The shape of an object is detected on a bw image. The object is a black continuous shape, the background is white.
We use PCA (http://docs.opencv.org/3.1.0/d1/dee/tutorial_introduction_to_pca.html) to get the object direction and align the object. Currently the shape itself (the points on the contour) is the input to the opencv PCA implementation. This usually works very well. But from time to time there is small dirt on the object border, causing the shape to pass around the dirt. This causes more points and more weight on one side, slightly turning the object.
Idea: Instead of the contour, we use the area of the object as input for our PCA analysis. The issue there, to check all points on if they are inside the contour and then use them for PCA slows the application down. This part will be about 52352 times slower.
New Approach: We take random points in the image, check if they are inside the shape and if so, use them for our PCA. We have to see if we can get the consistent quality needed from this approach.
Is there already a similar implementation in opencv which is using the area instead of the shape?
Another approach would be to put a mesh over the object and use the mesh points inside the object for PCA.
Is there already something similar available one can just use or does one quickly need to implement something like this?
Going for straight lines around the object isn't an option.
Given that we have received very limited information about your problem (posting images would help a lot) and you do not seem to know the probability density function of the noise, your best bet is to consider the noise to be Gaussian.
As such, and following your intuition, my suggested approach is to take a few (by a few I mean statistically relevant but not raising the computation time that much) random points that lie inside the object and compute the PCA.
Repeat this procedure in an iterative loop and store somewhere the resulting rotation angles you get from the application of the PCA to the object shape.
Stop once you have enough point, compute the mean of the rotation angles: this is a decent estimate of the true angle. Compute also the standard deviation to get a measure of the quality of your estimation. By "enough points" you can consider that ~30 points is usually considered to be "enough" for being representative of the underlying population according to the central limit theorem.
If you want, you can improve on this approach in many ways, for example doing robust estimation of the true angle once you have collected enough points. It all depends on the data you have at hand...take my suggestion just as a starting point.
There are few parameters that you could change, in which may improve your system.
First is the threshold you use to binarize your image. I don't know what your application is about, but you could use other color systems, or normalize your image by cromacity, and after that, apply the new threshold.
Other aspect is to exclude shapes (contours) that have bigger or smaller area that what you are expecting.
To add up, you may use a blur filter before detect contours.
I don't know how the noise looks but when you say "small dirt" I think it might be only some few pixels that is a lot smaller then the object it self, but it might be attached to the object. To reduce this noise it might be possible to perform an opening (morphology) on the binary image.
http://docs.opencv.org/2.4/doc/tutorials/imgproc/opening_closing_hats/opening_closing_hats.html

How compare two edges images in opencv (not matchShapes)

A little introduction on what I'm doing ...
For academic purposes I am creating an application in c++ using opencv for the detection of static objects in a scene.
The application is based on a combined approach of background subtraction and tracking, and the detection of events related to the abandonment of the objects works fine.
But at the moment I have a problem that I can't solve; I have to implement a finite state machine for detect the event of object removal, both before and after the entry of the object in the background.
To do this I was ordered by my superiors to use the edges of objects.
And now the problem.
After detecting a vehicle illegally parked along a road, I need to compare the edges of various images (the background captured at the time of the alarm, the current background, the current frame) to understand what the vehicle do (picks up the movement, remains parked or picks up the movement after being in the background).
I run these comparisons on the region of the scene in which there is the vehicle (vehicles typically have different size), I pull the edges using canny algorithm by obtaining a binarized CV_8UC1 cv::Mat.
At this point I have to compare them.
I tried to detect the contours with findContours and compare them with matchShapes, but it does not seem the right way, I'd compare each contour of the first image with every contour of the second, in addition typically the two images to campare have different number of contour (for example original background and current background, because the edges of the current background increased with the entry of the vehicle in the background).
I also tried to create a new image in which each pixel corresponds to the absolute difference of the other two, then I counted the white pixels of the difference image (wPx), and I used this number for comparison in this way: I set two thresholds (thr1 and thr2), and counted the pixels of the bounding rect of the vehicle (perim), if wPxthr2*perim images are different.
(I set percentages thresholds and I moltipy them with the perimeter of the bounding box to adapt the thresholds to the vehicle dimensions.)
This solution, however, seems to be very little robust.
Do you have something simple to suggest me?
Thank you very much in any case, more than once you StackOverflow users have helped me!
PS: THIS is an example of the images that I have to compare
The first is the background without the vehicle stationary, contains the edges of the street;
the second is the original background, the one captured when the stationary vehicle is detected;
the third is the current background (which in this case is equal to the original being the same frame, but then change);
the fourth is the current frame of the video;
You may want to take a look at this paper: A Novel SIFT-Like-Based Approach
for FIR-VS Images Registration. Aguilera et al. propose an Edge Oriented Histogram descriptor (EOH-SIFT).
This paper intends to register multispectral images, visible and infrared image, to each other. Because of the different characteristics of the images, the authors first extract edges/contours in both images, which results in images similiar to yours.
So, you can describe your image patches using this descriptor, illustrated in the following figure (taken from the above paper):
Subdivide your image patch into 4x4 zones
For each of the 16 subregions compose a histogram of contour's orientation (5 bins)
Put the histograms together into one descriptor vector of size 16x5=80 bins
Normalize the feature vector
So, every image you want to compare (in your case 4) is described by its 80-dimensional feature vector. You can compare them to each other by calculating and evaluating the Euclidean distance between them.
Note: Here a patch of size 80x80 or 100x100 (NxN) pixels is suggested. You may have to adjust the sizes to your image sizes.