Mask R-CNN for one object per image - computer-vision

I have been trying to implement Mask R-CNN for object localization and detection. I have 100+ categories of the same type of object that I want to classify (say they are all trees and the categories are oak, banyan, chestnut, pine, etc.). Also, there is only one object per image (say, one tree per image). Is Mask R-CNN suitable for this scenario? When I try to implement it, I get multiple bounding boxes around the image and none of them even remotely covers the object.

Related

Ok to use dataset of segmented instances to train an object detection model?

Currently training a YOLO object detection model. I have 2 versions of the same dataset:
1. Full images with bounding boxes and labels
2. Segmented instances and labels
Which version is better to use? I'm inclined to go with the 2nd, but I'm worried that the pixels around the object, but still within the bounding box, can be important.

What is the fastest bounding box prediction algorithm?

What is the fastest bounding box prediction algorithm without the classification?
For example, I have a flat surface with objects on top of it. I don't need to know the type of the objects; I only need their bounding boxes. Something like pixel-wise segmentation for two types of objects: ground and item.
I think that what you are looking for are models for "salient object detection" ("dealing with locating and segmenting the most salient object or region in a scene").
The output of such a model is a map of the same size as the input image, where each pixel's value is the probability that it is part of a salient object. Using some threshold value you can locate the boundaries of the object.
You can probably find information about the processing requirements of different models in the survey "Salient Object Detection: A Survey" (it was recently updated).
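As a minimal sketch of the thresholding step described above, assuming the model returns an HxW saliency map with values in [0, 1] (the function name and threshold value are illustrative, not from any specific library):

```python
import numpy as np

def bbox_from_saliency(saliency, thresh=0.5):
    """Bounding box (x0, y0, x1, y1) of pixels whose saliency exceeds thresh.

    saliency: HxW float array in [0, 1], as produced by a
    salient-object-detection model. Returns None if no pixel
    passes the threshold.
    """
    ys, xs = np.nonzero(saliency > thresh)
    if xs.size == 0:
        return None  # nothing salient at this threshold
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

In practice you would tune `thresh` (or use Otsu's method) per application, since a fixed cutoff trades off missed object pixels against background leakage.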

Fittest polygon bounding objects in an image

Is there any method to create a polygon (not a rectangle) around an object in an image for object recognition?
Please refer to the following images: the result I am looking for, and the original image.
I am not looking for bounding rectangles like this. I know the concepts of transfer learning, using pre-trained models for object recognition, and other object detection concepts.
The main aim is object detection, but returning a fitted polygon instead of a bounding box. Links to resources or papers would be helpful.
Here is a very simple (and a bit hacky) idea, but it might help: take a per-pixel scene labeling algorithm, e.g. SegNet, and then turn the resulting segmented image into a binary image, where the white pixels are the class of interest (in your example, white for cars and black for the rest). Now compute edges. You can add those edges to the original image to obtain a result similar to what you want.
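A minimal sketch of that pipeline, assuming you already have the per-pixel label map from the network; the class id, color, and the simple 4-neighbour boundary test are all illustrative choices:

```python
import numpy as np

def overlay_class_edges(image, labels, class_id, color=(255, 0, 0)):
    """Overlay the boundary of one segmentation class onto the image.

    image:  HxWx3 uint8 array.
    labels: HxW integer label map from a scene-labeling network (e.g. SegNet).
    class_id: the label of interest (e.g. the "car" class).
    """
    mask = labels == class_id
    # A pixel is interior if it and all four 4-neighbours are in the mask;
    # edge pixels are mask pixels that are not interior.
    interior = mask.copy()
    interior[1:, :] &= mask[:-1, :]
    interior[:-1, :] &= mask[1:, :]
    interior[:, 1:] &= mask[:, :-1]
    interior[:, :-1] &= mask[:, 1:]
    edges = mask & ~interior
    out = image.copy()
    out[edges] = color  # paint the boundary onto the original image
    return out
```

A real implementation would more likely use a proper edge detector (e.g. Canny) on the binarized mask, but the morphological version above shows the idea with no dependencies beyond numpy.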
What you want is called image segmentation, which is different from object detection. The best-performing methods for common object classes (e.g. cars, bikes, people, dogs, ...) do this using trained CNNs and are usually called semantic segmentation networks. This will, in theory, give you regions in your image corresponding to the object you want. After that you can fit an enclosing polygon by computing the convex hull.
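For the convex-hull step, here is a self-contained sketch (Andrew's monotone chain) that turns a set of foreground pixel coordinates from a segmentation mask into an enclosing polygon; in practice you would use a library routine such as OpenCV's cv2.convexHull instead:

```python
def convex_hull(points):
    """Convex hull of 2-D points via Andrew's monotone chain.

    points: iterable of (x, y) tuples, e.g. the foreground pixel
    coordinates of a segmentation mask. Returns the hull vertices
    in counter-clockwise order.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                       # build the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half (it repeats the other half's start).
    return lower[:-1] + upper[:-1]
```

Note the convex hull cannot represent concavities; for a tighter "fittest" polygon you would trace the mask contour itself (e.g. cv2.findContours) and optionally simplify it.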

2D object detection with only a single training image

The vision system is given a single training image (e.g. a piece of 2D artwork) and is asked whether the artwork is present in newly captured photos. The photos can contain many other objects, and when the artwork is present it must face up, but it may be occluded.
The pose space is x,y,rotation and scale. The artwork may be highly symmetric or not.
What is the latest state of the art handling this kind of problem?
I have tried or considered the following options, but each has problems. If my reasoning is invalid, please correct me.
Deep learning (R-CNN/YOLO): needs a lot of labeled data, which means a lot of human labor for each new piece of artwork.
Traditional machine learning (SVM, random forest): same as above.
SIFT/SURF/ORB + RANSAC or voting: when the artwork is symmetric, the matched features are mostly incorrect, and the RANSAC/voting stage takes a lot of time.
Generalized Hough transform: the state space is too large for the voting table. A pyramid can be applied, but it is difficult to choose universal thresholds for different kinds of artwork when proceeding down the pyramid.
Chamfer matching: the state space is too large; searching across it takes too much time.
Object detection networks require a lot of labeled data of the same class to generalize well, and in your setting it would be impossible to train a network with only a single instance.
I assume that in your case online object trackers could work; at least give them a try. There are convolutional object trackers that work well, such as Siamese CNNs. The code is open source on GitHub, and you can watch this video to see its performance.
Online object tracking: Given the initialized state (e.g., position and size) of a target object in a frame of a video, the goal of tracking is to estimate the states of the target in the subsequent frames. (source)
You can try a traditional feature-based image-processing algorithm, which can give true-positive template matches up to a decent accuracy.
Given the template image as in the question:
First, dilate the image to join all very closely spaced connected components.
Find the convex hull of the connected object obtained above; this gives you a polygon.
Use the polygon's edge-length information, such as the (max-length / min-length) ratio, as one feature of the template.
Also compute the pixel density inside the polygon as a second feature.
We now have two features.
Scene image feature vector:
Similarly, in the scene image, apply dilation followed by connected-component identification, define a convex hull (polygon) around each connected object, and build a feature vector for each object (edge info, pixel density).
Now search for the template feature vector among the scene-image feature vectors, taking the candidate with the minimum feature distance (and use an upper distance threshold to avoid false-positive matches).
This should give the true-positive matches if the object is present in the scene image.
Exception: this method would not work for occluded objects.
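The matching step above can be sketched as a nearest-neighbour search over the two-element feature vectors; the function name and the distance threshold are illustrative values you would tune for your data:

```python
import math

def match_template(template_feat, scene_feats, max_dist=0.5):
    """Find the scene object whose (edge-ratio, pixel-density) feature
    vector is closest to the template's.

    Returns the index of the best match, or None if every candidate
    is farther than max_dist (to reject false positives).
    """
    best_idx, best_dist = None, float("inf")
    for i, feat in enumerate(scene_feats):
        d = math.dist(template_feat, feat)  # Euclidean feature distance
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx if best_dist <= max_dist else None
```

With only two features the two dimensions should be on comparable scales (or normalized) before taking Euclidean distance, otherwise one feature dominates the match.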

Identifying blobs in image as that of a vehicle

Any idea how I can make the smaller blobs belonging to the same vehicle count as one vehicle? Due to background subtraction, some of the blobs belonging to a vehicle in the foreground mask are quite small, so filtering blobs by size won't work.
Try filtering based on colorDistance(), comparing the mean color of the blobs in the image containing the vehicle against a control image of the background without the car in it. The SimpleCV docs have a tutorial specifically on this topic. That said, it may not always work as expected. Another possibility (it just occurred to me) would be summing the areas of the blobs of interest and checking whether that sum exceeds a given threshold, rather than testing any one blob by itself.
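As a sketch of the area-summing suggestion, combined with a simple proximity grouping so that fragments of one vehicle are pooled before thresholding (the gap and area values, and both function names, are illustrative):

```python
import math

def group_blobs(blobs, max_gap):
    """Greedily merge blobs whose centroids lie within max_gap pixels.

    blobs: list of (cx, cy, area) tuples from the foreground mask.
    Fragments of one vehicle end up in a single group whose area is
    the sum of its parts, so a total-area threshold applies per group.
    """
    groups = []  # each group is [cx, cy, total_area]
    for cx, cy, area in blobs:
        for g in groups:
            if math.hypot(cx - g[0], cy - g[1]) <= max_gap:
                total = g[2] + area
                # area-weighted centroid of the merged group
                g[0] = (g[0] * g[2] + cx * area) / total
                g[1] = (g[1] * g[2] + cy * area) / total
                g[2] = total
                break
        else:
            groups.append([float(cx), float(cy), float(area)])
    return groups

def count_vehicles(blobs, max_gap, min_area):
    """Number of blob groups whose summed area passes the threshold."""
    return sum(1 for g in group_blobs(blobs, max_gap) if g[2] >= min_area)
```

This greedy single pass is order-dependent; for robustness you could instead dilate the foreground mask so nearby fragments merge into one connected component before measuring areas.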