Train Mask R-CNN with some classes having no masks but only bounding boxes - computer-vision

I want to train a model to detect three different types of defects. In my training dataset, two of these classes have segmentation masks, but one has only bounding boxes. Can I train a single shared model, or do I need to split the training data and train a Faster R-CNN and a Mask R-CNN separately?
(I only care about bounding-box output for the class that has no masks in the training data.)

You can create 'weak' masks from those bounding boxes and then combine the two datasets. Something like below:
import numpy as np

# fill the bounding box region (x, y, w, h) to make a rectangular 'weak' mask
mask = np.zeros((256, 256), dtype=np.float32)
mask[y:y+h, x:x+w] = 255.
If the two datasets are small, combining them will yield better results. But if the datasets are large enough (say, more than 2000 images each), then you can use the separate Faster R-CNN + Mask R-CNN approach.
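If you go the combined route, one way to wire it up is to fall back to a box-filled weak mask whenever an instance has no real mask. Below is a minimal sketch, assuming each annotation is a dict with a 'box' given as (x, y, w, h) and an optional 'mask' array (these keys and helper names are made up for illustration):

import numpy as np

def box_to_weak_mask(box, image_shape):
    """Turn an (x, y, w, h) box into a filled rectangular 'weak' mask."""
    x, y, w, h = box
    mask = np.zeros(image_shape, dtype=np.uint8)
    mask[y:y+h, x:x+w] = 1
    return mask

def build_instance_masks(annotations, image_shape):
    """Use the real mask when an instance has one, otherwise a weak box mask."""
    masks = []
    for ann in annotations:
        if ann.get("mask") is not None:          # classes that come with real masks
            masks.append(ann["mask"].astype(np.uint8))
        else:                                     # box-only class: fall back to the weak mask
            masks.append(box_to_weak_mask(ann["box"], image_shape))
    return np.stack(masks)                        # shape: (num_instances, H, W)

The resulting (num_instances, H, W) stack can then be used as an ordinary mask target; the box-only class will simply learn rectangular masks, which is fine if you only care about its boxes.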

Related

Ok to use dataset of segmented instances to train an object detection model?

Currently training a YOLO object detection model. I have two versions of the same dataset:
1. Full images with bounding boxes and labels
2. Segmented instances and labels
Which version is better to use? I'm inclined to go with the second, but I'm worried that the pixels around the object, which still fall inside the bounding box, may be important.

probability map for semantic segmentation

With respect to semantic segmentation, it seems to me that there are multiple ways to produce the final pixel-wise labelling, such as
softmax, sigmoid, logistic regression, or other classical classification methods.
However, for the softmax approach, we need to ensure that the output map produced by the network has multiple channels, with the number of channels matching the number of classes. For instance, in a two-class problem (mask vs. non-mask) we would use two channels. Is this right?
Moreover, each channel in the output map can then be treated as a probability map for the corresponding class. Is this understanding right?
Yes to both questions. The goal of the softmax function is to transform the scores into probabilities so that you can maximize the probability of the true label.
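To make this concrete, here is a small numpy sketch (the shapes and names are only for illustration) that applies softmax across the channel axis of a (C, H, W) score map, so that each channel becomes a per-class probability map:

import numpy as np

def pixelwise_softmax(logits):
    """Softmax over the channel axis of a (C, H, W) score map."""
    shifted = logits - logits.max(axis=0, keepdims=True)   # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

# Toy example: a 2-class (mask / non-mask) output of size 4x4
logits = np.random.randn(2, 4, 4).astype(np.float32)
probs = pixelwise_softmax(logits)
# probs[0] and probs[1] are the per-class probability maps; they sum to 1 at every pixel
assert np.allclose(probs.sum(axis=0), 1.0)

At every pixel the C channel values sum to 1, so channel k can be read directly as the probability of class k at that pixel.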

How to calculate the distance between two GMMs in OpenCV?

I am training two GMMs in OpenCV, each with 4 components. One GMM is trained on all the pixels from the foreground of an image and the other on all the pixels from the background. I want to find out how close the two GMMs are to each other, to get an idea of how similar the background colours are to the foreground colours.
Any ideas on how I can go about this? The popular distance measures I have seen (KL divergence, Mahalanobis, etc.) are defined for single (multivariate) Gaussian distributions. How can I extend them to GMMs trained on the RGB values of each pixel?
Because a Gaussian mixture model consists of a set of weighted Gaussians, you can measure the distance between the centres of the nearest Gaussians of the two models. But this is not an entirely correct approach, because of the probabilistic nature of the model. It is much better to compare the probabilities that the two models assign to the same values.
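One practical way to do that comparison is a Monte Carlo estimate of a symmetrized KL divergence: sample from one mixture and compare the log-likelihoods the two models assign to the samples. A sketch using scikit-learn's GaussianMixture rather than OpenCV's EM (the placeholder pixel arrays are just for illustration; with OpenCV you would plug in the likelihoods from your trained EM models instead):

import numpy as np
from sklearn.mixture import GaussianMixture

def mc_kl(gmm_p, gmm_q, n_samples=10000):
    """Monte Carlo estimate of KL(p || q) between two fitted GaussianMixture models."""
    X, _ = gmm_p.sample(n_samples)
    return np.mean(gmm_p.score_samples(X) - gmm_q.score_samples(X))

# fg_pixels, bg_pixels: (N, 3) arrays of RGB values; random placeholders here
fg_pixels = np.random.rand(5000, 3)
bg_pixels = np.random.rand(5000, 3)

gmm_fg = GaussianMixture(n_components=4).fit(fg_pixels)
gmm_bg = GaussianMixture(n_components=4).fit(bg_pixels)

# Symmetrized divergence: a small value means foreground and background colours are similar
distance = mc_kl(gmm_fg, gmm_bg) + mc_kl(gmm_bg, gmm_fg)
print(distance)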

How to find whether an image is more or less homogeneous w.r.t. color (hue)?

UPDATE:
I have segmented the image into different regions. For each region, I need to know whether it is more or less homogeneous in terms of color.
What could be the possible strategies to do so?
previous:
I want to check the color variance (preferably the hue variance) of an image in order to find images made up of homogeneous colors (i.e. images that contain only one or two colors).
I understand that one strategy could be to build a hue histogram and count each color, but I have many images, and creating a 180-bin hue histogram for every one of them would make the whole code computationally expensive.
Is there any built-in OpenCV method, or another simpler method, to find out whether an image consists of a single homogeneous color or of several colors?
Something that can calculate the variance of the hue image would also be fine; I could not find anything like variance(image).
PS: I am writing the code in C++.
The variance can be computed without a histogram, as the average of the squared values minus the square of the average value. It takes a single pass over the image, with two accumulators. Choose an accumulator data type that will not overflow.
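A sketch of that single-pass idea, written in Python with OpenCV (the same cvtColor call exists in the C++ API, and the threshold below is only an illustrative guess):

import cv2
import numpy as np

image = cv2.imread("region.png")                                  # BGR input region
hue = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)[:, :, 0].astype(np.float64)

# Single pass, two accumulators: sum of values and sum of squared values
n = hue.size
s = hue.sum()
s2 = np.square(hue).sum()
variance = s2 / n - (s / n) ** 2                                  # E[h^2] - (E[h])^2

# Low variance -> region is roughly homogeneous in hue (100.0 is an arbitrary example threshold)
print("homogeneous" if variance < 100.0 else "mixed colours")

One caveat: hue is circular (0 and 179 are both red in OpenCV's 8-bit convention), so a plain variance can overstate the spread of reddish regions.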

How can I normalize LBP histograms obtained from patches of different sizes?

I'm working on face recognition, and I'm trying to compare histograms of different regions of the face across several test subjects. The issue is that the regions the histograms are calculated from have different sizes. I need to normalize the histograms, and I don't have any idea how to do it.
Is it a Local Binary Pattern histogram? Like this one?
http://en.wikipedia.org/wiki/Local_binary_patterns
I think so... Does the LBP feature produce a histogram of the same size for different window sizes?
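The thread does not spell out a fix, but a common approach (an assumption on my part, not taken from the answers above) is to L1-normalize each histogram so that its bins sum to 1, which removes the dependence on region size:

import numpy as np

def normalize_hist(hist):
    """L1-normalize a histogram so its bins sum to 1, making regions of different sizes comparable."""
    hist = np.asarray(hist, dtype=np.float64)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Two LBP histograms computed over regions of different sizes (toy data)
h_small = np.array([12, 3, 7, 0, 5, 9, 1, 4, 2, 6], dtype=np.float64)
h_large = np.array([480, 120, 260, 10, 210, 330, 40, 150, 70, 230], dtype=np.float64)

# After normalization the two can be compared directly, e.g. with a chi-square distance
p, q = normalize_hist(h_small), normalize_hist(h_large)
chi_square = 0.5 * np.sum((p - q) ** 2 / (p + q + 1e-10))
print(chi_square)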