Dealing with imbalance dataset for multi-label classification - computer-vision

In my case, I’ve 33 labels per samples. The input label tensors for a corresponding image are like [0,0,1,0,1,1,1,0,0,0,0,0…...33]. And the samples for some labels are quite low and some are high. I'm looking for predict the regression values. So what will be the best approach to improve the prediction? I would like to apply data balancing technique. But so far I found the balancing technique available only for multi-class. I’m grateful to you if you share your best knowledge about regarding my problem or any other idea to improve the performance. Thanks in Advance.

When using a single.model to regress multiple values, it is usually beneficial to preprocess the predictions to be in roughly the same range.
Look for example on the way detection models predict (regress) bounding box coordinates: values are scaled and the net predicts only corrections.

Related

Difference in hand color between pretrain dataset and fine dataset?

I have a pose estimation model pretrained on a dataset in which hands are in its nartural color. I want to finetune that model on the dataset of hands of surgeons doing surgeries. Those hands are in surgical gloves so the image of the hands are a bit different than normal hands.
pretraine image
finetune image
Does this difference in hand colors affect the model performance?
If I can make images of those surgical hands more like normal hands, will I get better performance?
Well, it depends on what your pre-trained model has learned to capture from the pre-training (initial) dataset. Suppose your model had many feature maps and not enough skin color variation in your pre-training dataset (leads to overfitting issues). In that case, your model has likely "taken the path of least resistance" and exploited that to learn feature maps that rely on the color space as means of feature extraction (which might not generalize well due to color differences).
The more your pre-training dataset match/overlap with your target dataset, the better the effects of transfer learning will be. So yes, there is a very high chance that making your target dataset (surgical hands) look more similar to your pre-training dataset will positively impact your model's performance. Moreover, I would conjecture that introducing some color variation (e.g., Color Jitter augmentation) in your pre-training dataset could also help your model generalize to your target dataset.

Image segmentation: identify spots and scratches with irregular borders

A quick introduction: I do physics research which includes experimental measurements and numerical simulations.
Below is the image which is the result of our theoretical model
.
Without going into details, I just say that the intensity and color here represent a simulated physical quantity.
Experimental results are below
The measurement has more features and details but it also has a lot of "invalid" data which are represented by darker spots, scratches and marks which have irregular borders and can vary in size and shape. Nonetheless by comparing these two pictures we can visually identify "invalid" pixels on the second figure which is the problem I am trying to solve using a computer.
Simple thresholding by intensity won't work because the valid data also can vary in intensity. I was thinking about using CNN but then I realized that it would be very tedious to prepare a training dataset because there a lot of small marks/spots needs to be marked and manually marking them will take a lot of time.
Is there any other solution for this problem? Or may be there is a pretrained neural network ( maybe SVM?) which handles a similar problem?
Let's check all options one by one taking into account the following:
you have a very specific physical process
you need accurate results
(both process-wise and geometry-wise)
CNNs
It will be hard to find a "ready-to-be-used" model for your specific process. Moreover, there will be a need to take some specific actions to get an accurate geometry out of it:
https://ai.facebook.com/blog/using-a-classical-rendering-technique-to-push-state-of-the-art-for-image-segmentation/
Background subtraction
Background subtraction will require a threshold, so for your examples and conditions it has no sense. I produced two masks based on subtracted background, find the difference:
Color-based segmentation
With a properly defined threshold (let's assume we use delta_E) you can segment several areas of interest. For example, lets define three:
bright red
red
black/dark red
Let's compare:
Before:
After:
Additional area:
Before:
After:
So color-based segmentation seems to be an option, but it is better to improve input if possible. I hope it makes any sense.

Better model for classifying image quality (seperate sharp & well lit images from blurry/out of focus/grainy images)

I have a dataset of around 20K images that are human labelled. Labels are as follows:
Label = 1 if the image is sharp and well lit, and
Label = 0 for those blurry/out of focus/grainy images.
The images are of documents such as Identity cards.
I want to build a Computer Vision model that can do the classification task.
I tried using VGG-16 for transfer learning for this task but it did not give good results (precision .65 and recall = .73). My sense is that VGG-16 is not suitable for this task. It is trained on ImageNet and has very different low level features. Interestingly the model is under-fitting.
We also tried EfficientNet 7. Though the model was able to decently perform on training and validation, test performance remains bad.
Can someone suggest more suitable model to try for this task?
I think your problem with VGG and other NN is the resizing of images:
VGG expects as input 224x224 size image. I assume your dataset has much larger resolution, and thus you significantly downscale the input images before feeding them to your network.
What happens to blur/noise when you downscale an image?
Blurry and noisy images become sharper and cleaner as you decrease the resolution. Therefore, in many of your training examples, the net sees a perfectly good image while you label them as "corrupt". This is not good for training.
An interesting experiment would be to see what types of degradations your net can classify correctly and what types it fails: You report 65% precision # 73% recall. Can you look at the classified images at that point and group them by degradation type?
That is, what is precision/recall for only blurry images? what is it for noisy images? What about grainy images?
What can you do?
Do not resize images at all! if the network needs fixed size input - then crop rather than resize.
Taking advantage of the "resizing" effect, you can approach the problem using a "discriminator". Train a network that "discriminate" between an image and its downscaled version. If the image is sharp and clean - this discriminator will find it difficult to succeed. However, for blurred/noisy images the task should be rather easy.
For this task, I think using opencv is sufficient to solve the issue. In fact comparing the variance of Lablacien of the image with a threshold (cv2.Laplacian(image, cv2.CV_64F).var()) will generate a decision if an image is bluered or not.
You ca find an explanation of the method and the code in the following tutorial : detection with opencv
I think that training a classifier that takes the output of one of one of your neural network models and the variance of Laplacien as features will improve the classification results.
I also recommend experementing with ResNet and DenseNet.
I would look at the change in color between pixels, then rank the photos on the median delta between pixels... a sharp change from RGB (0,0,0) to (255,255,255) on each of the adjoining pixels would be the max possible score, the more blur you have the lower the score.
I have done this in the past trying to estimate areas of fields with success.

Comparison of HoG with CNN

I am working on the comparison of Histogram of oriented gradient (HoG) and Convolutional Neural Network (CNN) for the weed detection. I have two datasets of two different weeds.
CNN architecture is 3 layer network.
1) 1st dataset contains two classes and have 18 images.
The dataset is increased using data augmentation (rotation, adding noise, illumination changes)
Using the CNN I am getting a testing accuracy of 77% and for HoG with SVM 78%.
2) Second dataset contact leaves of two different plants.
each class contain 2500 images without data augmentation.
For this dataset, using CNN I am getting a test accuracy of 94% and for HoG with SVM 80%.
My question is Why I am getting higher accuracy for HoG using first dataset? CNN should be much better than HoG.
The only reason comes to my mind is the first data has only 18 images and less diverse as compare to the 2nd dataset. is it correct?
Yes, your intuition is right, having this small data set (just 18 images before data augmentation) can cause the worse performance. In general, for CNNs you usually need at least thousands of images. SVMs do not perform that bad because of the regularization (that you most probably use) and because of the probably much lower number of parameters the model has. There are ways how to regularize deep nets, e.g., with your first data set you might want to give dropout a try, but I would rather try to acquire a substantially larger data set.

Using class weight to balance data set lowers accuracy in RBF SVM

I have been using sklearn to learn on some data. This is a binary classifcation task and I am using a RBF kernel. My data set is quite unbalanced (80:20) and I'm using only 120 samples, with 10ish features (I've been experimenting with a few less). Since I set class_weight="auto" the accuracy I've calculated from a cross validated (10 folds) gridsearch has dropped dramatically. Why??
I will include a couple of validation accuracy heatmaps to demonstrate the difference.
NOTE: top heatmap is before classweight was changed to auto.
Accuracy is not the best metrics to use when dealing with unbalanced dataset. Let's say you have 99 positive examples and 1 negative example, and if you predict all outputs to be positive, still you will get 99% accuracy, whereas you have mis-classified the only negative example. You might have gotten high accuracy in the first case because your predictions will be on the side which has high number of samples.
When you do class weight = auto, it takes the imbalance into consideration and hence, your predictions might have moved towards center, you can cross-check it using plotting the histograms of predictions.
My suggestion is, don't use accuracy as performance metric, use something like F1 Score or AUC.