I have been experimenting with some existing semantic segmentation code on the ultrasound nerve data set.
The implementation is based on the U-Net architecture.
During the training process, I can capture the verification plot for each epoch. In the following figures, the left one is the raw image, the middle one is the ground truth and the right one is the predicted one (or the probability map).
As the following figures show, the prediction at epoch 0 is entirely black. It then seems to start capturing some of the distribution of the original image, and later turns entirely black again.
I would like to know how to explain the training process based on these plots: why does the prediction go back to the result of the first epoch after several training epochs?
Besides, why does the predicted result tend to reproduce the distribution of the original image during training?
Is there any insight that can be derived from these training observations?
I generally followed the tutorial to generate the training set (I use the same create_train_data function as in the tutorial).
The only difference is that I add a background channel, so that the mask image has shape (1, image_row, image_col, 2):
img_mask = io.imread(os.path.join(raw_data_path, image_mask_name))
img_mask = img_mask//255
img_mask_background = 1-img_mask
After loading the npy files generated above, I normalize the raw images of the training set:
imgs_train = np.load(os.path.join(train_data_path,"imgs_train.npy"))
imgs_mask_train = np.load(os.path.join(train_data_path,"imgs_mask_train.npy"))
imgs_train = imgs_train.astype('float32')
mean = np.mean(imgs_train) # mean for data centering
std = np.std(imgs_train) # std for data normalization
imgs_train -= mean
imgs_train /= std
I followed this implementation to train the model. I did not change anything except this call:
self.learning_rate_node = tf.train.exponential_decay(learning_rate=learning_rate,
                                                     global_step=global_step,
                                                     decay_steps=training_iters,
                                                     decay_rate=decay_rate,
                                                     staircase=True)
I changed it to
global_step = global_step*self.batch_size
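For reference, the documented behaviour of tf.train.exponential_decay can be reproduced in a few lines of plain Python, which makes the effect of multiplying global_step by the batch size easier to see. This is only a sketch: the numeric values (base rate 0.2, decay rate 0.5, 100 decay steps, batch size 4) are made up for illustration and are not taken from the question.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=True):
    """Decayed rate = learning_rate * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        # Integer division: the rate drops in discrete steps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent


# Hypothetical values, for illustration only.
base_lr, decay_rate, training_iters, batch_size = 0.2, 0.5, 100, 4

# Learning rate at step 250 with the original global_step.
lr_original = exponential_decay(base_lr, 250, training_iters, decay_rate)
# Learning rate at the same step once global_step is multiplied by the batch size.
lr_scaled = exponential_decay(base_lr, 250 * batch_size, training_iters, decay_rate)
```

With these made-up numbers the scaled version has already decayed ten times instead of twice, so the learning rate shrinks much faster than in the unmodified code.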
epoch 0
epoch 4
epoch 12
epoch 16
I have a dataset of a single class (rectangular object) with a size of 130 images. My goal is to detect the object & draw a circle/dot/mark in the centre of the object.
Because the objects are rectangular, my idea is to get the dimensions of the predicted bounding box and take the circle/dot/mark as (width/2, height/2).
However, if I were to do transfer learning, would YOLO be a good choice to detect a single class of objects in a small dataset?
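As a small illustration of the centre computation described above: in image coordinates the half-width and half-height have to be offset by the box's top-left corner. The (x_min, y_min, width, height) box format used here is an assumption; detectors differ in how they report boxes.

```python
def bbox_centre(x_min, y_min, width, height):
    """Centre of an axis-aligned box given its top-left corner and its size."""
    return x_min + width / 2.0, y_min + height / 2.0


# Hypothetical box: top-left corner at (40, 20), 100 pixels wide, 60 pixels high.
centre = bbox_centre(40, 20, 100, 60)
```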
YOLO should be fine. However it is old now. Try YoloV4 for better results.
People have tried transfer learning from FasterRCNN to detect single objects with 300 images and it worked fine. (Link). However, 130 images is a bit small. Try augmenting the images (flipping, rotating, etc.) if you get inferior results.
Apply the same augmentation to the annotations as well when doing translation, rotation, or flip augmentations. For example, in PyTorch, for segmentation, I use:
if random.random() < 0.5:  # Horizontal flip
    image = T.functional.hflip(image)
    mask = T.functional.hflip(mask)
if random.random() < 0.25:  # Rotation
    rotation_angle = random.randrange(-10, 11)
    image = T.functional.rotate(image, angle=rotation_angle)
    mask = T.functional.rotate(mask, angle=rotation_angle)
For bounding boxes you will have to transform the coordinates yourself: for a horizontal flip, x becomes width - x.
For augmentations that do not change the object position (e.g. a gamma intensity transformation), leave the annotations unchanged.
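The "x becomes width - x" rule for a horizontal flip can be sketched as follows. The (x_min, y_min, x_max, y_max) box format and the helper name hflip_bbox are assumptions made for this example; adapt them to whatever annotation format you use.

```python
def hflip_bbox(bbox, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box about the vertical centre line."""
    x_min, y_min, x_max, y_max = bbox
    # Each x becomes image_width - x; the two x values swap roles so that
    # the first coordinate is still the smaller one.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


# Hypothetical box in a 200-pixel-wide image.
flipped = hflip_bbox((10, 20, 50, 80), image_width=200)
```

Flipping twice returns the original box, which is a handy sanity check when wiring this into an augmentation pipeline.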
I am new to the AI world and am trying some practice.
It looks like I need some third-party experience.
Let's say I need to get rid of image defects (the actual task is trickier).
I hope that a trained NN will be able to interpolate the defect area.
For this reason I am trying to create a simple neural network.
Its input is a grayscale image with a defect (72*54), and the target is the same image with no defect.
The hidden layer has 2*72*54 neurons.
Main piece of code
cv::Ptr<cv::ml::ANN_MLP> ann = cv::ml::ANN_MLP::create();
int inputsCount = imageSizes.width * imageSizes.height;
std::vector<int> layerSizes = { inputsCount, inputsCount * 2, inputsCount};
ann->setLayerSizes(layerSizes);
ann->setActivationFunction(cv::ml::ANN_MLP::SIGMOID_SYM);
cv::TermCriteria tc(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 50, 0.1);
ann->setTermCriteria(tc);
ann->setTrainMethod(cv::ml::ANN_MLP::BACKPROP, 0.0001);
std::cout << "Result : " << ann->train(trainData, cv::ml::ROW_SAMPLE, resData) << std::endl;
ann->predict(trainData, predicted);
My training dataset looks like
Trained on a 10-item dataset, the NN gives bad results on these (same) inputs. I tried different parameters.
But trained on only 2 images, the NN produces output close to the target (on the training data).
I suppose that this is not an appropriate approach and that the solution is not so easy.
Maybe someone has some advice about the parameters, the neural network architecture, or the whole approach.
It seems that the termination criteria were fine for just two samples but were not good enough when training with a larger number of samples. Do try adjusting them, and also the learning rate.
Judging by the quality of the pixels that have been restored properly, the network architecture seems to be fine for this task. Once the network works well on 10 samples, I strongly recommend adding more training samples.
The chief problem is that you have way too little data for the given network.
Your NN is fully connected. The weights for pixel 0,0 are entirely separate from those of pixel 1,0, and pixel 0,1 again has different weights. And with so many nodes, you have a lot of weights. So while you have plenty of pixels in 10 images, you have nowhere near enough pixels for all the weights.
A Convolutional Neural Network has far fewer weights, as many of its weights are reused. That means that in training, these weights are trained by multiple pixels from each training image.
Not that I'd expect this to work well with just 10 images. The human expectation is derived from years of human vision, literally billions of images.
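To put rough numbers on this, the following sketch counts the trainable parameters of the fully connected network from the question (72*54 inputs, a hidden layer twice that size, and an output of the input size) and compares them with a single small convolutional layer. The convolutional layer (16 filters of 3x3 on one grayscale channel) is a hypothetical example, not something taken from the question.

```python
inputs = 72 * 54            # pixels per image
hidden = 2 * inputs         # hidden layer size used in the question

# Fully connected network: weights plus biases for both layers.
mlp_params = inputs * hidden + hidden + hidden * inputs + inputs

# Hypothetical convolutional layer: 16 filters of 3x3 on one channel, plus biases.
conv_params = 3 * 3 * 1 * 16 + 16

# Number of target pixel values available in a 10-image training set.
training_targets = 10 * inputs

print(mlp_params, conv_params, training_targets)
```

The fully connected network has tens of millions of parameters, while ten training images provide only a few tens of thousands of target values.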
I am writing code on a Raspberry Pi in Python to compare two images using mean squared error. The project is a personal home security thing.
My main goal is to detect a change between the images that I capture from the Pi camera (if something is added to or removed from the current image), but right now my code is too sensitive. It is affected by changes in background lighting, which I do not want.
I have two options in front of me: either scrap my current logic and start a new one, or improve my current logic to account for this noise (if I can call it that). I am searching for ways to improve my logic, but I wanted some guidance on how to go about it.
My biggest fear is this: am I wasting time beating a dead horse? Should I look for some other algorithm to detect a change in the image, or should I use edge detection?
import numpy as np
import cv2
import os
from threading import Thread
######Function Definition########################################
def mse(imageA, imageB):
    # the 'Mean Squared Error' between the two images is the
    # sum of the squared differences divided by the number of pixels;
    # NOTE: the two images must have the same dimension
    err = np.sum((imageA.astype("int") - imageB.astype("int")) ** 2)
    # divide by a float so the result is not truncated under Python 2
    err /= float(imageA.shape[0] * imageA.shape[1])
    # return the MSE, the lower the error, the more "similar"
    # the two images are
    return err

def compare_images(imageA, imageB):
    # compute the mean squared error
    m = mse(imageA, imageB)
    print(m)

def capture_image():
    ##shell command to click photos
    os.system(image_args)
##original image Path variable
original_image_path= "/home/pi/Downloads/python-compare-two-images/originalimage.png"
##original_image_args is a shell command to click photos
original_image_args="raspistill -o "+original_image_path+" -w 320 -h 240 -q 50 -t 500"
os.system(original_image_args)
##read the greyscale of the image in to the variable original_image
original_image=cv2.imread(original_image_path, 0)
##Three images
image_args="raspistill -o /home/pi/Downloads/python-compare-two-images/Test_Images/image.png -w 320 -h 240 -q 50 --nopreview -t 10 --exposure sports"
image_path="/home/pi/Downloads/python-compare-two-images/Test_Images/"
image1_name="image.png"
#created a new thread to take pictures
My_Thread=Thread(target=capture_image)
#Thread started
My_Thread.start()
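flag = 0
while True:
    # is_alive() replaces isAlive(), which was removed in Python 3.9
    if My_Thread.is_alive():
        flag = 0
    else:
        flag = 1
    if flag == 1:
        flag = 0
        # the capture thread has finished: read its image and start a new capture
        image1 = cv2.imread(image_path + image1_name, 0)
        My_Thread = Thread(target=capture_image)
        My_Thread.start()
        compare_images(original_image, image1)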
A first improvement is to apply a gain that compensates for the global variation of the light, for example by taking the average intensity of the two images and correcting one of them by the ratio of the intensities.
This can fail when the foreground changes, since that influences the global average. If the change in the foreground does not cover too large an area, you can estimate the gain by robust fitting of a linear model y = a * x.
A worse, but unfortunately common, scenario is when the background illumination changes in a non-uniform way. A partial solution is to fit a non-uniform gain model, such as one obtained by bilinear interpolation between gains estimated at the corners of the image, or on a finer subdivision of it.
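A minimal sketch of the global gain correction, assuming grayscale uint8 images of equal size. The median of the per-pixel ratios is used here as one simple robust estimate of the gain a in y = a * x; it is a choice made for this example, and other robust fits would serve the same purpose. The function name and the synthetic test images are hypothetical.

```python
import numpy as np


def gain_compensated_mse(reference, current, eps=1e-6):
    """MSE after scaling `current` by a robust global gain (model: reference = a * current)."""
    ref = reference.astype(np.float64)
    cur = current.astype(np.float64)
    valid = cur > eps                      # skip black pixels to avoid dividing by zero
    # The median of the per-pixel ratios tolerates a modest fraction of
    # changed (foreground) pixels, unlike the ratio of the global means.
    gain = np.median(ref[valid] / cur[valid])
    return np.mean((ref - gain * cur) ** 2), gain


# Synthetic check: the same scene, once at full brightness and once dimmed to 80%.
rng = np.random.default_rng(0)
scene = rng.integers(50, 200, size=(240, 320)).astype(np.uint8)
darker = (scene * 0.8).astype(np.uint8)

plain_mse = np.mean((scene.astype(np.float64) - darker) ** 2)
compensated_mse, gain = gain_compensated_mse(scene, darker)
```

On this synthetic pair the plain MSE is large even though nothing in the scene moved, while the compensated MSE stays close to zero.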
Change detection is a very well-studied field. One of the basic options is to model each pixel as a Gaussian distribution, by sampling a lot of images and calculating the mean and variance of each pixel.
For the pixels that tend to change when the lighting changes, the variance will be bigger than for the ones that don't change as much.
In order to detect movement at a certain pixel, you just need to choose the probability below which you consider a change in the pixel value to be out of the ordinary, and use the Gaussian distribution you calculated to find the corresponding pixel value.
To make this solution efficient on your Raspberry Pi, you will first need to do an "offline" calculation of the per-pixel threshold values beyond which a change in the pixel value is considered movement, and store them in a file. Then, in the "online" stage, you just compare each pixel to the calculated value.
For the "offline" stage I recommend using images that were recorded over an entire day, in order to capture all the variation you need per pixel. This stage can of course be done on your computer, with only the output file uploaded to the Raspberry Pi.
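The offline and online stages described above could look roughly like the sketch below. It assumes grayscale frames of identical size, and uses a fixed multiple k of the per-pixel standard deviation as the threshold (k = 3 is an arbitrary choice for this example). The function names and the synthetic frames are hypothetical.

```python
import numpy as np


def fit_background(frames, k=3.0):
    """Offline stage: per-pixel mean and k-sigma threshold from a stack of frames."""
    stack = np.asarray(frames, dtype=np.float64)    # shape: (n_frames, height, width)
    mean = stack.mean(axis=0)
    threshold = k * stack.std(axis=0) + 1e-6        # small floor for constant pixels
    return mean, threshold


def changed_pixels(frame, mean, threshold):
    """Online stage: flag pixels that deviate from their mean by more than the threshold."""
    return np.abs(frame.astype(np.float64) - mean) > threshold


# Synthetic "empty scene" frames with mild per-pixel noise.
rng = np.random.default_rng(0)
background = rng.normal(100.0, 2.0, size=(50, 24, 32))
mean, threshold = fit_background(background)

# Simulate a new object by brightening a small patch of one frame.
test_frame = background[0].copy()
test_frame[5:10, 5:10] += 60.0
mask = changed_pixels(test_frame, mean, threshold)
```

The mean and threshold arrays can be written once with np.save on a desktop machine and loaded with np.load on the Raspberry Pi, so that the online stage only performs a subtraction and a comparison per frame.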
I just installed Fast RCNN and have run the demo,
and I came to wonder if it's possible to extract features from all bounding boxes in the image (and do this for the entire dataset).
For example, if Fast RCNN detects cat, dog, and a car from an image,
I'd like to extract separate CNN features for each of cat, dog, and car.
And do this for tens of thousands of images.
The feature extraction example on Fast RCNN's Github (https://github.com/rbgirshick/caffe-fast-rcnn/tree/master/examples/feature_extraction) seems to be the replica of feature extraction using caffe for the entire image, not each bounding box.
Could anyone help me on this?
UPDATED:
Apparently, feature extraction for each bounding box is done in the following part of the code from https://github.com/rbgirshick/fast-rcnn/blob/master/lib/fast_rcnn/test.py:
# When mapping from image ROIs to feature map ROIs, there's some aliasing
# (some distinct image ROIs get mapped to the same feature ROI).
# Here, we identify duplicate feature ROIs, so we only compute features
# on the unique subset.
if cfg.DEDUP_BOXES > 0:
    v = np.array([1, 1e3, 1e6, 1e9, 1e12])
    hashes = np.round(blobs['rois'] * cfg.DEDUP_BOXES).dot(v)
    _, index, inv_index = np.unique(hashes, return_index=True,
                                    return_inverse=True)
    blobs['rois'] = blobs['rois'][index, :]
    boxes = boxes[index, :]

# reshape network inputs
net.blobs['data'].reshape(*(blobs['data'].shape))
net.blobs['rois'].reshape(*(blobs['rois'].shape))
blobs_out = net.forward(data=blobs['data'].astype(np.float32, copy=False),
                        rois=blobs['rois'].astype(np.float32, copy=False))
if cfg.TEST.SVM:
    # use the raw scores before softmax under the assumption they
    # were trained as linear SVMs
    scores = net.blobs['cls_score'].data
else:
    # use softmax estimated probabilities
    scores = blobs_out['cls_prob']

if cfg.TEST.BBOX_REG:
    # Apply bounding-box regression deltas
    box_deltas = blobs_out['bbox_pred']
    pred_boxes = _bbox_pred(boxes, box_deltas)
    pred_boxes = _clip_boxes(pred_boxes, im.shape)
else:
    # Simply repeat the boxes, once for each class
    pred_boxes = np.tile(boxes, (1, scores.shape[1]))

if cfg.DEDUP_BOXES > 0:
    # Map scores and predictions back to the original set of boxes
    scores = scores[inv_index, :]
    pred_boxes = pred_boxes[inv_index, :]

return scores, pred_boxes
I'm trying to figure out how to tweak this to save the features, as we do with Caffe for features of the entire images, which are saved to a mdb file.
UPDATE
During the process of determining the right bounding boxes, Fast-RCNN extracts CNN features from a large number (~800-2000) of image regions, called object proposals. These regions are obtained through different algorithms, typically selective search. After this computation, it uses those features to recognize the "right" proposals and find the "right" bounding box. This is called bounding box regression.
Of course Fast-RCNN optimizes this process, but it still has to extract CNN features from many more regions than the ones related to the object of interest.
In short, if you were to save the variable blobs_out in the code snippet you pasted, you would save the features for all the object proposals, including the "wrong" ones. But you can save all of that and then try to prune it and retrieve only the desired ones. To save the features, just use pickle.dump().
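A minimal sketch of the pickle.dump() step. The dictionary below is only a stand-in for blobs_out: the keys come from the snippet in the question, but the array shapes and the file name are made up for the example.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for the dict returned by net.forward(); the shapes are hypothetical.
blobs_out = {
    'cls_prob': np.random.rand(5, 21).astype(np.float32),
    'bbox_pred': np.random.rand(5, 84).astype(np.float32),
}

# Hypothetical output file, one per image.
path = os.path.join(tempfile.mkdtemp(), 'image_0001_blobs.pkl')
with open(path, 'wb') as f:
    pickle.dump(blobs_out, f, protocol=pickle.HIGHEST_PROTOCOL)

# Reading the arrays back later for pruning.
with open(path, 'rb') as f:
    restored = pickle.load(f)
```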
Look at the end of the test_net function, here. The nms_dets variable seems to store the final boxes. There may be a way to take the blobs_out you stored and discard the undesired features, but it doesn't seem so straightforward.
The simplest solution I'm able to think about is as follows.
Let Fast-RCNN compute the final bounding boxes. Then extract the corresponding image patches, with something like the following (I'm assuming Python):
img = cv2.imread('/path/to/image')
for bbox in bboxes_list:
    # box coordinates may be floats; array slicing needs integers
    x0, y0, x1, y1 = [int(v) for v in bbox]
    cut = img[y0:y1, x0:x1]
    extract_cnn_features(cut)
The feature extraction is identical to the entire image case:
import caffe
net = caffe.Net('deploy.prototxt', 'caffemodel', caffe.TEST)
# preprocess input
net.blobs['data'].data[...] = net_input
net.forward()
# copy the activations of the layer of interest
feats = net.blobs['my_layer'].data.copy()
Of course this method is computationally expensive, since you are basically computing the CNN features twice. Whether that is acceptable depends on your speed requirements and the size of the CNN model.
I'm using scikit-learn for machine learning.
I have 800 samples with 2048 features, so I want to reduce the number of features in the hope of getting better accuracy.
It is a multiclass problem (classes 0-5), and the features consist of 1's and 0's: [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0....,0]
I'm using the ensemble method, RandomForestClassifier().
Should I just feature select the training data ?
Is it enough if I'm using this code:
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size = .3 )
clf = RandomForestClassifier( n_estimators = 200,
warm_start = True,
criterion = 'gini',
max_depth = 13
)
clf.fit( X_train, y_train ).transform( X_train )
predicted = clf.predict( X_test )
expected = y_test
confusionMatrix = metrics.confusion_matrix( expected, predicted )
I ask because the accuracy didn't get higher. Is everything OK in the code, or am I doing something wrong?
I'll be very grateful for your help.
I'm not sure I understood your question correctly so I'll answer to what I thought I understood =)
First, reducing the dimension of your features (from 2048 to 500, for example) might not give you better results. It all depends on the capacity of your model to capture the geometry of your data. For example, you can get much better results with a linear model if you first reduce the dimension through a non-linear method that captures a particular geometry and 'linearizes' it, instead of applying the linear model directly to the raw data. But that is because the data is intrinsically non-linear, so the linear model cannot capture this geometry in the original space (think of a circle in 2D).
In the code you gave, though, you did not reduce the dimension; you split the data into two datasets (the feature dimension is the same, 2048; only the number of samples changed). Training on a smaller dataset most of the time results in worse accuracy (data = information; when you leave some out you lose information). But splitting the data allows you to test for overfitting in particular, which is very important. Once the best parameters are chosen (see cross-validation), you should train on all the data you have!
Given your 0.7*800=560 samples, I think a depth of 13 is pretty big and you might overfit. You may want to play with this parameter first if you want to improve your accuracy!
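One way to "play with this parameter" is a cross-validated grid search over max_depth, sketched below with scikit-learn's GridSearchCV. The data is a random stand-in for the real binary feature matrix (smaller, to keep the example fast), and the candidate depths are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the binary features and the six classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64))
y = rng.integers(0, 6, size=200)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={'max_depth': [3, 5, 8, 13]},   # hypothetical candidate depths
    cv=5,
)
# With the default refit=True, the best setting is retrained on all of X, y.
search.fit(X, y)
best_depth = search.best_params_['max_depth']
```

After fitting, search.best_estimator_ is the model retrained on the full data with the selected depth, which matches the advice to train on all the data once the parameters are chosen.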
1) Often reducing the features space does not help with accuracy, and using a regularized classifier leads to better results.
2) To do feature selection, you need two methods: one to reduce the set of features, another that does the actual supervised task (classification here).
Have you tried just using the standard classifiers? Clearly you tried the RF, but I'd also try a linear method like LinearSVC/LogisticRegression or a kernel SVC.
If you want to do feature selection, what you need to do is something like this:
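from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC
# recent scikit-learn versions wrap the estimator in SelectFromModel;
# the l1 penalty requires dual=False
feature_selector = SelectFromModel(LinearSVC(penalty='l1', dual=False)) #or maybe start with SelectKBest()
feature_selector.fit(X_train, y_train)
X_train_reduced = feature_selector.transform(X_train)
X_test_reduced = feature_selector.transform(X_test)
classifier = RandomForestClassifier().fit(X_train_reduced, y_train)
prediction = classifier.predict(X_test_reduced)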
Or you use a pipeline, as here: http://scikit-learn.org/dev/auto_examples/feature_selection/feature_selection_pipeline.html
Maybe we should add a version without the pipeline to the examples?
[cross-posted from the mailing list where this was originally asked]
Dimensionality reduction or feature selection is definitely advisable if you have more features than samples. You could look into Principal Component Analysis and other modules in sklearn.decomposition to reduce the number of features. There is also a useful section on Feature Selection in the scikit-learn documentation.
After fitting sklearn.decomposition.PCA, you could inspect the explained_variance_ratio_ to determine an advisable number of features (n_components) to reduce to (the point of PCA here is to find a reduced number of features that captures most of the variance in your original feature space). Some might like to retain features that have a cumulative explained_variance_ratio_ above 0.9, 0.95 etc, some like to drop features beyond which the explained_variance_ratio_ drops suddenly. Then refit the PCA with the n_components you like, transform your X_train and X_test, and fit your classifier as above.
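The PCA procedure described above might be sketched as follows. The data is a random stand-in for the real feature matrix, and the 0.95 cut-off is just one of the thresholds mentioned in the text.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the real training and test features.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(200, 64)).astype(np.float64)
X_test = rng.integers(0, 2, size=(50, 64)).astype(np.float64)

# Fit with all components first and inspect the cumulative explained variance.
pca = PCA().fit(X_train)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 0.95.
n_components = int(np.searchsorted(cumulative, 0.95) + 1)

# Refit with the chosen number of components and transform both sets.
pca = PCA(n_components=n_components).fit(X_train)
X_train_reduced = pca.transform(X_train)
X_test_reduced = pca.transform(X_test)
```

The PCA is fitted on the training data only and then applied to the test data, so that no information from the test set leaks into the transformation.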