How to fine-tune ResNet50 in Keras? - python-2.7

I'm trying to fine-tune the existing models in Keras to classify my own dataset. So far I have tried the following code (taken from the Keras docs: https://keras.io/applications/), in which Inception V3 is fine-tuned on a new set of classes.
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K
# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# train the model on the new data for a few epochs
model.fit_generator(...)
# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.
# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 172 layers and unfreeze the rest:
for layer in model.layers[:172]:
    layer.trainable = False
for layer in model.layers[172:]:
    layer.trainable = True
# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(...)
Can anyone guide me on what changes I should make to the above code to fine-tune the ResNet50 model available in Keras?
Thanks in advance.

It is difficult to make out a specific question; have you tried anything more than copying the code without any changes?
That said, there are plenty of problems in the code: it is a simple copy/paste from keras.io, not functional as it is, and needs some adaptation before it works at all (regardless of whether you use ResNet50 or InceptionV3):
1): You need to define the input_shape when loading InceptionV3, specifically replace base_model = InceptionV3(weights='imagenet', include_top=False) with base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(299,299,3))
2): Further, you need to adapt the number of classes in the last added layer, e.g. if you have only 2 classes: predictions = Dense(2, activation='softmax')(x)
3): Change the loss function when compiling your model from categorical_crossentropy to sparse_categorical_crossentropy, since the generator below uses class_mode='binary', which yields class indices (0 or 1) rather than one-hot vectors.
4): Most importantly, you need to define a data generator before calling model.fit_generator(), and pass steps_per_epoch. If you have your training images in ./data/train with every category in a different subfolder, this can be done e.g. like this:
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
    "./data/train",
    target_size=(299, 299),
    batch_size=50,
    class_mode='binary')
model.fit_generator(train_generator, steps_per_epoch=100)
This of course only does basic training; you will, for example, need to add save calls to hold on to the trained weights. I suggest moving on to ResNet50 only once you get the code working for InceptionV3 with the changes above. As a start, you can replace InceptionV3() with ResNet50() (after from keras.applications.resnet50 import ResNet50), and change input_shape to (224,224,3) and target_size to (224,224).
The code changes above should work with Python 3.5.3, Keras 2.0, and the TensorFlow backend.

Beyond the important points mentioned in the answer above, for ResNet50 (only if your images have the same shape as in the original Keras code, (224,224), i.e. not rectangular) you may substitute:
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
by
from keras.layers import Flatten
x = base_model.output
x = Flatten()(x)
EDIT: Please read Yu-Yang's comment below.
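The Flatten variant only suits fixed-size inputs because its output length depends on the spatial size of the last feature map, whereas global average pooling always yields one value per channel. A small NumPy illustration, assuming ResNet50's total stride of 32 and 2048 output channels (so a 224×224 input gives a 7×7×2048 map and a 320×224 input gives a 10×7×2048 map); the helper names are hypothetical:

```python
import numpy as np


def global_average_pool(feature_map):
    # Average over height and width: (batch, h, w, c) -> (batch, c)
    return feature_map.mean(axis=(1, 2))


def flatten(feature_map):
    # Keep every activation: (batch, h, w, c) -> (batch, h * w * c)
    return feature_map.reshape(feature_map.shape[0], -1)


for h, w in [(7, 7), (10, 7)]:
    fm = np.zeros((1, h, w, 2048), dtype=np.float32)
    print((h, w), global_average_pool(fm).shape, flatten(fm).shape)
# (7, 7) (1, 2048) (1, 100352)
# (10, 7) (1, 2048) (1, 143360)
```

A Dense layer placed after Flatten therefore has weights sized for one particular input resolution, while the GlobalAveragePooling2D version works with any.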

I think I experienced the same issue. It turned out to be a complex problem, which has a decent thread on GitHub (https://github.com/keras-team/keras/issues/9214). The problem lies in the Batch Normalization layers of the unfrozen blocks of the network. You have two solutions:
Only train the top layers (leaving the pre-trained blocks frozen as they are)
Apply the patch from the GitHub thread above.


Random Forest: finding relevant features

I am trying to train a Random Forest (RF) model in sklearn for classification. The test accuracy I get with a specified set of features is quite low, so I assume the features I chose are misleading the model. I tried RFE, RFECV etc. to find a relevant set of features, but that didn't improve the accuracy. I came up with a simple feature selection process, shown below:
ml_feats = ...  # initial list of feature names
while True:
    feats_to_del = []
    prev_score = 0
    # evaluate growing prefixes of the feature list: first 2 features, first 3, ...
    for feat_len in range(2, len(ml_feats) + 1):
        classifier = RandomForestClassifier(**init_params)
        classifier.fit(X[ml_feats[:feat_len]], Y)
        score = classifier.score(Xt[ml_feats[:feat_len]], Yt)
        if score < prev_score:
            # the most recently added feature caused the score to decrease
            print(ml_feats[feat_len - 1])
            feats_to_del.append(ml_feats[feat_len - 1])
        prev_score = score
    if len(feats_to_del) == 0:
        break
    # delete irrelevant features, keeping the original order
    ml_feats = [f for f in ml_feats if f not in feats_to_del]
print(ml_feats)  # print all relevant features
Does the above code help find the right set of features?
Thanks
What you are doing is a greedy feature selection. If you want to use RandomForestClassifier to select features, you can do something like:
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
# xtrain : training data
# ytrain : training labels
clf = RandomForestClassifier()
sfm = SelectFromModel(estimator=clf, threshold='mean') # threshold of selection is mean of feature importances by random forest classifier
sfm.fit(xtrain, ytrain)
selected_xtrain = sfm.transform(xtrain)
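A self-contained version of the SelectFromModel snippet above, run on synthetic data from make_classification (the dataset, n_estimators=100 and random_state=0 are assumptions for illustration). The selector is fitted on the training split only and the same transform is then applied to the test split, so the test data never influences which features are kept:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Synthetic data: 20 features, of which only 5 are informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

# Keep features whose importance is at least the mean importance.
sfm = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0),
                      threshold="mean")
sfm.fit(xtrain, ytrain)
selected_xtrain = sfm.transform(xtrain)
selected_xtest = sfm.transform(xtest)
print("kept columns:", sfm.get_support(indices=True))

# Retrain on the reduced feature set and evaluate on held-out data.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(selected_xtrain, ytrain)
print("test accuracy on selected features:", clf.score(selected_xtest, ytest))
```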

Tensorflow return similar images

I want to use Google's Tensorflow to return similar images to an input image.
I have installed TensorFlow from http://www.tensorflow.org (using the pip installation with Python 2.7) on Ubuntu 14.04, running on a CPU-only virtual machine.
I have downloaded the trained Inception-V3 model (inception-2015-12-05.tgz) from http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz, which was trained for the ImageNet Large Scale Visual Recognition Challenge using the 2012 data, but I think it contains both the neural network and the classifier (as the task there was to predict the category). I have also downloaded the file classify_image.py, which classifies an image into one of the 1000 classes in the model.
So I have a random image, image.jpg, that I am using to test the model. When I run the command:
python /home/amit/classify_image.py --image_file=/home/amit/image.jpg
I get the following output (classification is done using softmax):
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 3
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 3
trench coat (score = 0.62218)
overskirt (score = 0.18911)
cloak (score = 0.07508)
velvet (score = 0.02383)
hoopskirt, crinoline (score = 0.01286)
Now, the task at hand is to find images that are similar to the input image (image.jpg) in a database of 60,000 images (JPEG format, kept in a folder at /home/amit/images). I believe this can be done by removing the final classification layer from the Inception-v3 model, computing the cosine distance between the feature set of the input image and the feature sets of all 60,000 images, and returning the images with the smallest distance (cos 0 = 1).
Please suggest a way forward for this problem, and how to do this using the Python API.
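Once each image has been reduced to a feature vector (for example the 2048-value pool_3 output discussed further down), ranking the database by cosine similarity takes a few lines of NumPy. A sketch, assuming the database vectors are already stacked into one array; most_similar is a hypothetical helper and the random data is a stand-in for real features:

```python
import numpy as np


def most_similar(query_vec, db_vecs, top_k=5):
    """Return indices and cosine similarities of the top_k most similar rows."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = db @ q  # cosine similarity of every database vector to the query
    order = np.argsort(-sims)[:top_k]  # highest similarity (smallest distance) first
    return order, sims[order]


rng = np.random.default_rng(0)
db_features = rng.random((1000, 2048))  # stand-in for the 60,000 pool_3 vectors
query = db_features[42] + 0.01 * rng.random(2048)  # a near-duplicate of image 42
idx, scores = most_similar(query, db_features, top_k=3)
print(idx, scores)
```

Normalizing once and using a single matrix-vector product avoids a Python loop over the database. For 60,000 float32 vectors of length 2048 the feature matrix takes about 490 MB, so it may be worth computing and saving the features once rather than on every query.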
I think I found an answer to my question:
In the file classify_image.py, which classifies the image using the pre-trained model (NN + classifier), I made the changes below (statements marked #ADDED):
def run_inference_on_image(image):
    """Runs inference on an image.
    Args:
        image: Image file name.
    Returns:
        Nothing
    """
    if not gfile.Exists(image):
        tf.logging.fatal('File does not exist %s', image)
    image_data = gfile.FastGFile(image, 'rb').read()
    # Creates graph from saved GraphDef.
    create_graph()
    with tf.Session() as sess:
        # Some useful tensors:
        # 'softmax:0': A tensor containing the normalized prediction across
        #   1000 labels.
        # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
        #   float description of the image.
        # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
        #   encoding of the image.
        # Runs the softmax tensor by feeding the image_data as input to the graph.
        softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
        feature_tensor = sess.graph.get_tensor_by_name('pool_3:0')  #ADDED
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)
        feature_set = sess.run(feature_tensor,
                               {'DecodeJpeg/contents:0': image_data})  #ADDED
        feature_set = np.squeeze(feature_set)  #ADDED
        print(feature_set)  #ADDED
        # Creates node ID --> English string lookup.
        node_lookup = NodeLookup()
        top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = node_lookup.id_to_string(node_id)
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
I ran the pool_3:0 tensor by feeding the image_data to it. Please let me know if I am making a mistake. If this is correct, I believe we can use this tensor for further calculations.
TensorFlow now has a nice tutorial on how to get the activations before the final layer and retrain a new classification layer with different categories:
https://www.tensorflow.org/versions/master/how_tos/image_retraining/
The example code:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
In your case, yes, you can get the activations from pool_3, the layer below the softmax layer (the so-called bottleneck), and feed them to other operations as input.
Finally, about finding similar images: I don't think ImageNet's bottleneck activations are a very pertinent representation for image search. You could consider using an autoencoder network with direct image inputs.
Your problem sounds similar to this visual search project

Loading files to perform Kmean using sklearn

I have 100 files that contain system call traces. Each file looks like this:
setpgrp ioctl setpgrp ioctl ioctl ....
I am trying to load these files and run k-means on them to cluster them by similarity. Based on a tutorial on the sklearn webpage, I wrote the following:
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from sklearn import metrics
from sklearn.datasets import load_files
from sklearn.cluster import KMeans, MiniBatchKMeans
import sys
from optparse import OptionParser
import numpy as np
# parse commandline arguments
op = OptionParser()
op.add_option("--lsa",
dest="n_components", type="int",
help="Preprocess documents with latent semantic analysis.")
op.add_option("--no-minibatch",
action="store_false", dest="minibatch", default=True,
help="Use ordinary k-means algorithm (in batch mode).")
op.add_option("--use-idf",
action="store_false", dest="use_idf", default=True,
help="Disable Inverse Document Frequency feature weighting.")
op.add_option("--n-features", type=int, default=10000,
help="Maximum number of features (dimensions)"
" to extract from text.")
op.add_option("--verbose",
action="store_true", dest="verbose", default=False,
help="Print progress reports inside k-means algorithm.")
print(__doc__)
op.print_help()
(opts, args) = op.parse_args()
if len(args) > 0:
    op.error("this script takes no arguments.")
    sys.exit(1)
print("Loading training data:")
trainingdata = load_files(r'C:\data\Training data')
print("%d documents" % len(trainingdata.data))
print()
print("Extracting features from the training trainingdata using a sparse vectorizer")
if opts.use_idf:
    vectorizer = TfidfVectorizer(input="file", min_df=1)
X = vectorizer.fit_transform(trainingdata.data)
print("n_samples: %d, n_features: %d" % X.shape)
print()
if opts.n_components:
    print("Performing dimensionality reduction using LSA")
    # Vectorizer results are normalized, which makes KMeans behave as
    # spherical k-means for better results. Since LSA/SVD results are
    # not normalized, we have to redo the normalization.
    svd = TruncatedSVD(opts.n_components)
    lsa = make_pipeline(svd, Normalizer(copy=False))
    X = lsa.fit_transform(X)
    explained_variance = svd.explained_variance_ratio_.sum()
    print("Explained variance of the SVD step: {}%".format(
        int(explained_variance * 100)))
    print()
However, it seems that none of the files in the dataset directory get loaded into memory, even though all the files are available. I get the following error when executing the program:
raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
Can anyone tell me why the dataset is not being loaded? What am I doing wrong?
I finally managed to load the files. The approach to using k-means in sklearn is to vectorize the training data (using TfidfVectorizer or CountVectorizer), then transform your test data with the vectorizer fitted on the training data. Once that is done, you can initialize the KMeans parameters and use the training set vectors to build the k-means clusters. Finally, you can assign your test data to the nearest training-data centroids.
The following code does what is explained above.
import os
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
# Read every file in a directory into a list of strings
def readfile(dataDir):
    data_set = []
    for file in os.listdir(dataDir):
        trainingfiles = os.path.join(dataDir, file)
        if os.path.isfile(trainingfiles):
            with open(trainingfiles, 'r') as data:
                data_set.append(data.read())
    return data_set
# trainingdataDir, testingdataDir and number_of_clusters are assumed to be defined earlier
tfidf_vectorizer = TfidfVectorizer()
# fit the tf-idf transform on the training data
tfidf_vectorizer_trainingset = tfidf_vectorizer.fit_transform(readfile(trainingdataDir)).toarray()
# transform the test set using the vocabulary learned from the training set
tfidf_vectorizer_testset = tfidf_vectorizer.transform(readfile(testingdataDir)).toarray()
# K-means clustering parameters
kmean_parameters = KMeans(n_clusters=number_of_clusters, init='k-means++', max_iter=100, n_init=1)
# cluster the training data based on the parameters
KmeanAnalysis_training = kmean_parameters.fit(tfidf_vectorizer_trainingset)
# distance of each test document to each training centroid
KmeanAnalysis_test = kmean_parameters.transform(tfidf_vectorizer_testset)
# cluster label of each test document (its nearest training centroid)
test_labels = kmean_parameters.predict(tfidf_vectorizer_testset)
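A likely reason the original load_files call found nothing is that load_files expects one subfolder per category (container_folder/category_folder/file); files placed directly in the given directory are not picked up, which leaves an empty document list and hence the "empty vocabulary" error. The sketch below skips load_files and runs the same vectorize, fit and predict workflow on a few hypothetical in-memory traces. Two assumptions worth noting: token_pattern=r"\S+" treats each whitespace-separated system call as one token, and KMeans.predict (rather than transform, which returns distances to each centroid) assigns each test trace to its nearest training centroid.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical system-call traces; in practice, read one string per file.
train_traces = [
    "setpgrp ioctl setpgrp ioctl ioctl",
    "ioctl ioctl setpgrp ioctl",
    "open read read close",
    "open read close close",
]
test_traces = ["setpgrp ioctl ioctl", "open close read"]

# Treat each whitespace-separated system call as one token.
vectorizer = TfidfVectorizer(token_pattern=r"\S+")
X_train = vectorizer.fit_transform(train_traces)  # learn vocabulary and idf
X_test = vectorizer.transform(test_traces)        # reuse them for the test set

kmeans = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
train_labels = kmeans.fit_predict(X_train)  # cluster the training traces
test_labels = kmeans.predict(X_test)        # nearest training centroid per test trace
print(train_labels, test_labels)
```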

Scikit-learn 0.15.2 - OneVsRestClassifier does not work because predict_proba is not available

I am trying to do one-vs-rest classification as below:
classifier = Pipeline([('vectorizer', CountVectorizer()),('tfidf', TfidfTransformer()),('clf', OneVsRestClassifier(SVC(kernel='rbf')))])
classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)
And I get the error 'predict_proba is not available when probability = false'. I saw that a bug had been reported, the one below:
https://github.com/scikit-learn/scikit-learn/issues/1946
It was closed as fixed, so I uninstalled scikit-learn from my Windows PC and completely re-downloaded it to get version 0.15.2. But I still get this error. Any suggestions? Or have I misunderstood, and I still can't use SVC with OneVsRestClassifier unless I specify probability=True?
UPDATE: To clarify, I am actually trying to achieve multi-label classification. Here is the data source:
import random
import numpy as np
import pandas as pd
from sklearn import preprocessing
df = pd.read_csv(fileIn, header=0, encoding='utf-8-sig')
rows = random.sample(list(df.index), int(len(df) * 0.9))
work = df.loc[rows]  # .loc replaces the removed .ix indexer
work_test = df.drop(rows)
X_train = []
y_train = []
X_test = []
y_test = []
for i in work[[i for i in list(work.columns.values) if i.startswith('Change')]].values:
    X_train.append(','.join(i.T.tolist()))
X_train = np.array(X_train)
for i in work[[i for i in list(work.columns.values) if i.startswith('Corax')]].values:
    y_train.append(list(i))
for i in work_test[[i for i in list(work_test.columns.values) if i.startswith('Change')]].values:
    X_test.append(','.join(i.T.tolist()))
X_test = np.array(X_test)
for i in work_test[[i for i in list(work_test.columns.values) if i.startswith('Corax')]].values:
    y_test.append(list(i))
lb = preprocessing.MultiLabelBinarizer()
Y = lb.fit_transform(y_train)
After that, I pass it to the pipeline mentioned earlier.
OK, I did some investigation in the code. OneVsRestClassifier first tries to call decision_function and, if that fails, falls back to the predict_proba method of the base classifier (svm.SVC in our case).
As far as I can see, my X_test is a numpy.array of lists of strings. After it goes through the sequence of transformations specified in the pipeline (CountVectorizer -> TfidfTransformer), it becomes a sparse matrix (by design of these components). As far as I can tell, decision_function is currently not available for sparse matrices, and there is even an open suggestion on GitHub: https://github.com/scikit-learn/scikit-learn/issues/73
So, to summarize, it looks like you can't do multi-label classification with svm.SVC (at least in scikit-learn 0.15.2) unless you specify probability=True. Doing so adds some overhead to classifier.fit (SVC then runs an internal cross-validation to calibrate the probabilities), but it will work.
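A minimal sketch of that workaround, with hypothetical toy documents and label sets standing in for the CSV data in the question: the pipeline is unchanged except that SVC gets probability=True. In later scikit-learn releases SVC's decision_function also accepts the sparse TF-IDF output, so the flag may not be needed there; treat that as version-dependent.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

# Hypothetical documents and label sets, for illustration only.
X_train = [
    "disk io write", "disk read io", "net send recv", "net recv socket",
    "disk net write", "net disk socket", "cpu load", "cpu idle",
]
y_train = [["disk"], ["disk"], ["net"], ["net"],
           ["disk", "net"], ["disk", "net"], ["cpu"], ["cpu"]]

lb = MultiLabelBinarizer()
Y = lb.fit_transform(y_train)  # binary indicator matrix, one column per label

classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    # probability=True makes predict_proba available to OneVsRestClassifier
    ("clf", OneVsRestClassifier(SVC(kernel="rbf", probability=True,
                                    random_state=0))),
])
classifier.fit(X_train, Y)
predicted = classifier.predict(["disk write", "net socket"])
print(lb.inverse_transform(predicted))  # back to tuples of label names
```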

using weka Filter in java code

I have a problem using the Weka API in Java. There are 41 features (or attributes) in my training and testing datasets. I want to keep only 25 attributes (e.g. 1, 3, 5, 7, 8, 10, ...) and remove the other attributes when training and testing the classifier. I have read Weka's Filter manual at http://weka.wikispaces.com/Use+WEKA+in+your+Java+code#Filter and http://grepcode.com/file/repo1.maven.org/maven2/nz.ac.waikato.cms.weka/weka-stable/3.6.6/weka/filters/unsupervised/attribute/Remove.java, but I could not understand how to use a filter for my problem. Could you please help me write code for this situation? Your suggestions/help will be highly appreciated.
My code looks like this:
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
Instances train = ...
Instances test = ...
// Here I want to keep only 25 of the 41 attributes (i.e. columns).
Classifier cls = new J48();
cls.buildClassifier(train);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(train);
eval.evaluateModel(cls, test);
.....
.....
Assuming you have this, as you said:
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
Instances train = ...
Instances test = ...
Then set up the array of column indices you want. I'm assuming you're doing this in a for loop or something, but I've just put in 6 indices manually so you get the idea. Note that these indices are 0-based, and the class attribute must be among them:
int[] indicesOfColumnsToUse = {1, 3, 5, 7, 8, 10};
Then initialize and set up your removal filter (create it, set the column indices, invert the selection so that the columns you did not list are the ones removed, then set the "input format" based on your training data):
Remove remove = new Remove();
remove.setAttributeIndicesArray(indicesOfColumnsToUse);
remove.setInvertSelection(true);
remove.setInputFormat(train);
Then apply the removal to your training set
Instances trainingSubset = Filter.useFilter(train, remove);
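And then go on as you said, except train the classifier on the subset you just created, and apply the same filter to the test set so that both have the same attributes:
Instances testSubset = Filter.useFilter(test, remove);
Classifier cls = new J48();
cls.buildClassifier(trainingSubset);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(trainingSubset);
eval.evaluateModel(cls, testSubset);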