Tensorflow return similar images - python-2.7

I want to use Google's Tensorflow to return similar images to an input image.
I have installed TensorFlow from http://www.tensorflow.org (pip installation, Python 2.7) on Ubuntu 14.04 in a virtual machine (CPU only).
I have downloaded the trained Inception-v3 model (inception-2015-12-05.tgz) from http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz, which was trained for the ImageNet Large Scale Visual Recognition Challenge on the 2012 data; I think it contains both the neural network and the classifier (as the task there was to predict the category). I have also downloaded the file classify_image.py, which classifies an image into one of the 1000 classes in the model.
So I have a random image, image.jpg, that I am running to test the model. When I run the command:
python /home/amit/classify_image.py --image_file=/home/amit/image.jpg
I get the output below (classification is done using softmax):
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 3
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 3
trench coat (score = 0.62218)
overskirt (score = 0.18911)
cloak (score = 0.07508)
velvet (score = 0.02383)
hoopskirt, crinoline (score = 0.01286)
Now, the task at hand is to find images similar to the input image (image.jpg) out of a database of 60,000 images (JPEG format, kept in a folder at /home/amit/images). I believe this can be done by removing the final classification layer from the Inception-v3 model, computing the feature vector of the input image, measuring the cosine distance between it and the feature vectors of all 60,000 images, and returning the images with the smallest distance (cos 0 = 1).
Please suggest a way forward for this problem, and how I can do this using the Python API.
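For reference, the cosine-distance comparison described above is easy to compute once the 2048-dimensional feature vectors are available. Below is a minimal NumPy sketch of that idea only; the array names and the (60000, 2048) shape are assumptions for illustration, not part of any existing code:

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); values close to 1 mean very similar
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# query_features: 2048-d vector of the input image
# db_features:    assumed (60000, 2048) matrix of precomputed vectors
# scores = np.array([cosine_similarity(query_features, f) for f in db_features])
# top_10 = np.argsort(scores)[::-1][:10]   # indices of the most similar images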

I think I found an answer to my question:
In the file classify_image.py, which classifies the image using the pre-trained model (NN + classifier), I made the changes below (statements with #ADDED written next to them):
def run_inference_on_image(image):
  """Runs inference on an image.

  Args:
    image: Image file name.

  Returns:
    Nothing
  """
  if not gfile.Exists(image):
    tf.logging.fatal('File does not exist %s', image)
  image_data = gfile.FastGFile(image, 'rb').read()

  # Creates graph from saved GraphDef.
  create_graph()

  with tf.Session() as sess:
    # Some useful tensors:
    # 'softmax:0': A tensor containing the normalized prediction across
    #   1000 labels.
    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
    #   float description of the image.
    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
    #   encoding of the image.
    # Runs the softmax tensor by feeding the image_data as input to the graph.
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    feature_tensor = sess.graph.get_tensor_by_name('pool_3:0')  # ADDED
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)
    feature_set = sess.run(feature_tensor,
                           {'DecodeJpeg/contents:0': image_data})  # ADDED
    feature_set = np.squeeze(feature_set)  # ADDED
    print(feature_set)  # ADDED

    # Creates node ID --> English string lookup.
    node_lookup = NodeLookup()

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))
I ran the pool_3:0 tensor by feeding the image_data to it. Please let me know if I am making a mistake. If this is correct, I believe we can use this tensor for further calculations.
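If the pool_3:0 output above is correct, one possible next step is to precompute that 2048-dimensional vector for every image in the database and rank the images by cosine similarity against the query. The sketch below reuses the tensors and gfile calls from the code above; the helper name, the folder listing, and the reuse of a single session are assumptions rather than tested code:

import os
import numpy as np

def extract_features(sess, feature_tensor, image_path):
    # Feed the raw JPEG bytes and fetch the pool_3:0 activations
    image_data = gfile.FastGFile(image_path, 'rb').read()
    features = sess.run(feature_tensor, {'DecodeJpeg/contents:0': image_data})
    return np.squeeze(features)                      # 2048-d vector

# with tf.Session() as sess:
#     feature_tensor = sess.graph.get_tensor_by_name('pool_3:0')
#     db_dir = '/home/amit/images'
#     names = sorted(os.listdir(db_dir))
#     db = np.array([extract_features(sess, feature_tensor,
#                                     os.path.join(db_dir, n)) for n in names])
#     query = extract_features(sess, feature_tensor, '/home/amit/image.jpg')
#     sims = db.dot(query) / (np.linalg.norm(db, axis=1) * np.linalg.norm(query))
#     print([names[i] for i in np.argsort(sims)[::-1][:10]])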

TensorFlow now has a nice tutorial on how to get the activations before the final layer and retrain a new classification layer with different categories:
https://www.tensorflow.org/versions/master/how_tos/image_retraining/
The example code:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
In your case, yes, you can get the activations from pool_3, the layer below the softmax layer (the so-called bottlenecks), and send them to other operations as input.
Finally, about finding similar images, I don't think ImageNet's bottleneck activations are a very pertinent representation for image search. You could consider using an autoencoder network with direct image inputs.
(source: deeplearning4j.org)
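For what it's worth, a purely illustrative Keras sketch of the autoencoder suggestion; the 64x64 grayscale input (flattened to 4096 values in [0, 1]), the layer widths, and the x_train array are all assumptions, and the encoder output would then play the role of the search representation:

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(4096,))                         # flattened 64x64 image (assumed)
encoded = Dense(256, activation='relu')(inp)       # compressed representation
decoded = Dense(4096, activation='sigmoid')(encoded)

autoencoder = Model(inp, decoded)
encoder = Model(inp, encoded)                      # used later to embed images for search
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)
# db_codes = encoder.predict(x_train)              # vectors to compare by cosine distance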

Your problem sounds similar to this visual search project

Related

RAY - RLLIB - Failing to train DQN using offline sample batch - episode_len_mean: .nan value

RAY - RLlib library: estimating a DQN model using offline batch data. The model fails to learn; episode_len_mean stays .nan for the CartPole example as well as for a personal, domain-specific dataset.
Ubuntu
Ray library - RLlib
DQN
Offline data
Environment: tried with CartPole-v0 as well as with a custom environment example.
episode_len_mean: .nan
episode_reward_max: .nan
episode_reward_mean: .nan
episode_reward_min: .nan
episodes_this_iter: 0
episodes_total: 0
Generate data using PG
rllib train --run=PG --env=CartPole-v0 --config='{"output": "/tmp/cartpole-out", "output_max_file_size": 5000000}' --stop='{"timesteps_total": 100000}'
Train model on offline data
rllib train --run=DQN --env=CartPole-v0 --config='{"input": "/tmp/cartpole-out","input_evaluation": ["is", "wis"],"soft_q": true, "softmax_temp": 1.0}'
Expected:
episode_len_mean: numerical values
episode_reward_max: numerical values
episode_reward_mean: numerical values
episode_reward_min: numerical values
Actual results (no improvement observed in TensorBoard either):
episode_len_mean: .nan
episode_reward_max: .nan
episode_reward_mean: .nan
episode_reward_min: .nan
I had more or less the same problem, and it was linked to the fact that the episode was never finishing because I didn't properly set the "done" value in the step function. Until an episode is "done", Ray doesn't calculate the metrics. In my case I had to add a counter called self.count_steps, initialized in the environment's init function and incremented on each step.
def step(self, action):
    # Changes the yaw angles
    self.cur_yaws = action
    self.farm.calculate_wake(yaw_angles=action)
    # power output
    power = self.farm.get_farm_power()
    reward = (power - self.best_power) / self.best_power
    #if power > self.best_power:
    #    self.best_power = power
    self.count_steps += 1
    if self.count_steps > 50:
        done = True
    else:
        done = False
    return self.cur_yaws, reward, done, {}
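For completeness, a rough sketch of where the counter lives; the class name, the config handling, and the initial observation are assumptions based on the description above, with the rest of the environment omitted:

import gym

class FarmEnv(gym.Env):                     # hypothetical name for the environment above
    def __init__(self, config):
        # ... action_space / observation_space / self.farm setup omitted ...
        self.cur_yaws = config.get("initial_yaws")   # assumed initial state
        self.count_steps = 0                # counts the steps of the current episode

    def reset(self):
        self.count_steps = 0                # fresh counter so done can trigger again
        return self.cur_yaws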

Force Google Vision API to detect dates / numbers

I'm trying to detect handwritten dates using the Google Vision API. Do you know if it is possible to force it to detect dates (DD/MM/YYYY), or at least numbers only, to increase reliability?
The function I use takes an image as an np.array as input:
def detect_handwritten_text(img):
    """Recognizes characters using the Google Cloud Vision API.

    Args:
        img (np.array): The image on which to apply the OCR.

    Returns:
        The recognized content of img as a string.
    """
    import cv2 as cv
    from google.cloud import vision_v1p3beta1 as vision

    client = vision.ImageAnnotatorClient()

    # Transform np.array image format into a Vision-API-readable byte format
    success, encoded_image = cv.imencode('.png', img)
    content = encoded_image.tobytes()

    # Configure client to detect handwriting and load picture
    image = vision.types.Image(content=content)
    image_context = vision.types.ImageContext(language_hints=['en-t-i0-handwrit'])
    response = client.document_text_detection(image=image, image_context=image_context)

    return response.full_text_annotation.text
After ImageAnnotatorClient.DetectDocumentText(your image), you could iterate over the blocks and the words inside each block, and try to match a regular expression against each word to find dates and numbers.
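As far as I know there is no request parameter that restricts detection to dates or digits, so the filtering has to happen on the returned text. A minimal sketch of the regular-expression idea; the pattern below only covers the DD/MM/YYYY form mentioned in the question:

import re

DATE_PATTERN = re.compile(r'\b(\d{2})/(\d{2})/(\d{4})\b')

def extract_dates(ocr_text):
    """Return all DD/MM/YYYY matches found in the OCR output."""
    return ['{}/{}/{}'.format(d, m, y) for d, m, y in DATE_PATTERN.findall(ocr_text)]

# dates = extract_dates(detect_handwritten_text(img))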

How to fine-tune ResNet50 in Keras?

I'm trying to fine-tune the existing models in Keras to classify my own dataset. So far I have tried the following code (taken from the Keras docs: https://keras.io/applications/), in which InceptionV3 is fine-tuned on a new set of classes.
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K
# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# train the model on the new data for a few epochs
model.fit_generator(...)
# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.
# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 172 layers and unfreeze the rest:
for layer in model.layers[:172]:
    layer.trainable = False
for layer in model.layers[172:]:
    layer.trainable = True
# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(...)
Can anyone please guide me on what changes I should make in the above code to fine-tune the ResNet50 model present in Keras?
Thanks in advance.
It is difficult to make out a specific question; have you tried anything more than just copying the code without any changes?
That said, there is an abundance of problems in the code: it is a simple copy/paste from keras.io, is not functional as it is, and needs some adaptation before working at all (regardless of whether you use ResNet50 or InceptionV3):
1) You need to define the input_shape when loading InceptionV3; specifically, replace base_model = InceptionV3(weights='imagenet', include_top=False) with base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(299,299,3))
2) Further, you need to adapt the number of classes in the last added layer, e.g. if you have only 2 classes: predictions = Dense(2, activation='softmax')(x)
3) Change the loss function when compiling your model from categorical_crossentropy to sparse_categorical_crossentropy
4) Most importantly, you need to define the generator before calling model.fit_generator() and add steps_per_epoch. If you have your training images in ./data/train, with every category in a different subfolder, this can be done e.g. like this:
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
    "./data/train",
    target_size=(299, 299),
    batch_size=50,
    class_mode='binary')
model.fit_generator(train_generator, steps_per_epoch=100)
This of course only does basic training; you will, for example, need to define save calls to hold on to the trained weights. Only once you get the code working for InceptionV3 with the changes above do I suggest proceeding to implement this for ResNet50: as a start you can replace InceptionV3() with ResNet50() (of course only after from keras.applications.resnet50 import ResNet50), and change the input_shape to (224,224,3) and target_size to (224,224).
The above-mentioned code changes should work on Python 3.5.3 / Keras 2.0 / TensorFlow backend.
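Putting those changes together for ResNet50, a sketch of how the adapted setup might look; the two-class head and the ./data/train folder are placeholders carried over from the points above:

from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)           # 2 classes as in point 2) above

model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:                           # train only the new top first
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')

train_generator = ImageDataGenerator().flow_from_directory(
    "./data/train",                                       # one subfolder per class
    target_size=(224, 224),
    batch_size=50,
    class_mode='binary')
model.fit_generator(train_generator, steps_per_epoch=100)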
Beyond the important points mentioned in the above answer for ResNet50 (if your images are shaped into a similar format as in the original Keras code, i.e. (224,224), and not rectangular), you may substitute:
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
by
x = base_model.output
x = Flatten()(x)
EDIT: Please read @Yu-Yang's comment below.
I think I experienced the same issue. It appeared to be a complex problem, which has a decent thread on GitHub (https://github.com/keras-team/keras/issues/9214). The problem is in the Batch Normalization layers of the unfrozen blocks of the net. You have two solutions:
Only change the top layer (leaving the blocks as they are)
Add a patch from the GitHub thread above.

Doing OCR to identify text written on trucks/cars or other vehicles

I am new to the world of Computer Vision.
I am trying to use Tesseract to detect numbers written on the side of trucks.
So for this example, I would like to see CMA CGM as the output.
I fed this image to Tesseract via command line
tesseract image.JPG out -psm 6
but it yielded a blank file.
Then I read the documentation of tesserocr (a Python wrapper for Tesseract) and tried the following code:
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print 'Found {} textline image components.'.format(len(boxes))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print (u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
               "confidence: {1}, text: {2}").format(i, conf, ocrResult, **box)
and again it was not able to read any characters in the image.
My question is how I should go about solving this problem. (I am not looking for ready-made code, but for an approach to the problem.)
Would I need to train Tesseract with sample images, or can I just write code using existing libraries to somehow detect the coordinates of the truck and do OCR only within the boundaries of the truck?
Tesseract expects document-only images, but you have non-document objects in your image. You need a sophisticated segmentation (and then probably some image-processing) step before feeding the image to Tesseract-OCR.
I have a three-step solution:
Take the part of the image you want to recognize
Apply Gaussian blur
Apply simple thresholding
You can use a range to get the part of the image.
For instance, if you select the
height range as: from int(h/4) + 40 to int(h/2) - 20
width range as: from int(w/2) to int((w*3)/4)
Result (intermediate images omitted): the cropped part, the Gaussian-blurred image, and the thresholded image.
Pytesseract output:
CMA CGM
Code:
import cv2
import pytesseract

img = cv2.imread('YizU3.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]
gry = gry[int(h/4) + 40:int(h/2) - 20, int(w/2):int((w*3)/4)]
blr = cv2.GaussianBlur(gry, (3, 3), 0)
thr = cv2.threshold(blr, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
txt = pytesseract.image_to_string(thr)
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)

AttributeError: 'NeuralNetwork' object has no attribute 'initialize'?

I am learning Python programming and machine learning for my academic project, and I found an interest in number plate recognition.
When I execute the code below, I get the error mentioned after the code:
values = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H',
          'J','K','L','M','N','P','R','S','T','U','V','W','X','Z']
keys = range(32)
data_map = dict(zip(keys, values))
def get_ann(data_map):
    feature_mat = []
    label_mat = []
    for keys in data_map:
        path_train = "/home/sagar/Project data set/ANPR/ann/%s" % data_map[keys]
        filenames = get_imlist(path_train)
        perfeature_mat = []
        perlabel_mat = []
        for image in filenames[0]:
            raw_image = cv2.imread(image)
            raw_image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2GRAY)
            # resize the image into 5 cols (width) and 10 rows (height)
            raw_image = cv2.resize(raw_image, (5, 10), interpolation=cv2.INTER_AREA)
            # Do a hard thresholding.
            _, th2 = cv2.threshold(raw_image, 70, 255, cv2.THRESH_BINARY)
            # generate features
            horz_hist = np.sum(th2 == 255, axis=0)
            vert_hist = np.sum(th2 == 255, axis=1)
            sample = th2.flatten()
            # concatenate these features together
            feature = np.concatenate([horz_hist, vert_hist, sample])
            # append these features together along with their respective labels
            perfeature_mat.append(feature)
            perlabel_mat.append(keys)
        feature_mat.append(perfeature_mat)
        label_mat.append(perlabel_mat)

    # These are the final product.
    bigfeature_mat = np.vstack(feature_mat)
    biglabel_mat = np.hstack(label_mat)

    # As usual, we need to convert them into double type for Shogun.
    bigfeature_mat = np.array(bigfeature_mat, dtype='double')
    biglabel_mat = np.array(biglabel_mat, dtype='double')

    # Shogun works in a way in which columns are samples and rows are features.
    # Hence we need to transpose the observation matrix.
    obs_matrix = bigfeature_mat.T

    # Convert the observation matrix and the labels into Shogun RealFeatures
    # and MulticlassLabels structures, respectively.
    sg_features = RealFeatures(obs_matrix)
    sg_labels = MulticlassLabels(biglabel_mat)

    # Initialize a simple ANN in Shogun with one hidden layer.
    layers = DynamicObjectArray()
    layers.append_element(NeuralInputLayer(65))
    layers.append_element(NeuralLogisticLayer(65))
    layers.append_element(NeuralSoftmaxLayer(32))
    net = NeuralNetwork(layers)
    net.quick_connect()
    net.initialize()

    net.io.set_loglevel(MSG_INFO)
    net.l1_coefficient = 3e-4
    net.epsilon = 1e-6
    net.max_num_epochs = 600

    net.set_labels(sg_labels)
    net.train(sg_features)
    return net
The error:
AttributeError Traceback (most recent call last)
<ipython-input-28-30225c91fe73> in <module>()
----> 1 net=get_ann(data_map)
<ipython-input-27-809f097ce563> in get_ann(data_map)
59 net=NeuralNetwork(layers)
60 net.quick_connect()
---> 61 net.initialize()
62
63 net.io.set_loglevel(MSG_INFO)
AttributeError: 'NeuralNetwork' object has no attribute 'initialize'
Platform used: Ubuntu 14.04, Python 2.7, OpenCV 2.4.9, IPython notebook and the Shogun toolbox.
Can anyone please help me in resolving this error? Thanks in advance.
The other code samples, which were executed before the above code, are as follows:
from modshogun import *

def get_vstacked_data(path):
    filenames = np.array(get_imlist(path))
    # read the image
    # convert the image into grayscale
    # change its data-type to double
    # flatten it
    vmat = []
    for i in range(filenames[0].shape[0]):
        temp = cv2.imread(filenames[0][i])
        temp = cv2.cvtColor(temp, cv2.COLOR_BGR2GRAY)
        temp = cv2.equalizeHist(temp)
        temp = np.array(temp, dtype='double')
        temp = temp.flatten()
        vmat.append(temp)
    vmat = np.vstack(vmat)
    return vmat
def get_svm():
    # set path for positive training images
    path_train = '/home/sagar/resized/'
    pos_trainmat = get_vstacked_data(path_train)
    # set path for negative training images
    path_train = '/home/sagar/rezize/'
    neg_trainmat = get_vstacked_data(path_train)

    # form the observation matrix
    obs_matrix = np.vstack([pos_trainmat, neg_trainmat])

    # Shogun works in a way in which columns are samples and rows are features.
    # Hence we need to transpose the observation matrix.
    obs_matrix = obs_matrix.T

    # get the labels: positive training images are marked with +1 and negative with -1
    labels = np.ones(obs_matrix.shape[1])
    labels[pos_trainmat.shape[0]:obs_matrix.shape[1]] *= -1

    # Convert the observation matrix and the labels into Shogun RealFeatures
    # and BinaryLabels structures, respectively.
    sg_features = RealFeatures(obs_matrix)
    sg_labels = BinaryLabels(labels)

    # Initialise a basic LibSVM in Shogun.
    width = 2
    # kernel = GaussianKernel(sg_features, sg_features, width)
    kernel = LinearKernel(sg_features, sg_features)
    C = 1.0
    svm = LibSVM(C, kernel, sg_labels)
    _ = svm.train()
    _ = svm.apply(sg_features)
    return svm
OCR classification:
def validate_ann(cnt):
    rect = cv2.minAreaRect(cnt)
    box = cv2.cv.BoxPoints(rect)
    box = np.int0(box)
    output = False
    width = rect[1][0]
    height = rect[1][1]
    if (width != 0) & (height != 0):
        if ((height/width > 1.12) & (height > width)) | ((width/height > 1.12) & (width > height)):
            if (height*width < 1700) & (height*width > 100):
                if (max(width, height) < 64) & (max(width, height) > 35):
                    output = True
    return output
Probably there are some method deprecation issues with Shogun.
Try to replace:
net.initialize()
With:
net.initialize_neural_network()
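That is, in the get_ann function above the network setup would become:

net = NeuralNetwork(layers)
net.quick_connect()
net.initialize_neural_network()   # method name suggested above for newer Shogun releases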