I am learning Python programming and machine learning for my academic project, and I have taken an interest in number plate recognition.
When I execute the code below, I get an error, which is shown after the code.
import cv2
import numpy as np

values = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','J','K','L','M','N','P','R','S','T','U','V','W','X','Z']
keys = range(32)
data_map = dict(zip(keys, values))

def get_ann(data_map):
    feature_mat = []
    label_mat = []
    for key in data_map:
        path_train = "/home/sagar/Project data set/ANPR/ann/%s" % data_map[key]
        filenames = get_imlist(path_train)
        perfeature_mat = []
        perlabel_mat = []
        for image in filenames[0]:
            raw_image = cv2.imread(image)
            raw_image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2GRAY)
            # resize the image into 5 cols (width) and 10 rows (height)
            raw_image = cv2.resize(raw_image, (5, 10), interpolation=cv2.INTER_AREA)
            # do a hard thresholding
            _, th2 = cv2.threshold(raw_image, 70, 255, cv2.THRESH_BINARY)
            # generate features
            horz_hist = np.sum(th2 == 255, axis=0)
            vert_hist = np.sum(th2 == 255, axis=1)
            sample = th2.flatten()
            # concatenate these features together
            feature = np.concatenate([horz_hist, vert_hist, sample])
            # append these features along with their respective labels
            perfeature_mat.append(feature)
            perlabel_mat.append(key)
        feature_mat.append(perfeature_mat)
        label_mat.append(perlabel_mat)

    # These are the final products.
    bigfeature_mat = np.vstack(feature_mat)
    biglabel_mat = np.hstack(label_mat)
    # As usual, we need to convert them into double type for Shogun.
    bigfeature_mat = np.array(bigfeature_mat, dtype='double')
    biglabel_mat = np.array(biglabel_mat, dtype='double')
    # Shogun treats columns as samples and rows as features,
    # hence we need to transpose the observation matrix.
    obs_matrix = bigfeature_mat.T
    # convert the observation matrix and the labels into Shogun
    # RealFeatures and MulticlassLabels structures respectively
    sg_features = RealFeatures(obs_matrix)
    sg_labels = MulticlassLabels(biglabel_mat)
    # initialize a simple ANN in Shogun with one hidden layer
    layers = DynamicObjectArray()
    layers.append_element(NeuralInputLayer(65))
    layers.append_element(NeuralLogisticLayer(65))
    layers.append_element(NeuralSoftmaxLayer(32))
    net = NeuralNetwork(layers)
    net.quick_connect()
    net.initialize()

    net.io.set_loglevel(MSG_INFO)
    net.l1_coefficient = 3e-4
    net.epsilon = 1e-6
    net.max_num_epochs = 600

    net.set_labels(sg_labels)
    net.train(sg_features)
    return net
The error:
AttributeError Traceback (most recent call last)
<ipython-input-28-30225c91fe73> in <module>()
----> 1 net=get_ann(data_map)
<ipython-input-27-809f097ce563> in get_ann(data_map)
59 net=NeuralNetwork(layers)
60 net.quick_connect()
---> 61 net.initialize()
62
63 net.io.set_loglevel(MSG_INFO)
AttributeError: 'NeuralNetwork' object has no attribute 'initialize'
Platform: Ubuntu 14.04, Python 2.7, OpenCV 2.4.9, IPython notebook and the Shogun toolbox.
Can anyone please help me resolve this error? Thanks in advance.
The other code samples, which were executed before the code above, are as follows.
from modshogun import *

def get_vstacked_data(path):
    filenames = np.array(get_imlist(path))
    # read the image
    # convert the image into grayscale
    # change its data-type to double
    # flatten it
    vmat = []
    for i in range(filenames[0].shape[0]):
        temp = cv2.imread(filenames[0][i])
        temp = cv2.cvtColor(temp, cv2.COLOR_BGR2GRAY)
        temp = cv2.equalizeHist(temp)
        temp = np.array(temp, dtype='double')
        temp = temp.flatten()
        vmat.append(temp)
    vmat = np.vstack(vmat)
    return vmat
def get_svm():
    # set path for positive training images
    path_train = '/home/sagar/resized/'
    pos_trainmat = get_vstacked_data(path_train)
    # set path for negative training images
    path_train = '/home/sagar/rezize/'
    neg_trainmat = get_vstacked_data(path_train)
    # form the observation matrix
    obs_matrix = np.vstack([pos_trainmat, neg_trainmat])
    # Shogun treats columns as samples and rows as features,
    # hence we need to transpose the observation matrix
    obs_matrix = obs_matrix.T
    # get the labels: positive training images are marked with +1, negative with -1
    labels = np.ones(obs_matrix.shape[1])
    labels[pos_trainmat.shape[0]:obs_matrix.shape[1]] *= -1
    # convert the observation matrix and the labels into Shogun
    # RealFeatures and BinaryLabels structures respectively
    sg_features = RealFeatures(obs_matrix)
    sg_labels = BinaryLabels(labels)
    # initialise a basic LibSVM in Shogun
    width = 2
    # kernel = GaussianKernel(sg_features, sg_features, width)
    kernel = LinearKernel(sg_features, sg_features)
    C = 1.0
    svm = LibSVM(C, kernel, sg_labels)
    _ = svm.train()
    _ = svm.apply(sg_features)
    return svm
OCR classification:
def validate_ann(cnt):
    rect = cv2.minAreaRect(cnt)
    box = cv2.cv.BoxPoints(rect)
    box = np.int0(box)
    output = False
    width = rect[1][0]
    height = rect[1][1]
    if (width != 0) & (height != 0):
        if ((height/width > 1.12) & (height > width)) | ((width/height > 1.12) & (width > height)):
            if (height*width < 1700) & (height*width > 100):
                if (max(width, height) < 64) & (max(width, height) > 35):
                    output = True
    return output
There are probably some method-deprecation issues with Shogun.
Try replacing:
net.initialize()
With:
net.initialize_neural_network()
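For reference, a minimal sketch of how the relevant lines in get_ann() would look after that rename (assuming a newer Shogun release where initialize() was replaced by initialize_neural_network(); everything else stays unchanged):

net = NeuralNetwork(layers)
net.quick_connect()
net.initialize_neural_network()  # was: net.initialize()
net.io.set_loglevel(MSG_INFO)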
I want to read EXR-format images and inspect the pixel intensities at the corresponding locations. I also want to stack them together to feed them into a neural network. How can I do normal image processing on this kind of format? Please help me with this!
I have tried the following code using OpenEXR but am unable to proceed further.
import OpenEXR
file = OpenEXR.InputFile('file_name.exr')
I expected to see the normal image processing tools like
file.size()
file.show()
file.write('another format')
file.min()
file.extract_channels()
file.append('another exr file')
OpenEXR seems to be lacking fancy image-processing features such as displaying images or saving them to a different format. For this I would suggest using OpenCV, which is full of image-processing features.
What you may need to do is:
Read the EXR file using OpenEXR only, then extract its channels and convert them to numpy arrays, e.g. rCh = np.asarray(rCh, dtype=np.uint8)
Create an RGB image from these numpy arrays, e.g. img_rgb = cv2.merge([b, g, r]).
Use OpenCV functions for your listed operations (see the sketch after this list):
Size: img_rgb.shape
Show: cv2.imshow("image", img_rgb)
Write: cv2.imwrite("path/to/file.jpg", img_rgb)
Min: np.min(b), np.min(g), np.min(r)
Extract channels: b, g, r = cv2.split(img_rgb)
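A minimal sketch of those steps, assuming the EXR is named file_name.exr and stores R, G and B channels (the scaling to 8-bit is a simplistic assumption; EXR data is floating point and may need proper tone mapping):

import OpenEXR
import Imath
import numpy as np
import cv2

exr = OpenEXR.InputFile('file_name.exr')
dw = exr.header()['dataWindow']
w, h = dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1

# read each channel as 32-bit floats and reshape to the image size
FLOAT = Imath.PixelType(Imath.PixelType.FLOAT)
r, g, b = [np.frombuffer(exr.channel(c, FLOAT), dtype=np.float32).reshape(h, w)
           for c in ('R', 'G', 'B')]

# naive scaling to 8-bit, then merge into an OpenCV BGR image
to_u8 = lambda ch: np.clip(ch * 255.0, 0, 255).astype(np.uint8)
img_rgb = cv2.merge([to_u8(b), to_u8(g), to_u8(r)])

print(img_rgb.shape)                   # size
cv2.imwrite('file_name.jpg', img_rgb)  # write to another format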
There is an example on the OpenEXR webpage:
import sys
import array
import OpenEXR
import Imath
if len(sys.argv) != 3:
    print "usage: exrnormalize.py exr-input-file exr-output-file"
    sys.exit(1)
# Open the input file
file = OpenEXR.InputFile(sys.argv[1])
# Compute the size
dw = file.header()['dataWindow']
sz = (dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1)
# Read the three color channels as 32-bit floats
FLOAT = Imath.PixelType(Imath.PixelType.FLOAT)
(R,G,B) = [array.array('f', file.channel(Chan, FLOAT)).tolist() for Chan in ("R", "G", "B") ]
After this, you should have three arrays of floating-point data, one per channel. You could easily convert these to numpy arrays and proceed with OpenCV as user ZdaR suggests.
I'm interested in augmenting my dataset with random image transformations. I'm using Keras ImageDataGenerator, and I'm getting the following error when trying to apply random_transform to a single image:
--> x = apply_transform(x, transform_matrix, img_channel_axis, fill_mode, cval)
>>> RuntimeError: affine matrix has wrong number of rows.
I found the source code for the ImageDataGenerator here. However, I'm not sure how to debug the runtime error. Below is the code I have:
import numpy as np
from keras.preprocessing.image import img_to_array, load_img
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.inception_v3 import preprocess_input

image_path = './figures/zebra.jpg'

# data augmentation
train_datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

print "\nloading image..."
image = load_img(image_path, target_size=(299, 299))
image = img_to_array(image)
image = np.expand_dims(image, axis=0)  # 1 x input_shape
image = preprocess_input(image)
train_datagen.fit(image)
image = train_datagen.random_transform(image)
The error occurs at the last line when calling random_transform.
The problem is that random_transform expects a 3D array.
See the docstring:
def random_transform(self, x, seed=None):
    """Randomly augment a single image tensor.

    # Arguments
        x: 3D tensor, single image.
        seed: random seed.

    # Returns
        A randomly transformed version of the input (same shape).
    """
So you'll need to call it before np.expand_dims.
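Applied to the code above, a minimal sketch of that reordering (keeping all other lines as they are) would be:

image = load_img(image_path, target_size=(299, 299))
image = img_to_array(image)                    # 3D array: 299 x 299 x 3
image = train_datagen.random_transform(image)  # expects a single 3D image
image = np.expand_dims(image, axis=0)          # now 1 x 299 x 299 x 3
image = preprocess_input(image)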
I am new to the world of Computer Vision.
I am trying to use Tesseract to detect numbers written on the side of trucks.
So for this example, I would like to see CMA CGM as the output.
I fed this image to Tesseract via command line
tesseract image.JPG out -psm 6
but it yielded a blank file.
Then I read the documentation of tesserocr (a Python wrapper for Tesseract) and tried the following code:
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print 'Found {} textline image components.'.format(len(boxes))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print (u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
               "confidence: {1}, text: {2}").format(i, conf, ocrResult, **box)
and again it was not able to read any characters in the image.
My question is: how should I go about solving this problem? (I am not looking for ready-made code, but for an approach to the problem.)
Would I need to train Tesseract with sample images, or can I just write code using existing libraries to detect the coordinates of the truck and run OCR only within the boundaries of the truck?
Tesseract expects document-only images, but your image contains non-document objects. You need a sophisticated segmentation (and probably some image-processing) step before feeding it to Tesseract-OCR.
I have a three-step solution:
Take the part of the image you want to recognize
Apply Gaussian blur
Apply simple thresholding
You can use a range to get the part of the image. For instance, if you select the
height range as: from int(h/4) + 40 to int(h/2) - 20
width range as: from int(w/2) to int((w*3)/4)
Result (intermediate images omitted): the cropped part, the Gaussian-blurred image, and the thresholded image.
Pytesseract output:
CMA CGM
Code:
import cv2
import pytesseract
img = cv2.imread('YizU3.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]
gry = gry[int(h/4) + 40:int(h/2)-20, int(w/2):int((w*3)/4)]
blr = cv2.GaussianBlur(gry, (3, 3), 0)
thr = cv2.threshold(blr, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
txt = pytesseract.image_to_string(thr)
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)
I want to use Google's Tensorflow to return similar images to an input image.
I have installed TensorFlow from http://www.tensorflow.org (using the pip installation, Python 2.7) on Ubuntu 14.04 in a virtual machine (CPU only).
I have downloaded the trained Inception-V3 model (inception-2015-12-05.tgz) from http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz, which was trained on the ImageNet Large Scale Visual Recognition Challenge data from 2012, but I think it has both the neural network and the classifier inside it (as the task there was to predict the category). I have also downloaded the file classify_image.py, which classifies an image into one of the 1000 classes in the model.
So I have a random image image.jpg that I am running to test the model. When I run the command:
python /home/amit/classify_image.py --image_file=/home/amit/image.jpg
I get the output below (classification is done using softmax):
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 3
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 3
trench coat (score = 0.62218)
overskirt (score = 0.18911)
cloak (score = 0.07508)
velvet (score = 0.02383)
hoopskirt, crinoline (score = 0.01286)
Now, the task at hand is to find images that are similar to the input image (image.jpg) out of a database of 60,000 images (JPG format, kept in a folder at /home/amit/images). I believe this can be done by removing the final classification layer from the Inception-V3 model and using the feature set of the input image to compute the cosine distance to the feature set of each of the 60,000 images; we can then return the images with the smallest distance (cos 0 = 1).
Please suggest the way forward for this problem and how to do this using the Python API.
I think I found an answer to my question:
In the file classify_image.py, which classifies the image using the pre-trained model (NN + classifier), I made the changes mentioned below (statements with #ADDED written next to them):
def run_inference_on_image(image):
  """Runs inference on an image.

  Args:
    image: Image file name.

  Returns:
    Nothing
  """
  if not gfile.Exists(image):
    tf.logging.fatal('File does not exist %s', image)
  image_data = gfile.FastGFile(image, 'rb').read()

  # Creates graph from saved GraphDef.
  create_graph()

  with tf.Session() as sess:
    # Some useful tensors:
    # 'softmax:0': A tensor containing the normalized prediction across
    #   1000 labels.
    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
    #   float description of the image.
    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
    #   encoding of the image.
    # Runs the softmax tensor by feeding the image_data as input to the graph.
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    feature_tensor = sess.graph.get_tensor_by_name('pool_3:0')  # ADDED
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)
    feature_set = sess.run(feature_tensor,
                           {'DecodeJpeg/contents:0': image_data})  # ADDED
    feature_set = np.squeeze(feature_set)  # ADDED
    print(feature_set)  # ADDED

    # Creates node ID --> English string lookup.
    node_lookup = NodeLookup()

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))
I ran the pool_3:0 tensor by feeding the image_data into it. Please let me know if I am making a mistake. If this is correct, I believe we can use this tensor for further calculations.
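For the similarity search the question describes, a minimal sketch of those further calculations (assuming the squeezed 2048-dimensional pool_3 features of all 60,000 database images have already been stacked into one numpy array; the function and variable names here are just illustrative):

import numpy as np

def top_k_similar(query_vec, feature_matrix, k=10):
    # feature_matrix: (num_images, 2048) pool_3 features of the database images
    # query_vec: (2048,) pool_3 feature vector of the query image
    q = query_vec / np.linalg.norm(query_vec)
    m = feature_matrix / np.linalg.norm(feature_matrix, axis=1, keepdims=True)
    sims = m.dot(q)                    # cosine similarity with every image
    return np.argsort(sims)[::-1][:k]  # indices of the k most similar images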
Tensorflow now has a nice tutorial on how to get the activations before the final layer and retrain a new classification layer with different categories:
https://www.tensorflow.org/versions/master/how_tos/image_retraining/
The example code:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
In your case, yes, you can get the activations from pool_3, the layer below the softmax layer (the so-called bottlenecks), and send them to other operations as input.
Finally, about finding similar images: I don't think ImageNet's bottleneck activations are a very pertinent representation for image search. You could consider using an autoencoder network with direct image inputs.
(autoencoder illustration omitted; source: deeplearning4j.org)
Your problem sounds similar to this visual search project
I am trying to identify the type of noise based on this article:
Model selection with Probabilistic PCA and Factor Analysis (FA)
I am using scikit-learn-0.14.1.win32-py2.7 on win8 64bit
I know that it refers to version 0.15; however, the version 0.14 documentation mentions that the score method is available for PCA, so I guess it should normally work:
sklearn.decomposition.ProbabilisticPCA
The problem is that no matter which PCA I use with cross_val_score, I always get a TypeError saying that the estimator PCA does not have a score method:
TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator PCA(copy=True, n_components=None, whiten=False) does not.
Any ideas why that is happening?
Many thanks in advance
Christos
X has 1000 samples with 40 features each.
Here is a portion of the code:
import numpy as np
import csv
from scipy import linalg
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.cross_validation import cross_val_score
from sklearn.grid_search import GridSearchCV
from sklearn.covariance import ShrunkCovariance, LedoitWolf

# read in the training data
train_path = '<train data path>/train.csv'
reader = csv.reader(open(train_path, "rb"), delimiter=',')
train = list(reader)
X = np.array(train).astype('float')

n_samples = 1000
n_features = 40
n_components = np.arange(0, n_features, 4)

def compute_scores(X):
    pca = PCA()
    pca_scores = []
    for n in n_components:
        pca.n_components = n
        pca_scores.append(np.mean(cross_val_score(pca, X, n_jobs=1)))
    return pca_scores

pca_scores = compute_scores(X)
n_components_pca = n_components[np.argmax(pca_scores)]
OK, I think I found the problem: it is not working with PCA, but it does work with PPCA.
However, by not providing a cv number, cross_val_score automatically uses 3-fold cross-validation,
which created 3 sets with sizes 334, 333 and 333 (my initial training set contains 1000 samples).
Since numpy.mean cannot make a comparison between sets of different sizes (334 vs 333), Python raises an exception.
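As an illustration of that workaround, a minimal sketch (assuming scikit-learn 0.14's ProbabilisticPCA, which does provide a score method, and an explicit cv=5 so that the 1000 samples split into equally sized folds of 200):

from sklearn.decomposition import ProbabilisticPCA
from sklearn.cross_validation import cross_val_score

def compute_scores(X):
    ppca = ProbabilisticPCA()
    ppca_scores = []
    for n in n_components:
        ppca.n_components = n
        # cv=5 gives five folds of 200 samples each
        ppca_scores.append(np.mean(cross_val_score(ppca, X, cv=5, n_jobs=1)))
    return ppca_scores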
Thanks