Force Google Vision API to detect dates / numbers - computer-vision

I'm trying to detect handwritten dates using the Google Vision API. Do you know if it is possible to force it to detect dates (DD/MM/YYYY), or at least numbers only, to increase reliability?
The function I use takes an image as an np.array as input:
def detect_handwritten_text(img):
    """Recognizes characters using the Google Cloud Vision API.

    Args:
        img (np.array): The image on which to apply the OCR.

    Returns:
        The recognized content of img as a string.
    """
    import cv2 as cv
    from google.cloud import vision_v1p3beta1 as vision

    client = vision.ImageAnnotatorClient()

    # Transform the np.array image into the byte format the Vision API can read
    success, encoded_image = cv.imencode('.png', img)
    content = encoded_image.tobytes()

    # Configure the client to detect handwriting and load the picture
    image = vision.types.Image(content=content)
    image_context = vision.types.ImageContext(language_hints=['en-t-i0-handwrit'])
    response = client.document_text_detection(image=image, image_context=image_context)

    return response.full_text_annotation.text

After ImageAnnotatorClient.DetectDocumentText (document_text_detection in the Python client), you could iterate over the blocks and the words inside each block, and match a regular expression against each word to find dates and numbers.
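For illustration, here is a minimal sketch of that post-processing step (the helper name extract_dates and the DD/MM/YYYY pattern are my own choices, not part of the API); if a date ends up split across several words, you may need to join the text at the paragraph level instead:

import re

def extract_dates(response):
    """Walks the full_text_annotation hierarchy and keeps only words
    that look like DD/MM/YYYY dates."""
    date_pattern = re.compile(r'\d{2}/\d{2}/\d{4}')
    dates = []
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # A word is a sequence of symbols (single characters)
                    text = ''.join(symbol.text for symbol in word.symbols)
                    if date_pattern.fullmatch(text):
                        dates.append(text)
    return dates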


Read, process and show the pixels in .EXR format images

I want to read images in the exr file format and see the pixel intensities at the corresponding locations. I also want to stack them together to feed them into a neural network. How can I do normal image processing on this kind of format? Please help me in doing this!
I have tried this code using the OpenEXR package but was unable to proceed further.
import OpenEXR
file = OpenEXR.InputFile('file_name.exr')
I expected to see the normal image processing tools like
file.size()
file.show()
file.write('another format')
file.min()
file.extract_channels()
file.append('another exr file')
OpenEXR seems to be lacking fancy image processing features such as displaying images or saving the image to a different format. For this I would suggest using OpenCV, which is full of image processing features.
What you may need to do is:
Read the exr using OpenEXR only, then extract the channels and convert them to numpy arrays, e.g. rCh = np.asarray(rCh, dtype=np.uint8)
Create an RGB image from these numpy arrays as img_rgb = cv2.merge([b, g, r]).
Use OpenCV functions for your listed operations:
Size: img_rgb.shape
Show: cv2.imshow("img", img_rgb)
Write: cv2.imwrite("path/to/file.jpg", img_rgb)
Min: np.min(b), np.min(g), np.min(r)
Extract channels: b, g, r = cv2.split(img_rgb)
There is an example on the OpenEXR webpage:
import sys
import array
import OpenEXR
import Imath

if len(sys.argv) != 3:
    print("usage: exrnormalize.py exr-input-file exr-output-file")
    sys.exit(1)

# Open the input file
file = OpenEXR.InputFile(sys.argv[1])

# Compute the size
dw = file.header()['dataWindow']
sz = (dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1)

# Read the three color channels as 32-bit floats
FLOAT = Imath.PixelType(Imath.PixelType.FLOAT)
(R, G, B) = [array.array('f', file.channel(Chan, FLOAT)).tolist() for Chan in ("R", "G", "B")]
After this, you should have three arrays of floating point data, one per channel. You could easily convert these to numpy arrays and proceed with OpenCV as user ZdaR suggests.
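Putting the two answers together, a minimal sketch of the whole pipeline might look like the following (the reshape to (height, width) and the scaling of the float data into 0-255 are assumptions about how you want to normalize; EXR stores linear HDR values, so you may prefer a proper tone mapping):

import OpenEXR
import Imath
import numpy as np
import cv2

exr = OpenEXR.InputFile('file_name.exr')
dw = exr.header()['dataWindow']
w, h = (dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1)

# Read each channel as raw float32 bytes and reshape to (height, width)
FLOAT = Imath.PixelType(Imath.PixelType.FLOAT)
r, g, b = [np.frombuffer(exr.channel(c, FLOAT), dtype=np.float32).reshape(h, w)
           for c in ("R", "G", "B")]

# Merge into a BGR image for OpenCV and scale into 8-bit range (assumed normalization)
img = cv2.merge([b, g, r])
img8 = np.clip(img * 255.0, 0, 255).astype(np.uint8)

print(img8.shape)                   # size
cv2.imwrite("converted.jpg", img8)  # write to another format
cv2.imshow("exr", img8)             # show
cv2.waitKey(0)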

Pytesseract OCR is not recognizing small words because of horizontal line

I've just started exploring pytesseract. Here's the issue I'm facing:
I have the following input image.
Now, when I try running OCR on this, I get the following output:
Thanks for signing up. Now you too
can pick your favorite pillows
Option AB
After trying it out on multiple image samples, I can safely conclude the following:
It is not a non-dict word penalty.
It is omitting words which are short. Almost as though it has a minimum width that is taking effect and is vetoing any line narrower than that.
It only happens if the input image has the bounding rectangle around it. If I remove that from the input image I get the correct output.
i.e. on the following image:
I get the following output
Thanks for signing up. Now you too
can pick your favorite pillows
Option
Opon
Option AB
I'm unable to figure out where I'm going wrong. Here's the code I'm using:
from PIL import Image
import pytesseract
import argparse
import cv2
image = cv2.imread('testImage.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.threshold(gray, 240, 255, cv2.THRESH_BINARY)[1]
filename = "intermediate.png"
cv2.imwrite(filename, gray)
im = Image.open(filename)
text = pytesseract.image_to_string(im)
print(text)
I've tried tinkering around with some of the config parameters (crunch_del_min_width, language_model_min_compound_length, and a few others) too, but nothing helped.
Solution found here: Pytesseract OCR multiple config options
My code was originally like yours (line 1) and I was encountering the same error. What worked for me was setting psm = 10 in the config param to allow single character recognition.
Code sometimes returning None:
line 1 : text = pytesseract.image_to_string(cropped)
Added code on the next line:
line 2 : text = text if text else pytesseract.image_to_string(cropped, config='--psm 10')
The first line will attempt to extract sentences. If it succeeds, the second line keeps the value the same. However, if it returns None, the second line will look for single characters (allowing smaller words to be output).
Alternatively, if you just wanted to catch small words:
line 1 : text = pytesseract.image_to_string(cropped, config='--psm 10')
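Applied to the question's code, a minimal version of that fallback might look like this (--psm 10 treats the image as a single character, so it is only used when the default page segmentation returns nothing):

from PIL import Image
import pytesseract

im = Image.open("intermediate.png")

# Try the default page segmentation first; fall back to single-character
# mode only if nothing was recognized.
text = pytesseract.image_to_string(im)
if not text.strip():
    text = pytesseract.image_to_string(im, config='--psm 10')
print(text)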

Separate Positive and Negative Samples for SVM Custom Object Detector

I am trying to train a Custom Object Detector by using the HOG+SVM method on OpenCV.
I have managed to extract HOG features from my positive and negative samples using the code below:
import cv2
import numpy as np

hog = cv2.HOGDescriptor()

def poshoggify():
    for i in range(1, 20):
        image = cv2.imread("/Users/munirmalik/cvprojek/cod/pos/" + str(i) + ".jpg")
        (winW, winH) = (500, 500)
        for resized in pyramid(image, scale=1.5):
            # loop over the sliding window for each layer of the pyramid
            for (x, y, window) in sliding_window(resized, stepSize=32, windowSize=(winW, winH)):
                # if the window does not meet our desired window size, ignore it
                if window.shape[0] != winH or window.shape[1] != winW:
                    continue
                img_pos = hog.compute(image)
                np.savetxt('posdata.txt', img_pos)
    return img_pos
And the equivalent function for the negative samples.
How do I format the data in such a way that the SVM knows which is positive and which is negative?
Furthermore, how do I translate this training to the "test" of detecting the desired objects through my webcam?
How do I format the data in such a way that the SVM knows which is positive and which is negative?
You would now create another list called labels which stores the class value associated with the corresponding image. For example, if you have a training set of features that looks like this:
features = [pos_features1, pos_features2, neg_features1, neg_features2, neg_features3, neg_features4]
you would have a corresponding labels list like
labels = [1, 1, 0, 0, 0, 0]
You would then feed this to a classifier like so:
from sklearn.svm import LinearSVC

clf = LinearSVC(C=1.0, class_weight='balanced')
clf.fit(features, labels)
Furthermore, how do I translate this training to the "test" of detecting the desired objects through my webcam?
Before training, you should have split your labelled dataset (ground truth) into training and testing datasets. You can do this using scikit-learn's KFold module.
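A minimal end-to-end sketch of that workflow (the variable names, the .ravel() flattening of the HOG vectors, and the use of train_test_split instead of KFold are my own choices for illustration):

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# pos_features / neg_features: lists of flattened HOG vectors,
# e.g. hog.compute(window).ravel() for each window
features = np.vstack(pos_features + neg_features)
labels = np.array([1] * len(pos_features) + [0] * len(neg_features))

# Hold out part of the labelled data for testing
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

clf = LinearSVC(C=1.0, class_weight='balanced')
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))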

Doing OCR to identify text written on trucks/cars or other vehicles

I am new to the world of Computer Vision.
I am trying to use Tesseract to detect numbers written on the side of trucks.
So for this example, I would like to see CMA CGM as the output.
I fed this image to Tesseract via command line
tesseract image.JPG out -psm 6
but it yielded a blank file.
Then I read the documentation of tesserocr (a Python wrapper for Tesseract) and tried the following code:
from tesserocr import PyTessBaseAPI, RIL

# image is a PIL Image loaded beforehand
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print((u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
               u"confidence: {1}, text: {2}").format(i, conf, ocrResult, **box))
and again it was not able to read any characters in the image.
My question is: how should I go about solving this problem? (I am not looking for ready-made code, but for an approach to the problem.)
Would I need to train Tesseract with sample images, or can I just write code using existing libraries to somehow detect the coordinates of the truck and do OCR only within the boundaries of the truck?
Tesseract expects document-only images, but you have non-document objects in your image. You need a sophisticated segmentation (and probably some image processing) step before feeding it to Tesseract-OCR.
I have a three-step solution:
Take the part of the image you want to recognize
Apply Gaussian-blur
Apply simple-thresholding
You can use index ranges to get the part of the image. For instance, you could select the
height range: from int(h/4) + 40 to int(h/2) - 20
width range: from int(w/2) to int((w*3)/4)
Result of each step (intermediate images omitted): cropped part, Gaussian blur, threshold.
Pytesseract output:
CMA CGM
Code:
import cv2
import pytesseract

img = cv2.imread('YizU3.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Crop the part of the image that contains the text
(h, w) = gry.shape[:2]
gry = gry[int(h/4) + 40:int(h/2) - 20, int(w/2):int((w*3)/4)]

# Blur the crop, then threshold the blurred image
blr = cv2.GaussianBlur(gry, (3, 3), 0)
thr = cv2.threshold(blr, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

txt = pytesseract.image_to_string(thr)
print(txt)

cv2.imshow("thr", thr)
cv2.waitKey(0)

Tensorflow return similar images

I want to use Google's Tensorflow to return similar images to an input image.
I have installed Tensorflow from http://www.tensorflow.org (using pip, with Python 2.7) on Ubuntu 14.04 in a virtual machine (CPU only).
I have downloaded the trained model Inception-V3 (inception-2015-12-05.tgz) from http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz, which is trained on the ImageNet Large Scale Visual Recognition Challenge using the data from 2012, but I think it has both the neural network and the classifier inside it (as the task there was to predict the category). I have also downloaded the file classify_image.py, which classifies an image into one of the 1000 classes in the model.
So I have a random image image.jpg that I am running to test the model. When I run the command:
python /home/amit/classify_image.py --image_file=/home/amit/image.jpg
I get the below output: (Classification is done using softmax)
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 3
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 3
trench coat (score = 0.62218)
overskirt (score = 0.18911)
cloak (score = 0.07508)
velvet (score = 0.02383)
hoopskirt, crinoline (score = 0.01286)
Now, the task at hand is to find images that are similar to the input image (image.jpg) out of a database of 60,000 images (jpg format, kept in a folder at /home/amit/images). I believe this can be done by removing the final classification layer from the Inception-V3 model and using the feature set of the input image to compute the cosine distance to the feature sets of all 60,000 images; we can then return the images with the smallest distance (cos 0 = 1).
Please suggest a way forward for this problem and how to do it using the Python API.
I think I found an answer to my question:
In the file classify_image.py, which classifies the image using the pre-trained model (NN + classifier), I made the changes below (statements with #ADDED written next to them):
def run_inference_on_image(image):
    """Runs inference on an image.

    Args:
        image: Image file name.

    Returns:
        Nothing
    """
    if not gfile.Exists(image):
        tf.logging.fatal('File does not exist %s', image)
    image_data = gfile.FastGFile(image, 'rb').read()

    # Creates graph from saved GraphDef.
    create_graph()

    with tf.Session() as sess:
        # Some useful tensors:
        # 'softmax:0': A tensor containing the normalized prediction across
        #   1000 labels.
        # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
        #   float description of the image.
        # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
        #   encoding of the image.
        # Runs the softmax tensor by feeding the image_data as input to the graph.
        softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
        feature_tensor = sess.graph.get_tensor_by_name('pool_3:0')  #ADDED
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)
        feature_set = sess.run(feature_tensor,
                               {'DecodeJpeg/contents:0': image_data})  #ADDED
        feature_set = np.squeeze(feature_set)  #ADDED
        print(feature_set)  #ADDED

        # Creates node ID --> English string lookup.
        node_lookup = NodeLookup()

        top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = node_lookup.id_to_string(node_id)
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
I ran the pool_3:0 tensor by feeding the image_data to it. Please let me know if I am making a mistake. If this is correct, I believe we can use this tensor for further calculations.
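To turn those 2048-dimensional pool_3 features into a similarity search, one option is to compute the cosine similarity between the query vector and the precomputed vectors of the 60,000 images. A minimal sketch (the function name and variable names are my own, not part of classify_image.py):

import numpy as np

def most_similar(query_features, db_features, top_k=10):
    """Return indices of the top_k most similar images by cosine similarity.

    query_features: (2048,) pool_3 vector of the query image
    db_features:    (N, 2048) matrix of precomputed pool_3 vectors
    """
    q = query_features / np.linalg.norm(query_features)
    db = db_features / np.linalg.norm(db_features, axis=1, keepdims=True)
    scores = db @ q  # cosine similarity, higher means more similar
    return np.argsort(scores)[::-1][:top_k]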
Tensorflow now has a nice tutorial on how to get the activations before the final layer and retrain a new classification layer with different categories:
https://www.tensorflow.org/versions/master/how_tos/image_retraining/
The example code:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
In your case, yes, you can get the activations from pool_3, the layer below the softmax layer (the so-called bottlenecks), and send them to other operations as input.
Finally, about finding similar images: I don't think ImageNet's bottleneck activations are a very pertinent representation for image search. You could consider using an autoencoder network with direct image inputs.
Your problem sounds similar to this visual search project.