Splitting text and background as a preprocessing step for OCR (Tesseract) - C++

I am applying OCR to text in TV footage. (I am using Tesseract 3.x with C++.)
I am trying to separate the text from the background as a preprocessing step for OCR.
With typical footage, text and background are highly contrasted (such as white against black), so adjusting the gamma does the job.
However, the attached image (yellow text against an orange/red sky) is giving me a hard time with the preprocessing.
What would be a good way to separate the yellow text from the background?
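For reference, the gamma adjustment mentioned above is usually done with a lookup table in OpenCV; this is only a minimal sketch, and the gamma value and file name are illustrative:
import cv2
import numpy as np

# map each 8-bit value v to 255 * (v/255)^gamma; gamma < 1 brightens mid-tones
gamma = 0.5  # illustrative value, tuned per footage
lut = np.array([((v / 255.0) ** gamma) * 255 for v in range(256)], dtype="uint8")

frame = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file name
adjusted = cv2.LUT(frame, lut)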

Below is a simple solution using Python 2.7, OpenCV 3.2.0 and Tesseract 4.0.0a. Converting the Python OpenCV code to C++ should not be difficult; then call the Tesseract API to perform OCR.
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

def show(title, img, color=True):
    # small helper to display an image with matplotlib (BGR -> RGB for colour images)
    if color:
        plt.imshow(img[:,:,::-1]), plt.title(title), plt.show()
    else:
        plt.imshow(img, cmap='gray'), plt.title(title), plt.show()

def ocr(img):
    # I used a version of OpenCV with the Tesseract binding. Modes set to:
    # Page Segmentation Mode (PSM) = 11 (default = 3)
    # OCR Engine Mode (OEM) = 3 (default = 3)
    tesser = cv2.text.OCRTesseract_create('C:/Program Files/Tesseract 4.0.0/tessdata/', 'eng',
                                          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', 3, 3)
    retval = tesser.run(img, 0)  # returns the recognized text as a string
    print 'OCR Output: ' + retval
img = cv2.imread('./imagesStackoverflow/yellow_text.png')
show('original', img)

# apply GaussianBlur to smooth the image, then threshold the yellow colour range
# to white (255,255,255) and set the rest to black (0,0,0)
img = cv2.GaussianBlur(img, (5,5), 1)  # smooth image
mask = cv2.inRange(img, (40,180,200), (70,220,240))  # keep only the yellow colour range (low, high in BGR)
show('mask', mask, False)

# invert the image to get black text on a white background
res = 255 - mask
show('result', res, False)

# pass to tesseract to perform OCR
ocr(res)
Processed images and OCR output (see the last line in the image):
Hope this helps.

Related

Why doesn't pytesseract recognize any text in this image?

I have this input image on which I am attempting to apply text detection and OCR;
however, even after preprocessing (binary thresholding, etc.), pytesseract doesn't return any output. The purpose of the text detection is to improve the OCR output; I'm not too concerned with obtaining bounding boxes.
Here is my code:
image = cv2.imread('image.jpg')
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret,thresh1 = cv2.threshold(grey,127,255,cv2.THRESH_BINARY)
image = pytesseract.image_to_data(thresh1, output_type=Output.DICT)
image = cv2.bitwise_not(image)
Inspecting the results, the output is either empty or nonsensical. Is there any way to improve this?
Try this code:
import pytesseract
import cv2
image = cv2.imread('ccl6t.png')
pytesseract.pytesseract.tesseract_cmd = r'k:\Tesseract\tesseract.exe'  # change this to your local Tesseract path!
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret,thresh1 = cv2.threshold(grey,127,255,cv2.THRESH_BINARY_INV)
cv2.imwrite('tresh.png', thresh1)
words = pytesseract.image_to_data(thresh1, lang='eng',config='--psm 3 --oem 1 ')
print(str(words))
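If structured, per-word output is preferred (as in the question's Output.DICT call), pytesseract can also return a dict of parallel lists; a minimal sketch building on the thresholded image above:
from pytesseract import Output

data = pytesseract.image_to_data(thresh1, lang='eng', config='--psm 3 --oem 1',
                                 output_type=Output.DICT)
# print each recognized word with its confidence
for text, conf in zip(data['text'], data['conf']):
    if text.strip():
        print(text, conf)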

segmentation of overlapping cells

The following Python script should split overlapping cells apart, which works quite well. The problem is that it also splits apart some cells that don't overlap with other cells. To make things clear, I'll add my input image and the output image.
The input: (input image)
The output: (output image)
Output image where I marked two "bad" segmented cells: (output image with marked errors)
Thresholded image: (thresholded image)
Does someone have an idea how to avoid this problem, or is the whole approach not good enough to process these kinds of images?
I am using the following piece of code to segment the cells:
from skimage.feature import peak_local_max
from skimage.morphology import watershed
from scipy import ndimage
import numpy as np
import cv2

# load the image and perform pyramid mean shift filtering
# to aid the thresholding step
image = cv2.imread('C:/Users/Root/Desktop/image13.jpg')
shifted = cv2.pyrMeanShiftFiltering(image, 41, 51)

# convert the mean shift image to grayscale, then apply
# Otsu's thresholding
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255,
                       cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

im = gray.copy()
D = ndimage.distance_transform_edt(thresh)
localMax = peak_local_max(D, indices=False, min_distance=3,
                          labels=thresh)

# perform a connected component analysis on the local peaks,
# using 8-connectivity, then apply the Watershed algorithm
markers = ndimage.label(localMax, structure=np.ones((3, 3)))[0]
labels = watershed(-D, markers, mask=thresh)
print("[INFO] {} unique segments found".format(len(np.unique(labels)) - 1))

conts = []
for label in np.unique(labels):
    # if the label is zero, we are examining the 'background'
    # so simply ignore it
    if label == 0:
        continue
    # otherwise, allocate memory for the label region and draw
    # it on the mask
    mask = np.zeros(gray.shape, dtype="uint8")
    mask[labels == label] = 255
    # detect contours in the mask and grab the largest one
    cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)[-2]
    c = max(cnts, key=cv2.contourArea)
    rect = cv2.minAreaRect(c)
    box = cv2.boxPoints(rect)
    box = np.int0(box)
    if cv2.contourArea(c) > 150:
        #cv2.drawContours(image, c, -1, (0, 255, 0))
        cv2.drawContours(image, [box], -1, (0, 255, 0))

cv2.imshow("output", image)
cv2.waitKey()
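One common tweak for this kind of over-segmentation (only a sketch; the value would need tuning on these images) is to make the seed detection stricter, e.g. by requiring a larger minimum distance between peaks so a single cell is less likely to get two watershed markers:
# illustrative: a larger min_distance merges nearby peaks into one seed per cell
localMax = peak_local_max(D, indices=False, min_distance=10,
                          labels=thresh)
markers = ndimage.label(localMax, structure=np.ones((3, 3)))[0]
labels = watershed(-D, markers, mask=thresh)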

Doing OCR to identify text written on trucks/cars or other vehicles

I am new to the world of Computer Vision.
I am trying to use Tesseract to detect numbers written on the side of trucks.
So for this example, I would like to see CMA CGM as the output.
I fed this image to Tesseract via the command line:
tesseract image.JPG out -psm 6
but it yielded a blank file.
Then I read the documentation of tesserocr (a Python wrapper for Tesseract) and tried the following code:
from PIL import Image
from tesserocr import PyTessBaseAPI, RIL

image = Image.open('image.JPG')  # the truck image from above

with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print 'Found {} textline image components.'.format(len(boxes))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print (u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
               "confidence: {1}, text: {2}").format(i, conf, ocrResult, **box)
and again it was not able to read any characters in the image.
My question is: how should I go about solving this problem? (I am not looking for ready-made code, but for an approach to the problem.)
Would I need to train Tesseract with sample images, or can I just write code using existing libraries to detect the coordinates of the truck and do OCR only within the boundaries of the truck?
Tesseract expects document-like images, but you have non-document objects in your image. You need a segmentation step (and probably some further image processing) before feeding the image to Tesseract-OCR.
I have a three-step solution:
Take the part of the image you want to recognize
Apply Gaussian-blur
Apply simple-thresholding
You can use index ranges to get the part of the image.
For instance, select the
height range as: from int(h/4) + 40 to int(h/2) - 20
width range as: from int(w/2) to int((w*3)/4)
Result (intermediate images omitted): the cropped part, the Gaussian-blurred image, and the thresholded image.
Pytesseract output:
CMA CGM
Code:
import cv2
import pytesseract

img = cv2.imread('YizU3.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# take the part of the image containing the text
(h, w) = gry.shape[:2]
gry = gry[int(h/4) + 40:int(h/2) - 20, int(w/2):int((w*3)/4)]

# apply Gaussian blur, then simple (Otsu) thresholding
blr = cv2.GaussianBlur(gry, (3, 3), 0)
thr = cv2.threshold(blr, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

txt = pytesseract.image_to_string(thr)
print(txt)

cv2.imshow("thr", thr)
cv2.waitKey(0)
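If the cropped region contains a single line of text, it may also help to hint Tesseract's page segmentation mode; the value here is only an assumption to experiment with, not something tested in this answer:
# --psm 7 tells Tesseract to treat the image as a single text line
txt = pytesseract.image_to_string(thr, config='--psm 7')
print(txt)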

Poor Performance and Strange Array Behavior when run on Linux

I am working on a script to do some video processing. It reads a video file searching for red dots of a certain size, then finds the center of each and returns the x/y coordinates. Initially I had it working great on my Windows machine, so I sent it over to the Raspberry Pi to see if I would encounter issues, and boy did I.
On Windows the script runs in real time, completing at the same time as the video. On the Raspberry Pi it is painfully slow. I also noticed, when I looked into the structure of contours, that there is a huge array of 0s first, before my array of x/y coordinates. I have no idea what is creating this, but it doesn't happen on the Windows box.
I have the same versions of Python and OpenCV installed on both boxes; the only difference is NumPy 1.11 on Windows and NumPy 1.12 on the Raspberry Pi. Note that I had to change the index in np.mean(contours[?]) to 1 to skip the initial array of 0s. What have I done wrong?
Here's a video I made for testing purposes if needed:
http://www.foxcreekwinery.com/video.mp4
import numpy as np
import cv2

def vidToPoints():
    cap = cv2.VideoCapture('video.mp4')
    while (cap.isOpened()):
        ret, image = cap.read()
        if (ret):
            cv2.imshow('frame', image)
            if cv2.waitKey(1) == ord('q'):
                break
            # save frame as image
            cv2.imwrite('frame.jpg', image)
            # load the image
            image = cv2.imread('frame.jpg')
            # define the list of boundaries
            boundaries = [
                ([0, 0, 150], [90, 90, 255])
            ]
            # loop over the boundaries
            for (lower, upper) in boundaries:
                # create NumPy arrays from the boundaries
                lower = np.array(lower, dtype = "uint8")
                upper = np.array(upper, dtype = "uint8")
                # find the colors within the specified boundaries
                mask = cv2.inRange(image, lower, upper)
                if (50 > cv2.countNonZero(mask) > 10):
                    # find contours
                    contours = cv2.findContours(mask, 0, 1)
                    # average the contour list to find the center
                    avg = np.mean(contours[1], axis=1)
                    x = int(round(avg[0,0,0]))
                    y = int(round(avg[0,0,1]))
                    print [x, y]
                    print cv2.countNonZero(mask)
            for l in range(5):
                cap.grab()
        else:
            break
    cap.release()
    cv2.destroyAllWindows()

vidToPoints()
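One thing that might be worth double-checking (an assumption, since the versions are reported to be the same): cv2.findContours returns (image, contours, hierarchy) in OpenCV 3.x but (contours, hierarchy) in OpenCV 2.4, and the leading image would look exactly like a huge array of 0s. Unpacking the result in a version-robust way avoids the magic index:
# works for both OpenCV 2.4 and 3.x: the contour list is second from the end
ret_vals = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnts = ret_vals[-2]
if cnts:
    avg = np.mean(cnts[0], axis=0)  # mean point of the first contour
    x, y = int(round(avg[0][0])), int(round(avg[0][1]))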

cannot read in image in colour in opencv python

I have just started to use OpenCV with Python on Windows (PyCharm IDE).
I tried to read a colour image, but it got displayed in grayscale. So I tried to convert it as below:
import cv2
img = cv2.imread('C:\Ai.jpg', 0)
b,g,r = cv2.split(img)
rgb = cv2.merge([r,g,b])
cv2.imshow('image', img)
cv2.imshow('rgb image',rgb)
cv2.waitKey(0)
cv2.destroyAllWindows()
But I am getting this error:
b,g,r = cv2.split(img)
ValueError: need more than 1 value to unpack
Can you guys please help me out?
Thanks in advance.
There is a problem in the second line of your code, img = cv2.imread('C:\Ai.jpg', 0). As per the documentation, the value 0 corresponds to cv2.IMREAD_GRAYSCALE, which is why you are getting a grayscale image. Change it to 1 to load the image in colour (OpenCV loads it in BGR order), or to -1 to also include any other channel, such as an alpha channel, that is encoded along with the image.
And b,g,r = cv2.split(img) was raising an error because img at that point was a grayscale image with only one channel, and it is impossible to split a 1-channel image into 3 separate channels.
Your final snippet may look like this:
import cv2
# Reading the image in colour (BGR) mode
img = cv2.imread('C:\Ai.jpg', 1)
# No need of following lines:
# b,g,r = cv2.split(img)
# rgb = cv2.merge([r,g,b])
# cv2.imshow('rgb image',rgb)
# Displaying the image
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
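For completeness, the -1 flag mentioned above corresponds to cv2.IMREAD_UNCHANGED; a minimal sketch, assuming a format such as PNG that can actually carry an alpha channel:
# -1 / cv2.IMREAD_UNCHANGED keeps the image exactly as stored, including alpha
img_raw = cv2.imread('C:\Ai.png', cv2.IMREAD_UNCHANGED)  # hypothetical PNG path
print(img_raw.shape)  # (height, width, 4) when an alpha channel is present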
Try this solution.
Read the image and convert it into RGB format:
If you have a colour image and are reading it with OpenCV, it is loaded in BGR order, so first convert it into RGB format:
image = cv2.imread('C:\Ai.jpg')  # cv2 reads the image in BGR
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert it into RGB format
To display it, we can use cv2.imshow, matplotlib, or PIL as follows:
import matplotlib.pyplot as plt
%matplotlib inline
from PIL import Image
Now display it using matplotlib:
plt.imshow(image)
Or display it using PIL:
Image.fromarray(image)