I am trying to train a Custom Object Detector by using the HOG+SVM method on OpenCV.
I have managed to extract HOG features from my positive and negative samples using the code below:
import cv2
import numpy as np
hog = cv2.HOGDescriptor()
def poshoggify():
    for i in range(1,20):
        image = cv2.imread("/Users/munirmalik/cvprojek/cod/pos/" + str(i)+ ".jpg")
        (winW, winH) = (500, 500)
        # pyramid() and sliding_window() are helper functions defined elsewhere
        for resized in pyramid(image, scale=1.5):
            # loop over the sliding window for each layer of the pyramid
            for (x, y, window) in sliding_window(resized, stepSize=32, windowSize=(winW, winH)):
                # if the window does not meet our desired window size, ignore it
                if window.shape[0] != winH or window.shape[1] != winW:
                    continue
                img_pos = hog.compute(image)
                np.savetxt('posdata.txt',img_pos)
                return img_pos
There is an equivalent function for the negative samples.
How do I format the data in such a way that the SVM knows which is positive and which is negative?
Furthermore, how do I translate this training to the "test" of detecting the desired objects through my webcam?
How do I format the data in such a way that the SVM knows which is positive and which is negative?
You would now create another list called labels, which stores the class value associated with each image. For example, if you have a training set of features that looks like this:
features = [pos_features1, pos_features2, neg_features1, neg_features2, neg_features3, neg_features4]
you would have a corresponding labels list like
labels = [1, 1, 0, 0, 0, 0]
You would then feed this to a classifier like so:
from sklearn.svm import LinearSVC
clf = LinearSVC(C=1.0, class_weight='balanced')
clf.fit(features, labels)
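As a minimal sketch of the layout described above, assuming every HOG descriptor has already been computed and has the same length, the features can be stacked into one 2-D array with a matching 1-D label array. The random arrays below are placeholders standing in for real hog.compute() outputs, and 3780 is simply the descriptor length of the default 64x128 HOG window.

```python
import numpy as np

# Placeholder descriptors standing in for hog.compute() results (assumption).
# hog.compute() typically returns a column vector, so each one is flattened.
rng = np.random.default_rng(0)
pos_features = [rng.random((3780, 1)) for _ in range(2)]
neg_features = [rng.random((3780, 1)) for _ in range(4)]

# One row per sample, one column per HOG value.
features = np.vstack([f.ravel() for f in pos_features + neg_features])

# 1 for every positive sample, 0 for every negative one, in the same order.
labels = np.concatenate([np.ones(len(pos_features), dtype=int),
                         np.zeros(len(neg_features), dtype=int)])

print(features.shape, labels)
```

Arrays shaped like this can be passed directly to clf.fit(features, labels).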
Furthermore, how do I translate this training to the "test" of detecting the desired objects through my webcam?
Before training, you should have split your labelled dataset (ground truth) into training and testing datasets. You can do this using scikit-learn's KFold class.
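To make the idea of a k-fold split concrete, here is a NumPy-only sketch. It is a hand-written stand-in that mimics what a k-fold splitter does, not scikit-learn's implementation, and the function name kfold_indices is hypothetical.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is in the test set exactly once."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)          # shuffle before splitting
    folds = np.array_split(indices, n_splits)     # n_splits nearly equal chunks
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train_idx, test_idx

splits = list(kfold_indices(n_samples=20, n_splits=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each pair of index arrays can be used to slice the features and labels, so the classifier is fitted on the training part and scored on the held-out part.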
I have the following sample of handwriting taken with three different writing instruments:
Looking at the writing, I can tell that there is a distinct difference between the first two and the last one. My goal is to determine an approximation of the stroke thickness for each letter, allowing me to group them based on being thin or thick.
So far, I have tried looking into stroke width transform, but I have struggled to translate it to my example.
I am able to preprocess the image so that I am left with just the contours of the text in question. For example, here is "thick" from the last line:
I suggest detecting contours with cv::findContours, as you are doing, and then comparing the contour area with the bounding rectangle area. The thicker the writing, the greater the coefficient (contourArea/boundingRectArea) will be.
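The ratio idea can be illustrated without OpenCV. The sketch below is a simplified stand-in, not the cv::contourArea / cv::boundingRect computation itself: it counts foreground pixels in a binary mask and divides by the area of their bounding box. The function name fill_ratio and the two synthetic shapes are assumptions made for illustration.

```python
import numpy as np

def fill_ratio(mask):
    """Foreground pixel count divided by the bounding-box area of the foreground."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0.0
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    return ys.size / float(height * width)

# Two synthetic "L" shapes filling the same 20x20 box: a thin and a thick stroke.
thin = np.zeros((20, 20), dtype=np.uint8)
thin[:, 0:2] = 1       # vertical bar, 2 px wide
thin[18:20, :] = 1     # horizontal bar, 2 px tall

thick = np.zeros((20, 20), dtype=np.uint8)
thick[:, 0:6] = 1      # vertical bar, 6 px wide
thick[14:20, :] = 1    # horizontal bar, 6 px tall

print(fill_ratio(thin), fill_ratio(thick))
```

The thick shape gives the larger ratio, which is the property the suggestion relies on. The ratio also depends on letter shape, so it is best compared between samples of the same letters.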
This approach should help you. It calculates the stroke width.
import cv2
import numpy as np
from skimage.feature import peak_local_max
from skimage import img_as_float
def adaptive_thresholding(image):
    output_image = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 2)
    return output_image
def stroke_width(image):
    # invert so that ink is non-zero, then measure each ink pixel's distance to the background
    dist = cv2.distanceTransform(cv2.subtract(255, image), cv2.DIST_L2, 5)
    im = img_as_float(dist)
    # local maxima lie on the stroke centerlines, so each value is roughly half the local stroke width
    coordinates = peak_local_max(im, min_distance=15)
    pixel_strength = []
    for element in coordinates:
        x = element[0]
        y = element[1]
        pixel_strength.append(np.asarray(dist)[x, y])
    mean_pixel_strength = np.asarray(pixel_strength).mean()
    return mean_pixel_strength
image = cv2.imread('Small3.JPG', 0)
process_image = adaptive_thresholding(image)
stroke_width(process_image)
A Python implementation of this might look something like the following, using the Stroke Width Transform implementation in SWTloc.
Full Disclosure: I am the author of this library.
EDIT : Post v2.0.0
Transforming The Image
import numpy as np
import swtloc as swt
imgpath = 'images/path_to_image.jpeg'
swtl = swt.SWTLocalizer(image_paths=imgpath)
swtImgObj = swtl.swtimages[0]
# Perform SWT Transformation with numba engine
swt_mat = swtImgObj.transformImage(auto_canny_sigma=1.0, gaussian_blurr=False,
                                   minimum_stroke_width=3, maximum_stroke_width=50,
                                   maximum_angle_deviation=np.pi/3)
Localize Letters
localized_letters = swtImgObj.localizeLetters()
Plot a Histogram of Each Letter's Stroke Widths
import seaborn as sns
import matplotlib.pyplot as plt
all_sws = []
for letter_label, letter in localized_letters.items():
    all_sws.append(letter.stroke_widths_mean)
sns.displot(all_sws, bins=31)
From the distribution plot, it can be inferred that there might be three font sizes of text in the image: [3, 15, 27]
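Once a mean stroke width is available per letter, grouping into thin and thick can be as simple as a one-dimensional two-cluster split. The sketch below is one possible way to do that, not part of SWTloc: the function two_means_1d is hypothetical and the width values are made up for illustration.

```python
import numpy as np

def two_means_1d(values, n_iter=20):
    """Split 1-D values into two groups (0 = thin, 1 = thick) with a tiny k-means."""
    values = np.asarray(values, dtype=float)
    centers = np.array([values.min(), values.max()])   # start from the extremes
    for _ in range(n_iter):
        # assign each value to the nearer center
        labels = (np.abs(values - centers[0]) > np.abs(values - centers[1])).astype(int)
        # move each center to the mean of its group
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels, centers

# Made-up per-letter mean stroke widths (assumption).
widths = [3.1, 2.8, 3.4, 3.0, 9.7, 10.2, 9.9]
labels, centers = two_means_1d(widths)
print(labels, centers)
```

With real data, the list of widths would come from the per-letter values collected in all_sws.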
I am new to the world of Computer Vision.
I am trying to use Tesseract to detect numbers written on the side of trucks.
So for this example, I would like to see CMA CGM as the output.
I fed this image to Tesseract via command line
tesseract image.JPG out -psm 6
but it yielded a blank file.
Then I read the documentation of Tesserocr (python wrapper of Tesseract) and tried the following code
from tesserocr import PyTessBaseAPI, RIL
# image is a PIL image that was loaded beforehand
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print((u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
               "confidence: {1}, text: {2}").format(i, conf, ocrResult, **box))
and again it was not able to read any characters in the image.
My question is: how should I go about solving this problem? (I am not looking for ready-made code, but for an approach to solving this problem.)
Would I need to train tesseract with sample images or can I just write code using existing libraries to somehow detect the co-ordinates of the truck and try to do OCR only within the boundaries of the truck?
Tesseract expects document-only images, but you have non-document objects in your image. You need a sophisticated segmentation process (and then probably some image processing) before feeding the image to Tesseract-OCR.
I have a three-step solution:
1. Take the part of the image you want to recognize
2. Apply Gaussian blur
3. Apply simple thresholding
You can use a range to get the part of the image. For instance, you could select:
height range: from int(h/4) + 40 to int(h/2) - 20
width range: from int(w/2) to int((w*3)/4)
Result (images for the Take Part, Gaussian and Threshold stages not shown):
Pytesseract output: CMA CGM
Code:
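import cv2
import pytesseract
img = cv2.imread('YizU3.jpg')
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]
# 1. take the part of the image that contains the text
gry = gry[int(h/4) + 40:int(h/2)-20, int(w/2):int((w*3)/4)]
# 2. smooth the crop with a small Gaussian kernel
blr = cv2.GaussianBlur(gry, (3, 3), 0)
# 3. threshold the blurred crop (with THRESH_OTSU the threshold is chosen automatically and 128 is ignored)
thr = cv2.threshold(blr, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
txt = pytesseract.image_to_string(thr)
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)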
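The cropping ranges used above can be checked on their own with plain NumPy slicing. The sketch below assumes a placeholder 800x1200 grayscale array instead of the real photo; only the index arithmetic is taken from the answer.

```python
import numpy as np

# Placeholder grayscale image (assumption): 800 rows by 1200 columns.
gry = np.zeros((800, 1200), dtype=np.uint8)
h, w = gry.shape[:2]

# NumPy images are indexed [rows, columns], i.e. height range first, then width range.
top, bottom = int(h / 4) + 40, int(h / 2) - 20
left, right = int(w / 2), int((w * 3) / 4)
roi = gry[top:bottom, left:right]

print((top, bottom), (left, right), roi.shape)
```

Because the offsets 40 and 20 are fixed pixel values, the selected band shifts with image size, so the ranges would need retuning for photos with a different framing.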
I am using the Python 2.7 OpenCV library to calculate histograms of some images, all of exactly the same size (cv2.calcHist).
I need to do two things:
1. Calculate the average of multiple images that represent a similar object, so that I have a "representative" histogram of that object for future comparisons (if you have a better idea, I am open to suggestions).
2. Store the histogram data in my MongoDB for future comparisons (cv2 correlation).
The only code I see as relevant to the question is my histogram_comparison code:
def histogram_comparison(real, fake):
    images = [real, fake]
    index = []
    for image in images:
        image = image.decode('base64')
        image = np.fromstring(image, dtype=np.uint8)
        image = cv2.imdecode(image, 1)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        hist = cv2.calcHist([image], [0, 1, 2], None, [32, 32, 32],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist).flatten()
        index.append(hist)
    result_dist = cv2.compareHist(index[0], index[1], cv2.cv.CV_COMP_CORREL)
    return round(result_dist, 5)
taken from: http://www.pyimagesearch.com/2014/07/14/3-ways-compare-histograms-using-opencv-python/
I do realize that when using numpy's (or was it scipy's?) histograms there is an easy way to get the bins and average them, but then I'm not really sure how to compare the histograms, so I would rather stay with OpenCV.
Thanks in advance.
Since OpenCV (from version 2.2) natively uses numpy arrays, and since len(images) is constant, you can average all your histograms and store the result in Mongo simply with:
h, b = np.histogram(images, bins=256, range=(0, 256))
db.histograms.insert({'hist': (h / float(len(images))).tolist(), 'bins': b.tolist()})
I do not know if it's exactly what you want, but I hope it helps! See ya!
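For a 3-D color histogram like the one in the question, the averaging and storage steps could look like the NumPy-only sketch below. It is an illustration under assumptions: the helper names color_histogram and correlation are hypothetical, the images are random placeholders, and the document layout is just one possible choice. The correlation helper computes the Pearson correlation, which is the quantity the OpenCV correlation comparison is based on.

```python
import numpy as np

def color_histogram(image, bins=32):
    """Flattened 3-D color histogram of an HxWx3 uint8 image, normalized to sum to 1."""
    pixels = image.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    hist = hist.flatten()
    return hist / hist.sum()

def correlation(h1, h2):
    """Pearson correlation between two flattened histograms."""
    return float(np.corrcoef(h1, h2)[0, 1])

# Random placeholder images of identical size (assumption).
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(3)]

# Average the per-image histograms bin by bin to get one representative histogram.
hists = np.stack([color_histogram(img) for img in images])
mean_hist = hists.mean(axis=0)

# Plain Python lists can be stored in a MongoDB document; numpy arrays cannot.
document = {"hist": mean_hist.tolist(), "bins": 32}

# Later: rebuild the array from the stored list and compare it with a new image.
restored = np.asarray(document["hist"])
print(correlation(restored, color_histogram(images[0])))
```

If the restored histogram is passed back to cv2.compareHist instead, it would need to be converted to float32 first.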
I have constructed a regression neural network (NN) with dropout in TensorFlow. I would like to know whether it is possible to find out, from the output file, which hidden units of the previous layer were dropped, so that we can reproduce the NN results in C++ or Matlab.
The following is an example of the TensorFlow model. There are three hidden layers and one output layer. After the third sigmoid layer there is a dropout with probability equal to 0.9. I would like to know whether it is possible to tell which hidden units in the third sigmoid layer are dropped.
def multilayer_perceptron(_x, _weights, _biases):
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(_x, _weights['h1']), _biases['b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))
    layer_3 = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, _weights['h3']), _biases['b3']))
    layer_d = tf.nn.dropout(layer_3, 0.9)
    return tf.matmul(layer_d, _weights['out']) + _biases['out']
Thank you very much!
There is a way to get the mask of 0s and 1s, with shape layer_3.get_shape(), that is produced by tf.nn.dropout().
The trick is to give a name to your dropout operation:
layer_d = tf.nn.dropout(layer_3, 0.9, name='my_dropout')
Then you can get the wanted mask through the TensorFlow graph:
graph = tf.get_default_graph()
mask = graph.get_tensor_by_name('my_dropout/Floor:0')
The tensor mask will have the same shape and type as layer_d, and will only contain the values 0 and 1. A 0 corresponds to a dropped neuron.
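To illustrate what such a mask contains, here is a NumPy-only sketch of inverted dropout. It is not TensorFlow's code. It assumes, in line with the Floor op name above, that the mask is computed as floor(keep_prob + uniform noise), and the activations are random placeholders.

```python
import numpy as np

def dropout_with_mask(x, keep_prob, rng):
    """Inverted dropout: return the scaled output and the 0/1 mask that produced it."""
    # floor(keep_prob + U[0, 1)) is 1 with probability keep_prob, otherwise 0.
    mask = np.floor(keep_prob + rng.random(x.shape))
    # Kept units are scaled by 1/keep_prob so the expected activation is unchanged.
    return x * mask / keep_prob, mask

rng = np.random.default_rng(0)
layer_3 = rng.random((4, 10))            # placeholder activations (assumption)
layer_d, mask = dropout_with_mask(layer_3, keep_prob=0.9, rng=rng)

dropped_units = np.argwhere(mask == 0)   # (row, unit) index pairs that were dropped
print(mask)
print(dropped_units)
```

A new mask is drawn on every call, so a saved mask describes a single forward pass only. If dropout is disabled at inference time, as is usual, no mask is needed to reproduce the forward pass in another language.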
Simple and idiomatic solution (although possibly slightly slower than Oliver's):
# generate mask
mask = tf.nn.dropout(tf.ones_like(layer),rate)
# apply mask
dropped_layer = layer * mask
When processing an image with text in OpenCV, my opening operation does not result in proper output data. The issue is quite similar to the one described in this article:
http://www.cpe.eng.cmu.ac.th/wp-content/uploads/CPE752_06part2.pdf
From what I can see, people suggest using reconstruction operations. Is there any built-in mechanism in OpenCV, or some known library or code, that implements this?
Here's my Python 3 implementation, by analogy with MATLAB's imreconstruct algorithm:
import cv2
import numpy as np
def imreconstruct(marker: np.ndarray, mask: np.ndarray, radius: int = 1):
    """Iteratively expand the markers while keeping them limited by the mask during each iteration.
    :param marker: Grayscale image where initial seed is white on black background.
    :param mask: Grayscale mask where the valid area is white on black background.
    :param radius: Can be increased to improve expansion speed while causing decreased isolation from nearby areas.
    :returns: A copy of the last expansion.
    Written By Semnodime.
    """
    kernel = np.ones(shape=(radius * 2 + 1,) * 2, dtype=np.uint8)
    while True:
        expanded = cv2.dilate(src=marker, kernel=kernel)
        cv2.bitwise_and(src1=expanded, src2=mask, dst=expanded)
        # Termination criterion: Expansion didn't change the image at all
        if (marker == expanded).all():
            return expanded
        marker = expanded
This answer arrives late, but here is the basic algorithm for under-reconstruction:
1. Inputs are two images: ImReference and ImMarker, with ImMarker <= ImReference
2. Intermediate image: ImRec
3. Output image: ImResult
4. Copy ImMarker into ImRec
5. Copy ImRec into ImResult
6. ImDilated = Dilation(ImResult)
7. ImRec = Minimum(ImDilated, ImReference)
8. If ImRec != ImResult then return to step 5.
It's not the most efficient algorithm, but it uses only basic operations.
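The loop above can be sketched with NumPy alone. This is one possible reading of the steps, not a reference implementation: the helper dilate3x3 is a hypothetical 3x3 square dilation, and the two-blob test image is made up to show that only the blob touched by the marker is recovered.

```python
import numpy as np

def dilate3x3(img):
    """Grayscale dilation with a 3x3 square: maximum over each pixel's neighborhood."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(shifted, axis=0)

def reconstruct(marker, reference):
    """Reconstruction by dilation: grow the marker, clipped by the reference, until stable."""
    rec = np.minimum(marker, reference)          # enforce marker <= reference
    while True:
        result = rec                             # copy ImRec into ImResult
        rec = np.minimum(dilate3x3(result), reference)
        if np.array_equal(rec, result):          # nothing changed: converged
            return result

reference = np.zeros((7, 7), dtype=np.uint8)
reference[1:3, 1:3] = 255    # blob A
reference[4:6, 4:6] = 255    # blob B, not connected to blob A
marker = np.zeros_like(reference)
marker[1, 1] = 255           # seed inside blob A only
out = reconstruct(marker, reference)
print(out)
```

Using the element-wise minimum rather than a bitwise AND keeps the sketch valid for grayscale images as well as binary ones.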