The C++ function is:
Moments moments(InputArray array, bool binaryImage=false )
I understand what the first parameter is; for the second one the documentation says:
binaryImage – If it is true, all non-zero image pixels are treated as
1’s. The parameter is used for images only.
What does this mean exactly? Should it be true for binary images only and false for non-binary images? In my application I use a binary image to calculate simple moments.
It means that if this flag is true, the image you pass in will be treated as if it were a binary image: any non-zero value, even one bigger than 1, is treated as 1.
I(x,y) > 0  →  1
I(x,y) == 0 →  0
If the flag is false, the actual pixel values are used in the moment calculation.
For example, let's say you have the following 3x3 image:
100 0 10
10 0 1
0 0 0
m00, which is the area of the image, will be:
If the flag is true, we have 4 non-zero pixels, so the value will be 4.
If the flag is false, we get 100 + 10 + 10 + 1 = 121.
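A quick way to check this numerically (a small sketch using the Python cv2.moments binding; the question is about the C++ API, but the flag behaves the same way):

import cv2
import numpy as np

img = np.array([[100, 0, 10],
                [ 10, 0,  1],
                [  0, 0,  0]], dtype=np.uint8)

m_bin  = cv2.moments(img, binaryImage=True)   # non-zero pixels treated as 1
m_gray = cv2.moments(img, binaryImage=False)  # actual pixel values used

print(m_bin['m00'])    # 4.0
print(m_gray['m00'])   # 121.0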
When should you use what?
Let's say we have a blob in our image.
If we treat the image as binary, the moments will give us spatial information about the blob.
For example, m01/m00 and m10/m00 will give us the center of mass of the object.
But if we do not treat the image as binary, the moments will give us texture/color information.
For example, m00/(number of pixels in the blob) = the mean intensity (color) of the blob.
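For illustration, computing the center of mass of a binary blob from these moments might look like the sketch below (the rectangle blob here is just a made-up example):

import cv2
import numpy as np

# hypothetical binary mask with a filled rectangle as the "blob"
mask = np.zeros((100, 100), dtype=np.uint8)
cv2.rectangle(mask, (20, 30), (60, 70), 255, -1)

m = cv2.moments(mask, binaryImage=True)
cx = m['m10'] / m['m00']   # x coordinate of the center of mass
cy = m['m01'] / m['m00']   # y coordinate of the center of mass
print(cx, cy)              # roughly the middle of the rectangle, (40.0, 50.0)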
I have the intensity values from a row of a grayscale image (0-255), as in the image. I want to apply pixel centering, so I subtract the mean value from all the intensity values, but I don't want to include values higher than 200. What is the best way to do it without iterating through the image? I tried cv2.mean(input, mask) but I couldn't set the mask properly. I also tried mean(x for x in mid_line if x < 200) from the statistics library, but the resulting mean is incorrect.
You can create a mask using cv2.threshold. You need to set all values above 200 to 0 and the ones at or below it to 255, hence the flag needs to be set to cv2.THRESH_BINARY_INV.
import cv2
# assuming `row` holds your intensity values as a uint8 array
# pixels > 200 become 0 in the mask, pixels <= 200 become 255
ret, thresh = cv2.threshold(row, 200, 255, cv2.THRESH_BINARY_INV)
# cv2.mean returns a 4-element tuple; the first entry is the grayscale mean
result = cv2.mean(row, mask=thresh)[0]
This will only calculate the mean of pixels less than or equal to 200.
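As an aside (not part of the original answer), the same mean can also be computed with plain NumPy boolean indexing, assuming row is already a NumPy array:

import numpy as np

row = np.asarray(row)
result = row[row <= 200].mean()   # mean of values <= 200, no explicit loop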
Hey there, I am using OpenCV 3.3 and Python 2.7 for recognizing a maze in an image.
I have to find the outermost limit of the Maze in the image.
I tried closing the entrance and exit gaps of the maze and finding the outermost shape. I worked on closing the gaps, but that is useless for my problem because I need these gaps to solve the maze.
This is the original image
I want to find the outermost limit of the maze.
This is what I want
How can I extract outermost contour?
I would do this with numpy rather than OpenCV, but the two are compatible so you can mix and match anyway, or you can adapt the technique to OpenCV once you get the idea of how I am tackling it.
The strategy is to sum all the pixels across every row and make a single pixel wide image (shown on the right below) that is the sum of all the pixels in each row. I then find the biggest value in that column and divide by that to normalise everything to the range 0..100. Now any pixel that is less than 30 in that single pixel wide image means that the corresponding row had less than 30% of white pixels in the original image - i.e. it was largely black.
Then I make the same summation of all the columns to produce the column sums - shown across the bottom of the image below:
I think some folks refer to this technique as a "projection" if you want to Google it.
So, the code looks like this:
#!/usr/local/bin/python3
import numpy as np
from PIL import Image
# Load image - you can use OpenCV "imread()" just the same and convert to grayscale
im = np.array(Image.open('maze.jpg').convert('L'))
# Get height and width
h,w = im.shape[0:2]
# Make a single pixel wide column, same height as image to store row sums in
rowsums=np.empty((h))
# Sum all pixels in each row
np.sum(im,axis=1,out=rowsums)
# Normalize to range 0..100; if rowsums[i] < 30 that means fewer than 30% of the pixels in row i are white
rowsums /= np.max(rowsums)/100
# Find first and last row that is largely black
first = last = -1
for r in range(h):
    if first < 0 and rowsums[r] < 30:
        first = r
    if rowsums[r] < 30:
        last = r
print(first,last)
# Make a single pixel tall row, same width as image to store col sums in
colsums=np.empty((w))
# Sum all pixels in each col
np.sum(im,axis=0,out=colsums)
# Normalize to range 0..100; if colsums[i] < 30 that means fewer than 30% of the pixels in col i are white
colsums /= np.max(colsums)/100
# Find first and last col that is largely black
first = last = -1
for c in range(w):
    if first < 0 and colsums[c] < 30:
        first = c
    if colsums[c] < 30:
        last = c
print(first,last)
That outputs:
62 890
36 1509
So the top row of the maze is row 62, and the bottom one is row 890. The left column of the maze is column 36 and the rightmost column is col 1509.
If I draw an 80% transparent red rectangle to match those locations, I get:
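For reference, drawing a plain (opaque) rectangle at those coordinates with OpenCV might look like this; the coordinates come from the output above, and the 80% transparency blending is omitted for brevity:

import cv2

im = cv2.imread('maze.jpg')
# OpenCV uses (x, y) order: columns 36..1509, rows 62..890
cv2.rectangle(im, (36, 62), (1509, 890), (0, 0, 255), 3)
cv2.imwrite('maze_boxed.jpg', im)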
import tensorflow as tf

input = tf.Variable(tf.random_normal([10, 5, 5, 5]))
filter = tf.Variable(tf.random_normal([3, 3, 5, 7]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')
In the above code the output shape will be (10 x 3 x 3 x 7). Does that mean there will be a total of 70 images of size (3 x 3), because the filter has 7 output channels, so 10 x 7 is the total number of images?
Batch size: 10 and input image channels: 5
Output channels of conv2d: 7
Each image in the batch produces 7 activation maps (the output channels of the layer).
And yes, there will be 10 x 7 = 70 activation maps of size 3x3 at the end of this layer.
Are you looking for more information?
Update:
The filter's input-depth dimension must be the same as the number of input channels.
The input shape and the strides are not the same thing. A stride of [1, 2, 2, 1] means the conv filter moves horizontally with a 2-pixel stride and vertically with a 2-pixel stride. The first 1 is for the batch dimension and the last 1 is for the channel dimension: the layer processes all images in the batch and all channels.
To understand this better, look at the convolution demo animation given here:
http://cs231n.github.io/convolutional-networks/#conv
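As a quick sanity check (my own sketch, not part of the original answer): with padding='SAME' the output spatial size is ceil(input_size / stride), which is where the 3 x 3 comes from:

import math

h_out = int(math.ceil(5 / 2.0))   # SAME padding: ceil(input_height / stride_height) = 3
w_out = int(math.ceil(5 / 2.0))   # ceil(input_width / stride_width) = 3
# output shape = (batch, h_out, w_out, out_channels) = (10, 3, 3, 7)
print(h_out, w_out)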
Update 2
Each 3x3 filter among the 7 filters is applied across all input channels to get one value for each output channel (shown in the figure).
k: input channel index
j: output channel index
You can see that each output channel h_j is the sum over all input channels of the input x_k convolved with its specific filter w_jk, i.e. h_j = Σ_k (w_jk * x_k).
Used from https://sites.google.com/site/lsvrtutorialcvpr14/home/deeplearning
I hope this answers your question.
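To make the per-channel sum concrete, here is a small NumPy sketch (my own illustration, not from the linked tutorial) computing a single output value of one output channel j as a sum over all input channels k:

import numpy as np

x = np.random.rand(5, 5, 5)       # height x width x input channels (k)
w = np.random.rand(3, 3, 5, 7)    # filter height x width x input channels x output channels (j)

j = 0                             # pick one output channel
patch = x[0:3, 0:3, :]            # one 3x3 receptive field across all input channels
h_j = np.sum(patch * w[:, :, :, j])   # sum over height, width and input channel k
print(h_j)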
Problem: I have a large number of scanned documents that are linked to the wrong records in a database. Each image has the correct ID on it somewhere that says where it belongs in the db.
For example, a DB row could be:
| user_id | img_id | img_loc |
| 1 | 1 | /img.jpg|
img.jpg would have the user_id (1) on the image somewhere.
Method/Solution: Loop through the database. Pull the image text in to a variable with OCR and check if user_id is found anywhere in the variable. If not, flag the record/image in a log, if so do nothing and move on.
My example is simple; in the real world I have a guarantee that a user_id wouldn't accidentally show up on the wrong form (it is of a specific format that has its own significance).
Right now it is working. However, it is incredibly strict. If you've worked with OCR you understand how fickle it can be. Sometimes a 7 is read as a 1, or a 9 as a 7, etc. The result is a large number of false positives, especially among low-quality scans.
I've addressed some of the image quality issues with some processing on my side (increasing the image size, adjusting the black/white threshold) and had satisfying results. I'd like to add the ability for the program to recognize, for example, that "81723103" is not very far from "81923103".
The only way I know how to do that is to check strings that are at least as long as the one I'm looking for, calculate the distance between each pair of characters, take the average, and set a limit on what counts as a good average.
Some examples:
Ex 1
81723103 - Looking for this
81923103 - Found this
--------
00200000 - distances between characters
0 + 0 + 2 + 0 + 0 + 0 + 0 + 0 = 2
2/8 = .25 (pretty good match. 0 = perfect)
Ex 2
81723103 - Looking
81158988 - Found
--------
00635885 - distances
0 + 0 + 6 + 3 + 5 + 8 + 8 + 5 = 35
35/8 = 4.375 (Not a very good match. 9 = worst)
This way I can tell it "Flag the bottom 30% only" and dump anything with an average distance > 6.
I figure I'm reinventing the wheel and wanted to share this for feedback. I see a huge increase in run time and a performance hit doing all these string operations over what I'm currently doing.
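A minimal Python sketch of the per-digit distance average described above (my own illustration of the idea, assuming both strings are digit strings of equal length):

def digit_distance_score(expected, found):
    # average absolute per-digit difference: 0 = perfect match, 9 = worst
    diffs = [abs(int(a) - int(b)) for a, b in zip(expected, found)]
    return sum(diffs) / float(len(diffs))

print(digit_distance_score("81723103", "81923103"))   # 0.25
print(digit_distance_score("81723103", "81158988"))   # 4.375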
Is it possible that an image of size 26 x 7 can contain 78, 77 or 79 values in each row of the Mat? Why is this so? I have 95 images, all of which are 26 by 7. I discovered that some images have 77 separate colour values on each row, others have 79. This is problematic, since I need a standard value.
This is how the images are cropped to a size of 26 by 7.
Mat standard_size = largestObject1(Rect(0, 0, 26, 7)); // a template to get the standard size of a binary image
cv::resize(thresholdi, thresholdi, standard_size.size());
If I could convert it to only 26 pixel values, would I lose information?
I created the following code:
ofstream in("Eye_Gestures.txt");
//Eye Graze Class
IplImage *img2 = cvLoadImage("eye2.bmp");
Mat imgg2=cvarrToMat(img2);
Formatted line0i2 = format(imgg2.row(0),"CSV" );
Formatted line1i2 = format(imgg2.row(1),"CSV" );
Formatted line2i2 = format(imgg2.row(2),"CSV" );
Formatted line3i2 = format(imgg2.row(3),"CSV" );
Formatted line4i2 = format(imgg2.row(4),"CSV" );
Formatted line5i2 = format(imgg2.row(5),"CSV" );
Formatted line6i2 = format(imgg2.row(6),"CSV" );
in<<line0i2<<", "<<line1i2<<", "<<line2i2<<", "<<line3i2<<", "<<line4i2<<", "<<line5i2<<", "<<line6i2<<", "<<EyeGraze<<endl;
I need to ensure that each row stores the same number of separate colour values for all 95 images. If it is going to be 77 values, it needs to be 77 for all rows of each image. How can I ensure that I pass 77 and not 78 or 79 values to the text file? How can I disregard the excess values for each row? How can I keep track of the separate colour values on a row without having to manually count them?
An image of 26 by 7 pixels contains 7 rows of 26 pixels each. There won't be 27, let alone 77 pixels on one row. You're entirely confused.
Go back and rethink what you're actually trying to achieve. Note that text files are not a natural format for image files.
A bitmap (bmp) image contains exactly one entry for each pixel. Depending on the type of bitmap you are using, these entries may differ in size (bit depth), but the size is always constant within one image.
The method you are using to convert the bmp to a CSV text representation seems to serialize each pixel as a set of 3 values (probably RGB), so an image 26 px wide should give 26 x 3 = 78 entries per line. If you are getting a different number of entries per line than 78, you most likely made a mistake when reading the file.
Common mistakes would be 0 values expressed as empty cells ",," or - depending on the locale you are using - confusion between the decimal comma character and the csv separator.
Are you sure the separator is indeed the ',' character you are using to separate the lines when writing to the output stream, which is confusingly named in?
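As a quick way to verify the 26 x 3 = 78 count without counting by hand (a Python sketch rather than C++, reusing the file name from the question), you could check the image shape directly:

import cv2

img = cv2.imread("eye2.bmp")          # loads as height x width x channels
print(img.shape)                      # expected: (7, 26, 3)
print(img.shape[1] * img.shape[2])    # 78 values per row when written out as CSV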