This blog post shows how to normalize image pixel values using PyTorch's DataLoader. Using a DataLoader to do the calculation is important, as it allows the standard deviation to be calculated across batches (not the overall stddev).
For performance reasons, I need to port this code to C++, and I have OpenCV in mind. Does OpenCV have something similar to PyTorch's DataLoader that makes batch calculations easier?
This particular snippet caught my eye:
# loop through images
for inputs in tqdm(image_loader):
    psum += inputs.sum(axis = [0, 2, 3])
    psum_sq += (inputs ** 2).sum(axis = [0, 2, 3])
The author stated that setting axis=[0, 2, 3] sums over every axis except axis=1, so psum and psum_sq end up with one value per channel. The dimensions of inputs are [batch_size x 3 x image_size x image_size], so we need to make sure we aggregate values for each RGB channel separately. Can a similar calculation be done on a cv::Mat object?
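As a rough illustration of the same bookkeeping outside PyTorch, here is a minimal NumPy sketch. It assumes the images arrive in batches shaped (batch_size, height, width, 3), i.e. channels last, which is how an interleaved 3-channel cv::Mat is laid out. The function name `channel_stats` is made up for this example. In C++, `cv::sum` returns one sum per channel, so the same accumulation should carry over, but treat that as a direction to try rather than a finished port.

```python
import numpy as np

def channel_stats(batches):
    """Accumulate per-channel sum and sum of squares over batches of HWC images."""
    psum = np.zeros(3, dtype=np.float64)
    psum_sq = np.zeros(3, dtype=np.float64)
    count = 0  # number of pixels seen so far (per channel)
    for batch in batches:
        # batch shape: (batch_size, height, width, 3), channels last
        x = batch.astype(np.float64)
        psum += x.sum(axis=(0, 1, 2))
        psum_sq += (x ** 2).sum(axis=(0, 1, 2))
        count += x.shape[0] * x.shape[1] * x.shape[2]
    mean = psum / count
    # population variance: E[x^2] - (E[x])^2, clipped to guard against rounding
    var = np.maximum(psum_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)
```

Because only two running sums are kept, the batches can be of different sizes and the whole dataset never needs to be in memory at once.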
I have an image here with a table. In the column on the right, the background is filled with noise.
How can I detect the areas with noise? I only want to apply a filter to the noisy parts, because I need to do OCR on the image and any kind of filter will reduce the overall recognition quality.
And what kind of filter is best for removing the background noise in the image?
As I said, I need to do OCR on the image.
I tried some filters/operations in OpenCV and it seems to work pretty well.
Step 1: Dilate the image -
kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(img, kernel, iterations = 1)
As you see, the noise is gone but the characters are very light, so I eroded the dilated image.
Step 2: Erode the image -
kernel = np.ones((5, 5), np.uint8)
eroded = cv2.erode(dilated, kernel, iterations = 1)
As you can see, the noise is gone; however, some characters in the other columns are broken. I would recommend running these operations on the noisy column only. You might want to use HoughLines to find the last column. Then you can extract that column only, run dilation + erosion on it, and replace the corresponding column in the original image with the result.
Additionally, dilation followed by erosion is actually an operation called closing. You can call it directly using -
cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
As @Ermlg suggested, medianBlur with a kernel size of 3 also works wonderfully.
cv2.medianBlur(img, 3)
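To make it clearer why closing removes small dark specks while keeping thicker strokes, here is a small NumPy-only sketch of grayscale dilation (window maximum) followed by erosion (window minimum) with a square kernel. It is an illustration, not a replacement for cv2.morphologyEx: the helper names are invented, and the border is handled by edge replication, which is an assumption and may differ from OpenCV's default.

```python
import numpy as np

def _window_reduce(gray, k, reducer):
    """Apply reducer (np.max or np.min) over every k x k window of a 2-D array."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")  # assumption: replicate the border
    h, w = gray.shape
    views = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return reducer(np.stack(views, axis=0), axis=0)

def dilate(gray, k=5):
    # bright pixels grow, so dark specks smaller than the kernel vanish
    return _window_reduce(gray, k, np.max)

def erode(gray, k=5):
    # dark pixels grow back, restoring the thickness of surviving strokes
    return _window_reduce(gray, k, np.min)

def closing(gray, k=5):
    return erode(dilate(gray, k), k)
```

On a white page, a single dark pixel disappears after closing, while a dark block larger than the kernel comes back at its original size.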
Alternative Step
As you can see, all of these filters work, but it is better to apply them only to the part where the noise is. To do that, use the following:
edges = cv2.Canny(img, 50, 150, apertureSize = 3)  # img is grayscale here
# pass minLineLength and maxLineGap by keyword; positionally, the fifth argument is the output lines array
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 100, minLineLength = 1000, maxLineGap = 50)
for line in lines:
    for x1, y1, x2, y2 in line:
        print(x1, y1)
This gives the start coordinates of all the lines. Take the line whose x value lies between 0.75 * w and w, where w is the width of the entire image. Here that gives (x1, y1) = (1896, 766).
Then you can extract just this part:
h, w = img.shape[:2]  # height and width of the image
extract = img[y1:h, x1:w]
Then apply the filter (median or closing) to this extract. After removing the noise, put the filtered image back in place of the noisy part of the original image:
median = cv2.medianBlur(extract, 3)
img[y1:h, x1:w] = median
This is straightforward in C++ as well, by copying the filtered image into the matching region of the original:
median.copyTo(img(cv::Rect(x1, y1, w - x1, h - y1)));
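The extract, filter, and paste-back idea can be sketched without OpenCV as follows. This is only an illustration under a few assumptions: the image is a 2-D grayscale array, the noisy region runs from (x1, y1) to the bottom-right corner, and the 3x3 median is a plain NumPy stand-in (with replicated edges) for cv2.medianBlur. The function names are hypothetical.

```python
import numpy as np

def median3x3(gray):
    """3x3 median filter on a 2-D array, with the border replicated."""
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    # stack the nine shifted copies of the image and take the per-pixel median
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(shifts, axis=0), axis=0).astype(gray.dtype)

def filter_region(img, x1, y1):
    """Filter only img[y1:, x1:] and return a copy with that region replaced."""
    out = img.copy()
    out[y1:, x1:] = median3x3(img[y1:, x1:])
    return out
```

Pixels outside the selected region are left untouched, which is the point of the approach: the text in the clean columns is never filtered.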
Final Result with alternate method
Hope it helps!
My solution is based on thresholding and produces the result image in 4 steps.
1. Read the image with OpenCV 3.2.0.
2. Apply GaussianBlur() to smooth the image, especially the gray region.
3. Mask the image to change the text to white and the rest to black.
4. Invert the masked image to get black text on white.
The code is in Python 2.7. It can be changed to C++ easily.
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
# read Danish doc image
img = cv2.imread('./imagesStackoverflow/danish_invoice.png')
# apply GaussianBlur to smooth image
blur = cv2.GaussianBlur(img,(5,3), 1)
# set dark (text) pixels in the range 0..150 to white (255) and everything else to black (0)
mask=cv2.inRange(blur,(0,0,0),(150,150,150))
# invert the image to have text black-in-white
res = 255 - mask
plt.figure(1)
plt.subplot(121), plt.imshow(img[:,:,::-1]), plt.title('original')
plt.subplot(122), plt.imshow(blur, cmap='gray'), plt.title('blurred')
plt.figure(2)
plt.subplot(121), plt.imshow(mask, cmap='gray'), plt.title('masked')
plt.subplot(122), plt.imshow(res, cmap='gray'), plt.title('result')
plt.show()
The images plotted by the code are shown below for reference.
Here is the result image at 2197 x 3218 pixels.
As far as I know, the median filter is the best solution for reducing noise. I would recommend using a median filter with a 3x3 window. See the function cv::medianBlur().
But be careful when using any noise filtering together with OCR. It can decrease recognition accuracy.
I would also recommend trying the pair of functions cv::erode() and cv::dilate(). But I'm not sure that it will be a better solution than cv::medianBlur() with a 3x3 window.
I would go with median blur (probably 5*5 kernel).
If you are planning to apply OCR to the image, I would advise you to do the following:
Filter the image using a median filter.
Find contours in the filtered image; you will get only text contours (call them F).
Find contours in the original image (call them O).
Isolate all contours in O that intersect any contour in F.
Faster solution:
Find contours in the original image.
Filter them based on size.
Blur (3x3 box)
Threshold at 127
Result:
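The two steps above (3x3 box blur, then a threshold at 127) can be sketched in plain NumPy as shown here. This is a rough illustration on a 2-D grayscale array; the function names are invented, and the replicated border is an assumption that may not match OpenCV's default border mode.

```python
import numpy as np

def box_blur3x3(gray):
    """Average each pixel with its 8 neighbours (border replicated)."""
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    h, w = gray.shape
    total = sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
    return total / 9.0

def blur_and_threshold(gray, thresh=127):
    """Blur, then map values above thresh to 255 and the rest to 0."""
    blurred = box_blur3x3(gray)
    return np.where(blurred > thresh, 255, 0).astype(np.uint8)
```

An isolated dark pixel on a white background is averaged up to roughly 227 and therefore becomes white, whereas the interior of a solid dark stroke stays at 0 and remains black.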
If you are very worried about removing pixels that could hurt your OCR detection, and want to stay as close to the original as possible without adding artifacts, then you should create a blob filter and delete any blobs that are smaller than n pixels or so.
I'm not going to write code, but I know this works great because I use it myself, though I don't use OpenCV (I wrote my own multithreaded blob filter for speed reasons). Sorry, I cannot share my code here; I'm just describing how to do it.
If processing time is not an issue, a very effective method in this case would be to compute all black connected components and remove those smaller than a few pixels. It would remove all the noisy dots (apart from those touching a valid component), but preserve all characters and the document structure (lines and so on).
The function to use would be connectedComponentsWithStats (you probably need to produce the negative image first; the threshold function with THRESH_BINARY_INV would work in this case), drawing white rectangles where small connected components were found.
In fact, this method could be used to find characters, defined as connected components of a given minimum and maximum size, and with aspect ratio in a given range.
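As a minimal sketch of this small-component (blob) filter, the following pure Python/NumPy version labels 8-connected dark components with a breadth-first search and whitens those below a size limit. It assumes a binary image with dark pixels equal to 0 and background equal to 255, and the name `remove_small_components` is made up. Unlike the rectangle-drawing variant described above, it whitens the component's own pixels. It is far slower than OpenCV's connectedComponentsWithStats, so read it as an explanation of the idea rather than production code.

```python
from collections import deque
import numpy as np

def remove_small_components(binary, min_size):
    """Whiten dark (0) 8-connected components with fewer than min_size pixels."""
    h, w = binary.shape
    out = binary.copy()
    seen = np.zeros((h, w), dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if seen[sy, sx] or binary[sy, sx] != 0:
                continue
            # breadth-first search collects one connected dark component
            component = [(sy, sx)]
            seen[sy, sx] = True
            queue = deque(component)
            while queue:
                y, x = queue.popleft()
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if not seen[ny, nx] and binary[ny, nx] == 0:
                            seen[ny, nx] = True
                            component.append((ny, nx))
                            queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    out[y, x] = 255
    return out
```

The same loop could also record each component's bounding box, which is what would be needed to select characters by size and aspect ratio.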
I have faced the same issue before and found a solution that worked best for me.
Convert the source image to grayscale, apply the fastNlMeansDenoising function, and then apply a threshold.
Like this -
fastNlMeansDenoising(gray,dst,3.0,21,7);
threshold(dst,finaldst,150,255,THRESH_BINARY);
You can also adjust the threshold according to the background noise in your image,
e.g. threshold(dst,finaldst,200,255,THRESH_BINARY);
NOTE - If your column lines get removed, you can take a mask of the column lines from the source image and apply it to the denoised result using bitwise operations such as AND, OR, and XOR.
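Try thresholding the image like this. Make sure your src is grayscale. Note that because CV_THRESH_OTSU is set, OpenCV chooses the threshold automatically and ignores the 150 passed in. If you want a fixed cut-off instead, drop that flag; pixels above 150 then become 255 and the rest become 0.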
threshold(src, output, 150, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);
You might want to invert the image as you are trying to negate the gray pixels. After the operation, invert it again to get your desired result.
Hey guys, I am using OpenCV 2.4 with Python 2.7 on Ubuntu 14.04.
I want to select multiple regions of interest (ROIs) in an image. Is it possible to do so?
I want to do motion detection only in the areas I have selected. Either of the following ideas could solve my problem, but I don't know how to implement them:
Mask the area of the image that is not an ROI.
After creating multiple ROI images, combine them so that every ROI sits at its original location and the remaining area is masked.
Yes, it is possible to do so. The main idea behind the solution is to create a mask and set it to 0 wherever you do not want the motion tracker to track.
If you are using numpy, then you can create the mask and set the regions you do not want the detector to use to zero. (Similar to mask(cv::Rect(start.col, start.row, numberof.cols, numberof.rows)).setTo(0) in C++.)
In Python, using numpy, you can create a mask somewhat like this:
import cv2
import numpy as np
# cap is assumed to be an already opened cv2.VideoCapture
ret, frame = cap.read()
# a color frame has 3 dimensions; the size of the last one tells BGR (3) from BGRA (4)
if frame.ndim == 3 and frame.shape[2] == 3:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
elif frame.ndim == 3 and frame.shape[2] == 4:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGRA2GRAY)
else:
    gray = frame
# create mask
mask = np.ones_like(gray)
mask[start_row:end_row, start_col:end_col] = 0
mask[another_starting_row:another_ending_row, another_start_col:another_end_col] = 0
# and so on you can create your own mask
# use for loops to create specific masks
It is a somewhat crude solution, but it will do the job. Check the numpy documentation (PDF) for more info.
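For the opposite case raised in the question (keep several ROIs and black out everything else), a minimal sketch could look like the following. It assumes each ROI is given as an (x, y, width, height) tuple; the helper names `build_roi_mask` and `apply_mask` are hypothetical and not part of OpenCV or numpy.

```python
import numpy as np

def build_roi_mask(shape, rois):
    """Return a uint8 mask that is 1 inside every ROI and 0 elsewhere."""
    mask = np.zeros(shape[:2], dtype=np.uint8)
    for x, y, w, h in rois:
        mask[y:y + h, x:x + w] = 1
    return mask

def apply_mask(frame, mask):
    """Zero out everything outside the ROIs; ROI pixels keep their location."""
    if frame.ndim == 3:
        # broadcast the 2-D mask over the color channels
        return frame * mask[:, :, np.newaxis]
    return frame * mask
```

The masked frame has the same size as the original, so the ROIs stay at their original coordinates and a motion detector run on it only sees changes inside them.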
I am trying to change the RGB values of an entire image for a project. Currently I am working with a test file before I apply it to the actual image. I want to test different RGB values but would first like to start with the mean of all three. How would I go about doing this? I have other modules installed, such as scipy, numpy, and matplotlib, if those are needed. Thanks
from PIL import Image, ImageFilter
test = Image.open('/Users/MeganRCunninghan/Pictures/4th-of-July-Wallpaper.ppm')
test.show()
test.getrgb()
Assuming your image is stored as a numpy.ndarray (Test this with print type(test))...
Your image will be represented by an NxMx3 array. Basically this means you have an N by M image with a color depth of 3: your RGB values. Taking the mean of those 3 will leave you with an NxM array, where each entry is now the average intensity of that pixel. Numpy does this very well:
test = test.mean(2)
The parameter given, 2, specifies the dimension to take the mean along. It could be either 0, 1, or 2, because your image matrix is 3 dimensional. This should return an NxM array. You basically will be left with a gray-scale, (color depth of 1) image. Try to show the value that gets returned! If you get Nx3 or Mx3, you know you have just taken the average along the wrong axis. Note that you can check the dimensions of a numpy array with:
test.shape
Shape will be a tuple describing the dimensions of your image.
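To show how the choice of axis changes what "the mean" refers to, here is a short sketch. It assumes the pixel data is already an NxMx3 array (a PIL image can be turned into one with np.asarray(image)); the function name `rgb_means` is invented for the example.

```python
import numpy as np

def rgb_means(pixels):
    """Return (per-pixel mean over R, G, B; per-channel mean over the image)."""
    arr = np.asarray(pixels, dtype=np.float64)
    per_pixel = arr.mean(axis=2)         # shape N x M, a gray-scale image
    per_channel = arr.mean(axis=(0, 1))  # shape (3,), average R, G and B
    return per_pixel, per_channel
```

The first result is the gray-scale image described above; the second is a single average color for the whole picture, which may be closer to what is wanted when adjusting the overall RGB balance.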
I'm working on a mini project that needs to manipulate rgb colors of pixels from the entire screen.
The solution I have come up with so far is to take screenshots continuously and find the RGB colors with a bit of processing. It's fast enough, but it uses a lot of CPU because of the screenshot part.
So I was wondering if there is a way to extract the data I need (in matrix form) from computer memory or GPU memory, using a C++ library, OpenGL*, or another approach.
I'm using Windows 7 64-bit, but I'd like the solution to be cross-platform (Windows/Linux/Mac). It would also be nice if the solution could be implemented in C/C++/Java/Python (any of these will do).
*I did some research over the past week and I don't think it's possible with OpenGL. However, I'm not so sure...
EDIT Right now I have working code in Java and Python that takes screenshots of my entire display and averages the RGB colors of the pixels. In Java I use Processing and the Robot library, and in Python I use PIL.
In Java I grab the image with createScreenCapture() and read the values with getRGB().
In Python I grab the image with ImageGrab.grab(), load the object and read the values as tuples.
The python code as an example is this:
from PIL import Image
from PIL import ImageGrab
import operator

#######################################
# screen resolution
width = 1920
height = 1080
step = 10
#######################################

def process(pix):
    # sum the RGB values of every step-th pixel, then average
    temp = (0, 0, 0)
    count = 0
    for i in range(0, width, step):
        for j in range(0, height, step):
            temp = tuple(map(operator.add, temp, pix[i,j]))
            count += 1
    return (int(temp[0]/count), int(temp[1]/count), int(temp[2]/count))

while True:
    img=ImageGrab.grab()
    pix = img.load()
    res = process(pix)
    print(res)
However, this code works only on Windows (because of ImageGrab)...
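Independently of how the screenshot is captured, the sampling loop itself can be expressed as a NumPy slice, which is a possible way to cut the processing part of the cost. This is only a sketch: it assumes the grabbed image has been converted to a height x width x channels array (for example with np.asarray(img)), and the name `average_color` is made up. It does not address the cost of the screenshot itself.

```python
import numpy as np

def average_color(pixels, step=10):
    """Average the RGB values of every step-th pixel in both directions."""
    # keep every step-th row and column, and only the first three channels
    arr = np.asarray(pixels)[::step, ::step, :3]
    mean = arr.reshape(-1, 3).mean(axis=0)
    return tuple(int(c) for c in mean)
```

It samples the same grid of pixels as the nested loops above and truncates the averages to integers in the same way.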