Remove a certain pattern from an image in OpenCV - C++

I am trying to write software for document management. First I input the blank invoice, then I feed in the other invoices containing data. Using SIFT detectors I determine what type of invoice it is.
Then I want to remove the intersection of the two images. Basically this will keep only the filled-in information and remove the data common to both invoices. I want to know whether there is a proper way to remove areas from an image.

There is a concept in image processing called the region of interest (ROI). It creates a view onto a sub-region of the original image, which lets you work directly with a given x,y area of the image.
Another possibility would be to subtract the original image. But depending on the quality of the filled-in form picture, this might lead to other problems.
I meant the ROI in the sense that you could create an ROI for every place where the form has input data and process only those specific regions.
I found a function that might help you, cvAbsDiff, which can subtract one image from another.
Here is a link that might help you understand how to use it:
http://blog.damiles.com/?p=67
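A minimal sketch of both ideas with the current C++ API (cv::absdiff is the modern equivalent of cvAbsDiff). The file names, threshold value and ROI coordinates are placeholders, and the blank and filled invoices are assumed to be already aligned, e.g. by your SIFT step:

```cpp
#include <opencv2/opencv.hpp>

int main()
{
    // Hypothetical file names -- replace with your own images.
    // Both images are assumed to be already registered to each other.
    cv::Mat blank  = cv::imread("blank_invoice.png",  cv::IMREAD_GRAYSCALE);
    cv::Mat filled = cv::imread("filled_invoice.png", cv::IMREAD_GRAYSCALE);
    if (blank.empty() || filled.empty() || blank.size() != filled.size())
        return 1;

    // Absolute difference: pixels identical in both images become 0,
    // so only the filled-in data survives.
    cv::Mat diff;
    cv::absdiff(filled, blank, diff);

    // Threshold to suppress small differences caused by scanning noise.
    cv::Mat dataOnly;
    cv::threshold(diff, dataOnly, 40, 255, cv::THRESH_BINARY);

    // Region of interest: a header ROI (coordinates are made up) that views
    // the original matrix without copying it.
    cv::Rect headerRect(0, 0, filled.cols, 100);
    cv::Mat header = filled(headerRect);   // shares data with 'filled'

    cv::imwrite("data_only.png", dataOnly);
    cv::imwrite("header_roi.png", header);
    return 0;
}
```

Anything left white in data_only.png is content that exists only in the filled invoice.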

Related

2D object detection with only a single training image

The vision system is given a single training image (e.g. a piece of 2D artwork) and is asked whether that artwork is present in newly captured photos. The newly captured photos can contain many other objects, and when the artwork is present it faces up but may be occluded.
The pose space is x, y, rotation, and scale. The artwork may or may not be highly symmetric.
What is the current state of the art for handling this kind of problem?
I have tried/considered the following options, but there are problems with all of them. If my reasoning is wrong, please correct me.
Deep learning (R-CNN/YOLO): a lot of labeled data is needed, which means a lot of human labor for each new piece of artwork.
Traditional machine learning (SVM, random forest): same as above.
SIFT/SURF/ORB + RANSAC or voting: when the artwork is symmetric, the matched features are mostly incorrect, and a lot of time is needed in the RANSAC/voting stage.
Generalized Hough transform: the state space is too large for the voting table. A pyramid can be applied, but it is difficult to choose universal thresholds for different kinds of artwork to proceed down the pyramid.
Chamfer matching: the state space is too large, and too much time is needed to search across it.
Object detection requires a lot of labeled data of the same class to generalize well, and in your setting it would be impossible to train a network with only a single instance.
I assume that in your case online object trackers could work; at least give them a try. There are some convolutional object trackers that work very well, such as Siamese CNNs. The code is open source on GitHub, and you can watch this video to see its performance.
Online object tracking: Given the initialized state (e.g., position and size) of a target object in a frame of a video, the goal of tracking is to estimate the states of the target in the subsequent frames. (source)
You can try using a traditional feature-based image processing algorithm, which might give true positive template matches with decent accuracy.
Given the template image as in the question:
First, dilate the image to join all very closely spaced connected components.
Find the convex hull of the connected object obtained above. This will give you a polygon.
Use the polygon's edge length information, e.g. the (max-length / min-length) ratio, as a feature of the template.
Also find the pixel density inside the polygon as a second feature (a sketch of extracting these two features is given below).
We have 2 features now.
Scene image feature vector:
Similarly, in the scene image, use dilation followed by connected-component identification, define a convex hull (polygon) around each connected object, and build a feature vector for each object (edge info, pixel density).
Now search for the template feature vector among the scene image feature vectors by minimum feature distance (also use an upper distance threshold to avoid false positive matches).
This should give the true positive matches, if any are present in the scene image.
Exception: This method would not work for occluded objects.
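A rough C++ sketch of extracting those two features from a binary template image (the dilation amount and the single-object assumption are my own choices):

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: extracts the two features described above
// (max/min hull-edge-length ratio, and pixel density inside the hull)
// from a binary image assumed to contain a single object.
static cv::Vec2d hullFeatures(const cv::Mat& binary)
{
    // Dilate to join very closely spaced components.
    cv::Mat dilated;
    cv::dilate(binary, dilated, cv::Mat(), cv::Point(-1, -1), 3);

    // Take the largest connected object via its external contour.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(dilated, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty())
        return cv::Vec2d(0, 0);
    auto largest = std::max_element(contours.begin(), contours.end(),
        [](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b) {
            return cv::contourArea(a) < cv::contourArea(b);
        });

    // Convex hull polygon and its edge lengths.
    std::vector<cv::Point> hull;
    cv::convexHull(*largest, hull);
    double maxEdge = 0.0, minEdge = 1e9;
    for (size_t i = 0; i < hull.size(); ++i) {
        cv::Point d = hull[(i + 1) % hull.size()] - hull[i];
        double len = std::hypot((double)d.x, (double)d.y);
        maxEdge = std::max(maxEdge, len);
        minEdge = std::min(minEdge, len);
    }

    // Pixel density: foreground pixels inside the hull divided by the hull area.
    cv::Mat mask = cv::Mat::zeros(binary.size(), CV_8U);
    cv::fillConvexPoly(mask, hull, cv::Scalar(255));
    double density = cv::countNonZero(binary & mask) /
                     std::max(cv::contourArea(hull), 1.0);

    return cv::Vec2d(maxEdge / std::max(minEdge, 1.0), density);
}
```

The same helper can then be run on each connected object found in the scene image, matching the template to the scene object whose feature vector has the smallest distance below your chosen threshold.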

How to use a HOG detector to know whether a preselected ROI is a person?

I want to use the HOG detector in order to know whether an ROI is a person or not. I have already coded a filtering technique that lets me select ROIs that may contain a person.
For instance, this is what I'm doing:
1. saving my ROIs as rectangles
2. for each ROI: cropping the original image, rescaling the crop, and then applying hog.detect() on it instead of on the original image.
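Roughly like this (a simplified sketch of that loop; the file name and ROI values are just placeholders):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::Mat image = cv::imread("frame.png");                      // placeholder file name
    std::vector<cv::Rect> rois = { cv::Rect(50, 80, 120, 260) };  // made-up ROI from my filter

    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

    for (const cv::Rect& roi : rois) {
        // Crop and rescale so the candidate roughly fills one detection window
        // (the default people detector uses a 64x128 window).
        cv::Mat patch;
        cv::resize(image(roi), patch, cv::Size(64, 128));

        std::vector<cv::Point> found;
        hog.detect(patch, found);                                 // single-scale detection
        std::cout << "ROI at (" << roi.x << ", " << roi.y << "): "
                  << (found.empty() ? "not a person" : "person") << std::endl;
    }
    return 0;
}
```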
This works, but it doesn't seem to be the right way to do what I want.
I recently found out that there is a hog.detectROI() function, and I'm wondering whether I can give it my image and my ROIs as input just once: will it be able to tell me which ROIs are people?
I would be grateful if someone could give me an example of how to use this function, especially what I should pass as locations when my ROIs are rectangles.

Identifying blobs in image as that of a vehicle

Any idea how I can make the smaller blobs belonging to the same vehicle count as one vehicle? Due to background subtraction, some of the blobs in the foreground mask that belong to a vehicle are quite small, so filtering the blobs by size won't work.
Try filtering based on colorDistance(), comparing the mean color of the blobs in the image containing the vehicle against a control image of the background without the car in it. The SimpleCV docs have a tutorial specifically on this topic. That said, it may not always work as expected. Another possibility (it just occurred to me) might be summing up the areas of the blobs of interest and checking whether that sum is over a given threshold, rather than testing any single blob by itself.
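The area-summing idea, sketched with OpenCV's C++ API rather than SimpleCV (the threshold and the morphological closing step are my own additions, so tune or drop them as needed):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    // Foreground mask produced by background subtraction (placeholder file name).
    cv::Mat fgMask = cv::imread("foreground_mask.png", cv::IMREAD_GRAYSCALE);
    if (fgMask.empty()) return 1;

    // Optionally close small gaps so fragments of one vehicle merge.
    cv::morphologyEx(fgMask, fgMask, cv::MORPH_CLOSE,
                     cv::getStructuringElement(cv::MORPH_RECT, cv::Size(15, 15)));

    std::vector<std::vector<cv::Point>> blobs;
    cv::findContours(fgMask, blobs, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    // Sum the blob areas instead of testing each blob on its own.
    double totalArea = 0.0;
    for (const auto& b : blobs)
        totalArea += cv::contourArea(b);

    const double vehicleAreaThreshold = 2000.0;   // made-up value, tune for your scene
    std::cout << "total blob area = " << totalArea << " -> "
              << (totalArea > vehicleAreaThreshold ? "vehicle" : "no vehicle") << std::endl;
    return 0;
}
```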

Difference between pixel-based and frame-based methods

I am working on video frames using OpenCV. My question might be low-level, but I want to clarify it first.
There are plenty of pixel-based methods available in OpenCV, but can I change them into frame-based ones?
To me they look similar, since the whole frame is also stored in one matrix, and I will read that matrix from beginning to end to handle it. So, for instance, to find an average value, the only thing I should change is to take the total average over all the pixels of one frame, instead of looking at one pixel across several frames and deciding that pixel's average from them. But when it comes to building models like GMM, I cannot tell the difference.
Could someone help explain it clearly?
Can I use or change OpenCV's GMM for global usage?
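To make the contrast concrete, this is roughly what I mean (a small sketch; the video name is a placeholder):

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <iostream>

int main()
{
    cv::VideoCapture cap("video.avi");            // placeholder video
    if (!cap.isOpened()) return 1;

    cv::Mat frame, gray, acc;
    int n = 0;
    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);

        // "Whole-frame" statistic: one mean value for the entire frame.
        double frameMean = cv::mean(gray)[0];
        std::cout << "frame " << n << " mean = " << frameMean << std::endl;

        // "Per-pixel over time": running sum of each pixel across frames.
        if (acc.empty()) acc = cv::Mat::zeros(gray.size(), CV_32F);
        cv::accumulate(gray, acc);
        ++n;
    }
    if (n == 0) return 1;

    // Each pixel's temporal average over all frames seen.
    cv::Mat temporalMean;
    acc.convertTo(temporalMean, CV_8U, 1.0 / std::max(n, 1));
    cv::imwrite("temporal_mean.png", temporalMean);
    return 0;
}
```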
I think the following is a good way to define the problem, even though in both cases you are working with pixels.
Pixel-based methods: the value of pixel (x,y) in the resulting processed image is the result of applying a transformation to pixel (x,y) of the original image.
Region-based methods: the pixels in the original image are grouped into contiguous regions and transformations are applied to the whole region. Example: the resulting pixel (x,y) is the mean of a patch around the original pixel (x,y).
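A tiny illustration of the two definitions (the particular transformations and file names are arbitrary examples):

```cpp
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("input.png", cv::IMREAD_GRAYSCALE);  // placeholder image
    if (src.empty()) return 1;

    // Pixel-based: output(x,y) depends only on input(x,y).
    // Example transformation: a simple linear contrast change.
    cv::Mat pixelBased;
    cv::convertScaleAbs(src, pixelBased, 1.5, -20);   // out = |1.5 * in - 20|

    // Region-based: output(x,y) is the mean of a patch around input(x,y).
    cv::Mat regionBased;
    cv::blur(src, regionBased, cv::Size(5, 5));       // 5x5 box filter

    cv::imwrite("pixel_based.png", pixelBased);
    cv::imwrite("region_based.png", regionBased);
    return 0;
}
```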

How to detect a text area in an image?

I want to detect the text area in an image as a preprocessing step for the Tesseract OCR engine. The engine works well when the input is text only, but it fails when the input image contains non-text content, so I want to detect only the text content in the image. Any idea of how to do that would be helpful, thanks.
Take a look at this bounding box technique demonstrated with OpenCV code:
(Input, eroded, and result images omitted.)
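The general shape of such a bounding-box technique looks roughly like this (the file name, kernel size and size filter are guesses; the linked answer erodes dark text on a light background, which amounts to dilating the inverted image as done here):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Mat img = cv::imread("document.png", cv::IMREAD_GRAYSCALE);  // placeholder image
    if (img.empty()) return 1;

    // Binarize (text as white on black) and grow the strokes so that the
    // characters of a text block merge into one blob.
    cv::Mat bw, blobs;
    cv::threshold(img, bw, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(17, 3));
    cv::dilate(bw, blobs, kernel);

    // Bounding boxes of the merged blobs are candidate text areas.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(blobs, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    cv::Mat result;
    cv::cvtColor(img, result, cv::COLOR_GRAY2BGR);
    for (const auto& c : contours) {
        cv::Rect box = cv::boundingRect(c);
        if (box.area() > 300)                      // made-up size filter
            cv::rectangle(result, box, cv::Scalar(0, 0, 255), 2);
    }
    cv::imwrite("text_boxes.png", result);
    return 0;
}
```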
Well, I'm not very experienced in image processing, but I hope I can help you with a theoretical approach.
In most cases, text forms parallel, horizontal rows, and the space between rows contains lots of background pixels. This can be used to solve the problem.
So, if you collapse every pixel row of the image into a single value, you'll get a one-pixel-wide image as output. When the input image contains text, this output will very likely show a periodic pattern, where dark areas are followed by brighter areas repeatedly. These "groups" of darker pixels indicate the position of the text content, while the brighter "groups" indicate the gaps between the individual rows.
You'll probably find that the brighter areas are much smaller than the others. Text has a much more regular structure than most other picture elements, so it should be easy to separate.
You have to implement a procedure to detect these periodic recurrences. Once the script can determine that the input picture has these characteristics, there's a high chance that it contains text. (However, this approach can't distinguish between actual text and simple horizontal stripes...)
For the next step, you must find a way to determine the boundaries of the paragraphs using the above-mentioned method. I'm thinking of a fairly simple algorithm which would divide the input image into smaller, narrow stripes (50-100 px), check these areas separately, and then compare the results to build a map of the possible areas filled with text. This method wouldn't be very accurate, but that probably doesn't matter to the OCR system.
And finally, you need to use the text-map to run the OCR on the desired locations only.
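A quick sketch of this row-profile idea (the threshold and the assumption of dark text on a light background are mine):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat img = cv::imread("page.png", cv::IMREAD_GRAYSCALE);   // placeholder image
    if (img.empty()) return 1;

    // Binarize so that text pixels are 1 and background pixels are 0.
    cv::Mat bw;
    cv::threshold(img, bw, 0, 1, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);
    bw.convertTo(bw, CV_32F);

    // Collapse every row to a single value: the fraction of text pixels in it.
    cv::Mat rowProfile;                    // one column, img.rows entries
    cv::reduce(bw, rowProfile, 1, cv::REDUCE_AVG, CV_32F);

    // Rows whose text density exceeds a (made-up) threshold are "text rows";
    // consecutive runs of such rows correspond to lines of text.
    const float textRowThreshold = 0.02f;
    int start = -1;
    for (int y = 0; y < rowProfile.rows; ++y) {
        bool isText = rowProfile.at<float>(y) > textRowThreshold;
        if (isText && start < 0) start = y;
        if ((!isText || y == rowProfile.rows - 1) && start >= 0) {
            std::cout << "text rows " << start << " - " << y << std::endl;
            start = -1;
        }
    }
    return 0;
}
```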
On the other hand, this method will fail if the input text is rotated more than ~3-5 degrees. There is another drawback: if you have only a few rows, the pattern search will be very unreliable. More rows, more accuracy...
Regards, G.
I am new to stackoverflow.com, but I wrote an answer to a question similar to this one which may be useful to readers who arrive here. Whether or not that question is actually a duplicate, since this one came first, I'll leave to others; if I should copy and paste that answer here, let me know. I also found this question on Google before the one I answered, so a link here may benefit more people, especially since the other answer covers several different ways of finding text areas. For me, when I looked up this question, it did not fit my problem case.
Detect text area in an image using python and opencv
At the current time, the best way to detect text is by using EAST (An Efficient and Accurate Scene Text Detector).
The EAST pipeline is capable of predicting words and lines of text at arbitrary orientations on 720p images, and furthermore, can run at 13 FPS, according to the authors.
An EAST quick-start tutorial can be found here.
The EAST paper can be found here.
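For reference, loading the pre-trained EAST model through OpenCV's dnn module looks roughly like this (the model file, image name, input size and output layer names follow the widely used OpenCV EAST sample; treat them as assumptions and check them against the tutorial):

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <iostream>
#include <vector>

int main()
{
    // Assumes the pre-trained model "frozen_east_text_detection.pb" has been downloaded.
    cv::dnn::Net net = cv::dnn::readNet("frozen_east_text_detection.pb");

    cv::Mat img = cv::imread("scene.jpg");            // placeholder image
    if (img.empty()) return 1;

    // EAST expects input dimensions that are multiples of 32.
    cv::Mat blob = cv::dnn::blobFromImage(img, 1.0, cv::Size(320, 320),
                                          cv::Scalar(123.68, 116.78, 103.94),
                                          true, false);
    net.setInput(blob);

    // Output layers used in the OpenCV EAST sample: text/no-text scores and
    // the rotated-box geometry for each location of the score map.
    std::vector<cv::Mat> outs;
    std::vector<cv::String> outNames = { "feature_fusion/Conv_7/Sigmoid",
                                         "feature_fusion/concat_3" };
    net.forward(outs, outNames);

    // outs[0]: score map, outs[1]: geometry. Decoding the boxes and applying
    // non-maximum suppression (e.g. cv::dnn::NMSBoxes) is still needed and
    // is covered in the tutorial linked above.
    std::cout << "score map size: " << outs[0].size[2] << " x " << outs[0].size[3]
              << std::endl;
    return 0;
}
```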