Let's say I have two images containing the word "CAT" but at different sizes, and I want to apply DTW to check that both images show the same word.
I used the code implemented here:
http://bytefish.de/blog/dynamic_time_warping/
The problem is that I only have two vectors, and when these vectors have more than 800 elements the program crashes.
So my questions are:
Is it efficient to take the whole picture, put it into a vector, and apply DTW to these two vectors?
Or shall I divide the pictures into slices (windows) and compare these slices with each other? In that case, what if one image has 15 slices and the other has only 10; how can I compare them? (See the sketch below.)
Finally, if there's any source that explains implementing DTW for image matching, please pass it along.
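For reference, here is a minimal iterative DTW sketch (not the code from the link above); the per-column feature is only an illustrative assumption. Note that DTW by construction aligns sequences of different lengths, so 15 slices against 10 is not a problem, and an iterative cost table avoids the deep recursion that can crash on long inputs.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Classic O(n*m) DTW between two 1-D sequences. The sequences may have
// different lengths, which is exactly why DTW fits the 15-vs-10-slices case.
double dtw(const std::vector<double>& a, const std::vector<double>& b)
{
    const size_t n = a.size(), m = b.size();
    std::vector<std::vector<double>> D(
        n + 1, std::vector<double>(m + 1, std::numeric_limits<double>::infinity()));
    D[0][0] = 0.0;
    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= m; ++j) {
            const double cost = std::abs(a[i - 1] - b[j - 1]);
            D[i][j] = cost + std::min({ D[i - 1][j], D[i][j - 1], D[i - 1][j - 1] });
        }
    return D[n][m];
}

// Hypothetical feature extraction: one value per image column (ink density),
// so each word image becomes a sequence whose length tracks the image width.
std::vector<double> columnProfile(const cv::Mat& binaryWord)
{
    std::vector<double> profile(binaryWord.cols);
    for (int x = 0; x < binaryWord.cols; ++x)
        profile[x] = cv::sum(binaryWord.col(x))[0] / 255.0;
    return profile;
}

Comparing per-column (or per-slice) features this way is usually preferable to flattening the whole picture into one giant vector.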
I started learning OpenCV two weeks ago, so don't be too harsh.
I have two high-resolution images with roughly 30% overlap. The first photo is rotated and transformed relative to the second (the photos were taken from different angles). I need to make one combined image.
I decided to use keypoints to find the same areas in the different images.
I used the BRISK detector. After detection I used the KeyPointsFilter::retainBest() function to keep only the 1000 best keypoints. After that, I computed descriptors and matched them using DescriptorMatcher::create(DescriptorMatcher::BRUTEFORCE). But I got too many spurious matches.
So I then sorted all the matches and kept those with the prevailing shift. That left about 100 keypoints that really match each other across the two images (the matching result obtained with the drawMatches() function is shown below).
[Matching result image]
But the images are not just shifted. They are rotated and transformed, so I can't simply use the Mat::copyTo() function to merge the two images into a new, larger one, because there is no common shift that fits all the keypoints.
Is there any function in OpenCV that can connect two images after applying the needed rotation/perspective transform?
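There is no single "merge" call, but the standard pipeline is cv::findHomography on the matched points followed by cv::warpPerspective. A minimal sketch, assuming pts1 and pts2 hold the ~100 filtered correspondences described above, in matching order:

#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: estimate a perspective transform from matched keypoints and warp
// img2 into img1's coordinate frame.
cv::Mat stitchPair(const cv::Mat& img1, const cv::Mat& img2,
                   const std::vector<cv::Point2f>& pts1,
                   const std::vector<cv::Point2f>& pts2)
{
    // RANSAC discards the remaining outlier matches while fitting H.
    cv::Mat H = cv::findHomography(pts2, pts1, cv::RANSAC, 3.0);

    // Canvas large enough to hold both images (a crude bound; a tighter one
    // would transform img2's corners through H first).
    cv::Mat canvas(img1.rows * 2, img1.cols * 2, img1.type(), cv::Scalar::all(0));
    cv::warpPerspective(img2, canvas, H, canvas.size());
    img1.copyTo(canvas(cv::Rect(0, 0, img1.cols, img1.rows)));
    return canvas;
}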
I am currently dealing with text recognition. Here is part of a binarized image after edge detection (using Canny):
EDIT: I am posting a link to the image; I don't have 10 reputation points, so I cannot embed it.
EDIT 2: And here's the same piece after thresholding. Honestly, I don't know which approach would be better.
The questions remain the same:
How should I detect the individual letters? I need to determine the location of every letter and then of every word.
Is it a problem that some letters are "open", i.e. not closed areas?
If I use cv::matchTemplate, does that mean I need 24 templates for the letters plus 10 for the digits, and then loop over my image to determine the best correlation? (A sketch of this follows below.)
Given that both the letters and the squares they sit in are one pixel wide, what filters or operations should I apply to close the open letters? I tried various combinations of dilate and erode, with no effect.
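On the cv::matchTemplate idea above, the usual pattern is one cv::matchTemplate pass per template followed by cv::minMaxLoc. A hedged sketch (the 0.8 threshold and the Detection structure are assumptions, and a real detector would scan all high-scoring locations, not just the single maximum):

#include <opencv2/opencv.hpp>
#include <vector>

struct Detection { cv::Point loc; int templateId; double score; };

// Sketch of the loop from the question: one correlation map per character
// template, keeping the location whose normalized score clears a threshold.
// 'templates' would hold one binarized image per letter and digit.
std::vector<Detection> matchAll(const cv::Mat& page,
                                const std::vector<cv::Mat>& templates)
{
    std::vector<Detection> hits;
    for (int t = 0; t < (int)templates.size(); ++t) {
        cv::Mat result;
        cv::matchTemplate(page, templates[t], result, cv::TM_CCOEFF_NORMED);
        double maxVal; cv::Point maxLoc;
        cv::minMaxLoc(result, nullptr, &maxVal, nullptr, &maxLoc);
        if (maxVal > 0.8)  // acceptance threshold: an assumption, tune per font
            hits.push_back({ maxLoc, t, maxVal });
    }
    return hits;
}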
The question is essentially "how do I do OCR with OpenCV?", and the answer is that it's an involved process and quite difficult.
But here are some pointers. Firstly, it's hard to detect letters that are only outlined; most tools are designed for filled letters. However, that image looks as if only one non-letter distractor would remain if you filled all loops under a certain size threshold, and you can get rid of the non-letter lines because they form one huge connected object.
Once you've filled the letters, they can be skeletonised.
You can't use morphological operations like open and close very sensibly on images whose details are one pixel wide. You can run the image through them, but when every feature is one pixel there is essentially no distinction between detail and noise. Once you fill the letters, however, that problem goes away.
This isn't in any way telling you how to do it, just giving some pointers.
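Picking up the loop-filling pointer above, one hedged way to realize it in OpenCV is contour filling with an area threshold (the threshold value is an assumption to tune):

#include <opencv2/opencv.hpp>
#include <vector>

// Sketch of the "fill all loops under a size threshold" idea: find contours
// on the binarized outline image and paint the small ones solid, turning
// outlined letters into filled ones while skipping the huge distractor lines.
cv::Mat fillSmallLoops(const cv::Mat& binary, double maxArea = 2000.0 /* assumption */)
{
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(binary.clone(), contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);

    cv::Mat filled = binary.clone();
    for (size_t i = 0; i < contours.size(); ++i)
        if (cv::contourArea(contours[i]) < maxArea)
            cv::drawContours(filled, contours, (int)i, cv::Scalar(255), cv::FILLED);
    return filled;
}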
As mentioned in the previous answer by Malcolm, OCR works better on filled letters, so you can do the following:
1. Use your second approach, but take the inverse result, not the one you are showing.
2. Run connected-component labeling (see the sketch after this list).
3. Run the OCR algorithm on each component.
In order to discard outliers, I would use the spatial relations between the detected letters: each should have another letter horizontally or vertically next to it.
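A hedged sketch of steps 1 and 2 using OpenCV's built-in labeling (cv::connectedComponentsWithStats, available since OpenCV 3):

#include <opencv2/opencv.hpp>

// Sketch: invert the thresholded image so letters are white, then label
// connected components; each component's bounding box is the region you
// would hand to the per-letter OCR step.
void labelLetters(const cv::Mat& thresholded)
{
    cv::Mat inverted;
    cv::bitwise_not(thresholded, inverted);   // step 1: take the inverse

    cv::Mat labels, stats, centroids;         // step 2: label components
    int n = cv::connectedComponentsWithStats(inverted, labels, stats, centroids);

    for (int i = 1; i < n; ++i) {             // label 0 is the background
        cv::Rect box(stats.at<int>(i, cv::CC_STAT_LEFT),
                     stats.at<int>(i, cv::CC_STAT_TOP),
                     stats.at<int>(i, cv::CC_STAT_WIDTH),
                     stats.at<int>(i, cv::CC_STAT_HEIGHT));
        cv::Mat letter = inverted(box);       // hand this to OCR (step 3)
    }
}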
Good luck
I am trying to implement the pyramid match kernel, and I am stuck at one point.
I understand that I need to partition the feature space into increasingly larger bins, so that at higher levels multiple points (feature vectors) map to a single bin. What I can't figure out is how to do that partitioning. I understand the case where the feature vectors are 1- or 2-dimensional, but how do you partition a d-dimensional feature space?
I understand the question is vague, but I just don't know where else to ask.
I may be wrong here, but I believe the intuition is to quantize the feature space. So you could basically do bag-of-words with different codebook sizes (128, 64, 32, ...) and use their kernel to compute the similarity between two images.
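To make the d-dimensional partitioning concrete: at pyramid level l you give each axis a bin side of 2^l and quantize every coordinate independently, so a bin is just a d-tuple of integer indices. A hedged sketch (names are placeholders):

#include <cmath>
#include <map>
#include <vector>

// Sketch of partitioning a d-dimensional feature space at pyramid level l:
// each axis is cut into bins of side 2^l, so a feature vector maps to a
// d-tuple of integer bin indices. Going up one level merges 2^d bins into one.
using BinId = std::vector<int>;

std::map<BinId, int> histogramAtLevel(const std::vector<std::vector<double>>& features,
                                      int level)
{
    const double side = std::pow(2.0, level);   // bin side doubles per level
    std::map<BinId, int> histogram;             // sparse: only occupied bins
    for (const auto& f : features) {
        BinId bin(f.size());
        for (size_t d = 0; d < f.size(); ++d)
            bin[d] = (int)std::floor(f[d] / side);
        ++histogram[bin];
    }
    return histogram;
}

The pyramid match kernel then compares two such sparse histograms at every level with histogram intersection, weighting the finer levels more heavily.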
I have an image holding the results of a segmentation, like this one.
I need to build a graph of the neighborhood relations between the patches, which are colored differently.
As a result I'd like a structure representing the following:
Here the numbers represent separate patches, and the lines represent the patches' adjacency.
Currently I cannot figure out where to start and which keywords to google.
Could anyone suggest anything useful?
The image is stored in OpenCV's cv::Mat class; for the graph, I plan to use the Boost Graph Library (BGL).
So please give me some links to code samples and algorithms, or just keywords.
Thanks.
Update.
After a coffee break and some discussion, the following came to mind.
Build a large lattice graph, where each node corresponds to an image pixel and edges connect 8 (or 4) neighbours.
Label each graph node with the corresponding pixel value.
Somehow merge nodes that share the same label.
Another problem is that I'm not familiar with the BGL (but the book is on the way :)).
So, what do you think of this solution?
Update 2
Perhaps this link can help.
However, I still haven't found a solution.
You could solve it like this:
Define regions (your numbers in the graph):
- Make a 2D array that stores a region number for each pixel.
- Start at (0,0) and set its region number to 1.
- Fill that whole region with 1 using a flood-fill algorithm or similar.
- During the flood fill you will probably encounter coordinates with a different color; store those in a queue. When the current fill is done, increment the region number and start filling from those stored coordinates.
Make links between regions:
- Iterate through your 2D array.
- Wherever two different numbers are adjacent, store the number pair (probably in sorted order; you also have to check whether the pair already exists). If you advance from left to right, you only need to check the element below, the one to the right, and the one diagonally to the right.
Though I have to admit I don't know much about this topic; it's just my simple idea.
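A minimal sketch of the idea above: a BFS flood fill for the labeling pass, then one scan collecting sorted (label, label) pairs for the right and below neighbours. It assumes exact, non-anti-aliased colors and uses 4-connectivity; add the diagonal check from the answer for 8-connected adjacency.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <queue>
#include <set>
#include <utility>

std::set<std::pair<int,int>> patchAdjacency(const cv::Mat& seg /* CV_8UC3 */)
{
    cv::Mat labels(seg.size(), CV_32S, cv::Scalar(0));
    int next = 0;
    for (int y = 0; y < seg.rows; ++y)
        for (int x = 0; x < seg.cols; ++x) {
            if (labels.at<int>(y, x) != 0) continue;
            const cv::Vec3b color = seg.at<cv::Vec3b>(y, x);
            ++next;                                  // new region number
            std::queue<cv::Point> q;
            q.push({x, y});
            labels.at<int>(y, x) = next;
            while (!q.empty()) {                     // 4-connected flood fill
                cv::Point p = q.front(); q.pop();
                const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
                for (int k = 0; k < 4; ++k) {
                    cv::Point n(p.x + dx[k], p.y + dy[k]);
                    if (n.x < 0 || n.y < 0 || n.x >= seg.cols || n.y >= seg.rows) continue;
                    if (labels.at<int>(n) != 0) continue;
                    if (seg.at<cv::Vec3b>(n) != color) continue;
                    labels.at<int>(n) = next;
                    q.push(n);
                }
            }
        }

    std::set<std::pair<int,int>> edges;              // sorted, duplicate-free
    for (int y = 0; y < seg.rows; ++y)
        for (int x = 0; x < seg.cols; ++x) {
            int a = labels.at<int>(y, x);
            if (x + 1 < seg.cols) {                  // neighbour to the right
                int b = labels.at<int>(y, x + 1);
                if (a != b) edges.insert({std::min(a, b), std::max(a, b)});
            }
            if (y + 1 < seg.rows) {                  // neighbour below
                int b = labels.at<int>(y + 1, x);
                if (a != b) edges.insert({std::min(a, b), std::max(a, b)});
            }
        }
    return edges;
}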
You could use BFS to mark the regions.
Exposing cv::Mat to the BGL would take a lot of code; I think writing your own BFS is much simpler.
Then, for every two neighbours, write their marks into a std::set<std::pair<mark_t, mark_t>>.
And then build the graph from that.
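For the last step, building the BGL graph from the mark pairs is short. A sketch, assuming integer marks used directly as vertex indices (so pass maxMark + 1 vertices):

#include <boost/graph/adjacency_list.hpp>
#include <set>
#include <utility>

using mark_t = int;
using Graph = boost::adjacency_list<boost::setS, boost::vecS, boost::undirectedS>;

// Sketch: turn the collected neighbour pairs into a BGL graph, one vertex
// per patch mark. setS on the out-edge list suppresses duplicate edges.
Graph buildGraph(const std::set<std::pair<mark_t, mark_t>>& pairs, int numVertices)
{
    Graph g(numVertices);
    for (const auto& p : pairs)
        boost::add_edge(p.first, p.second, g);
    return g;
}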
I think that if your color patches are that random, you will probably need a brute-force algorithm to do what you want. An idea could be:
Do a first brute-force pass to identify all the patches. For example, make a matrix A of the same size as the image and initialize it to 0. For each pixel that is still zero, mark it as the start of a new patch and find the whole extent of that patch by brute force. Each cell of A then holds the number of the patch it belongs to.
The patch numbers have to be powers of 2: 1, 2, 4, 8, ...
Make another matrix B of the size of the image in which each cell holds two values describing the connections between pixels: the first value is the absolute difference between the cell's patch number and the patch number of the pixel below it; the second is the difference with the pixel to its left.
Collect all the unique values in matrix B and you have all the possible connections.
This works because the difference between two distinct powers of 2 uniquely identifies the pair. For example, if B ends up containing the numbers 3, 6 and 7, it means there are contacts between patches (4,1), (8,2) and (8,1). A value of 0 simply means two adjacent pixels lie in the same patch, so you ignore those.
I have two images (of the same size), and I want to check whether the second is the same as the first but shifted. More formally, I have two matrices A and B of the same size, and I want to check whether a submatrix of B occurs in A. Since these pictures are big (400x400), I need an efficient way to do this; O(n^3) complexity would be acceptable. Can this be done, or should I make the images smaller? :)
Thanks in advance.
You could simply use run-of-the-mill 2D cross-correlation and detect where the maximum value lies to determine the (x,y) offset. By the cross-correlation theorem, you can implement this efficiently in the Fourier domain.
See this simple example in Matlab on GitHub: cross correlation and peak finding.
EDIT
Here follows a short and mostly incomplete guide to the rigid registration of images. The gist of the cross-correlation idea is as follows:
Say I have a 1D vector:
t = [1 2 3 1 2 3 4 ]
I shift this vector by -4 places, resulting in a new vector t2:
t2 = [2 3 4 1 2 3 1]
Now I look at the so-called cross-correlation c between t and t2:
c = [1 5 11 15 17 25 38 37 28 24 29 18 8]
This cross-correlation vector has a maximum of 38, located at position (index) 7. We can use this to recover the shift as follows:
offset = round((7-(length(c)+1))/2)
offset = -4
where length() gives the number of elements of the cross-correlation result.
Now, as should be evident, computing the cross-correlation in the spatial domain requires a lot of operations. This is where the above-mentioned cross-correlation theorem comes into play: it links correlation in the spatial domain to multiplication in the Fourier domain. The Fourier transform is blessed with a number of very fast implementations (the FFT) requiring vastly fewer operations, which is why it is used for computing the cross-correlation.
There are many methods that deal with so-called rigid registration, from the stitching of satellite and holiday images alike to the overlaying of images from different sources, as is often done in medical imaging applications.
In your particular case, you might want to have a look at phase correlation. The book Numerical Recipes in C also contains a chapter on Fourier transforms and correlation.
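OpenCV ships phase correlation directly as cv::phaseCorrelate, which returns the sub-pixel (x, y) shift between two equally sized images. A sketch (file names are assumptions):

#include <opencv2/opencv.hpp>
#include <iostream>

// Sketch: cv::phaseCorrelate implements exactly the FFT-based correlation
// described above and returns the (x, y) translation between two images of
// equal size. Inputs must be single-channel floating-point matrices.
int main()
{
    cv::Mat a = cv::imread("A.png", cv::IMREAD_GRAYSCALE);  // file names assumed
    cv::Mat b = cv::imread("B.png", cv::IMREAD_GRAYSCALE);

    cv::Mat af, bf;
    a.convertTo(af, CV_32F);
    b.convertTo(bf, CV_32F);

    double response = 0.0;  // peak strength: near 1 means a confident match
    cv::Point2d shift = cv::phaseCorrelate(af, bf, cv::noArray(), &response);
    std::cout << "shift: " << shift << "  response: " << response << "\n";
}

For n x n images this runs in O(n^2 log n), comfortably within the O(n^3) budget stated in the question.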
This problem is known in the literature as "two-dimensional pattern matching" (hint: google it).
Here is a paper that describes both optimal and naive algorithms:
Fast two dimensional pattern matching
Another popular term is "sub-matrix matching", but that is usually used when you want a certain level of fuzziness rather than exact matches. Here is an example of such an algorithm:
Partial shape recognition by sub-matrix matching for partial matching guided image labeling