I have two images (of the same size) and I want to check whether the second is the same as the first one, but with some shift. More formally, I have two matrices A and B of the same size and I want to check whether a submatrix of B occurs in A. As these pictures are big (400x400) I need an efficient way to do so; an acceptable complexity would be O(n^3). Is there a way that can be done, or should I make the images smaller? :)
Thanks in advance.
You could simply use run-of-the-mill 2D cross-correlation and detect where the maximum value lies to determine the (x,y) offset. Following the cross-correlation theorem, you can implement this efficiently in the Fourier domain.
See this simple example in MATLAB on GitHub: cross correlation and peak finding
EDIT
Here follows a short and admittedly incomplete guide to the rigid registration of images. The gist of the cross-correlation idea is as follows:
Say I have a 1D vector:
t = [1 2 3 1 2 3 4 ]
I shift this vector -4 places resulting in a new vector t2:
t2 = [2 3 4 1 2 3 1]
Now I have a look at the so-called cross-correlation c between t and t2:
c = [1 5 11 15 17 25 38 37 28 24 29 18 8]
Now, this cross-correlation vector has a maximum of 38, located at position (index) 7. We can use this to determine the shift as follows:
offset = round((7-(length(c)+1))/2)
offset = -4
where length() gives the number of elements of the cross-correlation result.
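If it helps to experiment, here is a small Python/NumPy sketch of the same 1D example (np.correlate in 'full' mode gives the same ordering as MATLAB's cross correlation here); the offset formula is the one above:

import numpy as np

t  = np.array([1, 2, 3, 1, 2, 3, 4])
t2 = np.roll(t, -4)                           # shift by -4 places -> [2 3 4 1 2 3 1]

c = np.correlate(t, t2, mode='full')          # [1 5 11 15 17 25 38 37 28 24 29 18 8]
peak = np.argmax(c) + 1                       # 1-based index of the maximum: 7
offset = round((peak - (len(c) + 1)) / 2)     # -> -4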
Now, as should be evident, computing the cross correlation directly in the spatial domain requires a lot of operations. This is where the above-mentioned cross-correlation theorem comes into play, which links correlation in the spatial domain to multiplication in the Fourier domain. The Fourier transform is blessed with a number of very fast implementations (FFT) requiring vastly fewer operations, hence they are used for determining the cross correlation.
There are many methods that deal with so-called rigid registration, from stitching of satellite and holiday images alike to overlaying images from different sources as often found in medical imaging applications.
In your particular case, you might want to have a look at phase correlation. The Numerical Recipes in C book also contains a chapter on Fourier transforms and correlation.
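To make the Fourier-domain route concrete for your 2D case, here is a rough NumPy sketch of phase correlation between two equally sized images A and B. The function name and the sign convention of the returned shift are mine, and there is no windowing or sub-pixel refinement, so treat it as a starting point rather than a finished implementation:

import numpy as np

def estimate_shift(A, B):
    # cross power spectrum, normalised to keep only the phase
    R = np.fft.fft2(A) * np.conj(np.fft.fft2(B))
    R /= np.abs(R) + 1e-12
    corr = np.fft.ifft2(R).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks past the midpoint correspond to negative shifts
    if dy > A.shape[0] // 2:
        dy -= A.shape[0]
    if dx > A.shape[1] // 2:
        dx -= A.shape[1]
    return dy, dx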
This problem is known in the literature as "two-dimensional pattern matching" (hint: google it).
Here is a paper that describes both optimal and naive algorithms:
Fast two dimensional pattern matching
Another popular term is "sub-matrix matching", but this is usually used when you want a certain level of fuzziness instead of exact matches. Here is an example of such an algorithm:
Partial shape recognition by sub-matrix matching for partial matching guided image labeling
I have calculated the essential matrix using the 5-point algorithm. I'm not sure how to integrate it with RANSAC so it gives me a better outcome.
Here is the source code. https://github.com/lunzhang/openar/blob/master/src/utils/5point/computeEssential.js
Currently, I was thinking about computing the essential matrix for 5 random points, then converting the essential matrix to a fundamental matrix and checking the error against a threshold using the equation x'Fx = 0. But then I'm not sure what to do after that.
How do I know which points to set as outliers? If the error is too big, do I set them as outliers right away? Could it be possible that one point could produce different essential matrices depending on what the other 4 points are?
Well, here is a short explanation, in pseudo-code, of how you can integrate this with RANSAC. Basically, all RANSAC does is compute your model (here the essential matrix) using a subset of the data, and then see if the rest of the data "is happy" with that result. It keeps the result for which the highest portion of the dataset "is happy".
highest_number_of_happy_points = -1;
best_estimated_essential_matrix = Identity;
for iter = 1 to max_iter_number:
    n_pts = get_n_random_pts(P); // get a subset of n points from the set of points P. You can use 5, but you can also use more.
    E = compute_essential(n_pts);
    number_of_happy_points = 0;
    for pt in P:
        // we want to know if pt is happy with the computed E
        err = cost_function(pt, E); // for example x^T F x as you propose, or x^T E x with the essential.
        if (err < some_threshold):
            number_of_happy_points += 1;
    if (number_of_happy_points > highest_number_of_happy_points):
        highest_number_of_happy_points = number_of_happy_points;
        best_estimated_essential_matrix = E;
This should do the trick. Usually, you set some_threshold experimentally to a low value. There are of course more sophisticated variants of RANSAC; you can easily find them by googling.
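If pseudo-code is not enough, here is a minimal runnable skeleton of the same loop in Python/NumPy. compute_essential and cost_fn are placeholders for your own 5-point solver and whatever error you choose (for example the algebraic error x^T E x or the Sampson distance); they are not existing library calls, and pts1/pts2 are assumed to be NumPy arrays of matched points:

import numpy as np

def ransac_essential(pts1, pts2, compute_essential, cost_fn,
                     n_samples=5, max_iters=1000, threshold=1e-3):
    best_E = None
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(max_iters):
        idx = np.random.choice(len(pts1), n_samples, replace=False)
        E = compute_essential(pts1[idx], pts2[idx])      # your 5-point solver
        if E is None:
            continue
        errs = np.array([cost_fn(p1, p2, E) for p1, p2 in zip(pts1, pts2)])
        inliers = errs < threshold
        if inliers.sum() > best_inliers.sum():
            best_E, best_inliers = E, inliers
    return best_E, best_inliers                          # outliers are ~best_inliers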
Your idea of using x^TFx is fine in my opinion.
Once this RANSAC completes, you will have best_estimated_essential_matrix. The outliers are those points whose x^TFx value is greater than your chosen threshold.
To answer your final question: yes, a point could produce a different matrix given 4 different other points, because their spatial configuration is different (you can have degenerate situations). In an ideal setting this wouldn't be the case, but we always have noise, matching errors and so on, so what happens in the end is that the equations you obtain with 5 points won't produce exactly the same results as with 5 other points.
Hope this helps.
Given a highly compressed image (non-specific format), there are blocks of various sizes and shapes in which all pixels have exactly the same value.
For example:
My goal is to "intelligently" smooth these blocks into gradients, producing a smoother, more organic-looking image. I've seen in-painting techniques (such as heat diffusion) that might be applicable, though I'm not entirely sure how to adapt them to my purpose. I'm currently writing my own function to perform this action (details below). Is there already a C++ function in OpenCV (or elsewhere) that can perform this process? If not, is there a different method than the one I am using that might produce better/faster results?
[Note: All of my images are converted to floating point before processing.]
My current idea involves testing whether a pixel is identical to any of its neighbors. If so, I begin a search starting from the pixel location and working outward until a non-identical neighbor is found. This unfortunately requires that I use the entire image rather than a sliding kernel. I won't include the code of the search here because it is long and repetitive, but it essentially involves testing a column to the left, a row above, a column to the right, and a row below the current pixel and expanding them as I work outward. Like this:
13 14 15 16 17
12 3 4 5 18
11 2 x 6 19
10 1 7 8 20
9 22 23 24 21
Once a target is acquired, additional consideration is given to targets that may be in a larger search range but have a smaller Euclidean distance.
If the color difference is within an acceptable range, I then calculate the Euclidean distance to the nearest non-identical neighbor and calculate a pixel value based on that Euclidean distance and the color difference between the two non-identical pixels. I also use a user-set sigma value to affect the falloff of the gradient.
output_value = current_pixel_value - ((current_pixel_value-test_pixel_value)/(euclidean_distance*sigma));
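To show what I mean in code, here is a rough, unoptimised sketch of the search-and-blend step for a single-channel float image; it walks outward in square rings rather than in the exact order of the diagram above, and it leaves out the acceptable-range test on the colour difference:

import numpy as np

def deband_pixel(img, y, x, sigma, max_radius=64):
    cur = img[y, x]
    h, w = img.shape
    for r in range(1, max_radius):
        best = None
        for yy in range(max(0, y - r), min(h, y + r + 1)):
            for xx in range(max(0, x - r), min(w, x + r + 1)):
                if max(abs(yy - y), abs(xx - x)) != r:
                    continue                      # visit only the ring at radius r
                if img[yy, xx] != cur:
                    d = np.hypot(yy - y, xx - x)
                    if best is None or d < best[0]:
                        best = (d, img[yy, xx])
        if best is not None:
            d, val = best
            return cur - (cur - val) / (d * sigma)
    return cur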
This method works "OK", but it is slow on images that have huge macroblocks, and the output still has a "banded" look to it (see the "white" sections of walls, floors, etc.).
Result:
The paper Advanced Video Debanding by Gary Baugh et al. does what you describe for over-compressed video. The underlying algorithm should also be applicable to still images. Maybe it helps.
I am trying to implement the pyramid match kernel, and now I am stuck at a point.
I understand I need to partition the feature space into increasingly larger bins, so that at higher levels multiple points (feature vectors) will map to a single bin. What I can't seem to figure out is how to partition the feature space. I understood the case where the feature vectors are 1- or 2-dimensional, but how do you partition a d-dimensional feature space?
I understand the question is vague, but I just don't know where else to ask.
I may be wrong here, but I guess the intuition is to quantize the feature space. So, you could basically do bag-of-words with different codebook sizes (128, 64, 32, ...) and use their kernel to compute the similarity between two images.
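As a sketch of that idea (hypothetical helper names, and a flat histogram-intersection sum rather than the exact pyramid-match weighting), you could train k-means codebooks of decreasing size and compare per-level histograms:

import numpy as np
from sklearn.cluster import KMeans

def pyramid_histograms(descriptors, codebooks):
    # descriptors: (n, d) array of local feature vectors from one image
    hists = []
    for km in codebooks:
        words = km.predict(descriptors)
        h = np.bincount(words, minlength=km.n_clusters).astype(float)
        hists.append(h / max(h.sum(), 1.0))
    return hists

def histogram_intersection(h1, h2):
    return float(np.minimum(h1, h2).sum())

# codebooks = [KMeans(n_clusters=k).fit(training_descriptors) for k in (128, 64, 32)]
# similarity = sum(histogram_intersection(a, b)
#                  for a, b in zip(pyramid_histograms(d1, codebooks),
#                                  pyramid_histograms(d2, codebooks)))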
I have 2D data (zero-mean, normalized data). I know its covariance matrix, eigenvalues and eigenvectors. I want to decide whether to reduce the dimension to 1 or not (I use principal component analysis, PCA). How can I decide? Is there any methodology for it?
I am looking for something like: if you look at this ratio, and this ratio is high, then it is reasonable to go on with dimensionality reduction.
PS 1: Does PoV (Proportion of Variation) stand for this?
PS 2: Here is an answer: https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained. Is it a criterion for testing this?
PoV (Proportion of Variation) represents how much information of the data will remain, relative to using all of it. It may be used for that purpose. If the PoV is high, then less information will be lost.
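In the 2D case this boils down to a single ratio of eigenvalues; a tiny sketch (the eigenvalues and the 95% cutoff are only illustrative):

import numpy as np

eigvals = np.sort(np.array([0.3, 4.2]))[::-1]   # eigenvalues of the covariance matrix, sorted descending
pov = eigvals[0] / eigvals.sum()                # proportion of variance kept by PC1
reduce_to_1d = pov >= 0.95                      # go to 1D only if PC1 explains enough variance
print(pov, reduce_to_1d)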
You want to sort your eigenvalues by magnitude and then pick the highest 1 or 2 values. Eigenvalues with a very small relative value can be considered for exclusion. You can then transform the data values and, using only the top 1 or 2 eigenvectors, you'll get dimensions for plotting the results. This will give a visual representation of the PCA split. Also check out scikit-learn for more on PCA. Precision, recall and F1 scores will tell you how well it works.
from http://sebastianraschka.com/Articles/2014_pca_step_by_step.html...
Step 1: 3D Example
"For our simple example, where we are reducing a 3-dimensional feature space to a 2-dimensional feature subspace, we are combining the two eigenvectors with the highest eigenvalues to construct our d×kd×k-dimensional eigenvector matrix WW.
matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1),
                      eig_pairs[1][1].reshape(3,1)))
print('Matrix W:\n', matrix_w)
>>>Matrix W:
[[-0.49210223 -0.64670286]
[-0.47927902 -0.35756937]
[-0.72672348 0.67373552]]"
Step 2: 3D Example
"
In the last step, we use the 2×3-dimensional matrix W that we just computed to transform our samples onto the new subspace via the equation
y = W^T × x
transformed = matrix_w.T.dot(all_samples)
assert transformed.shape == (2,40), "The matrix is not 2x40 dimensional."
First off, sorry for not posting the code here. For some reason all the code got messed up when I tried to enter it on this page, and it probably was too much to post anyhow. Here is my code: http://pastebin.com/bmMRehbd
Now, from what I'm being told, the reason I can't get a good result out of this code is that I'm not using overlap-add. I have tried to read several sources on the internet as to why I need to use overlap-add, but I can't understand it. It seems like the actual filter works, because anything above the given cutoff does indeed get cut off.
I should mention this code is made to work with the VST2 SDK.
Can someone tell me why I need to add it and how I can implement overlap-add in the given code?
I should also mention that I'm pretty stupid when it comes to algorithms and maths. I'm one of those persons who needs to visually get a grip on what I'm doing. That, or getting stuff explained by code :), and by that I mean the actual overlap.
Overlap-add theory: http://en.wikipedia.org/wiki/Overlap%E2%80%93add_method
Thanks for all the help you can give!
The overlap-add method is needed to handle the boundaries of each FFT buffer. The problem is that multiplication in the FFT domain results in circular convolution in the time domain. This means that after performing the IFFT, the results at the end of the frame wrap around and corrupt the output samples at the beginning of the frame.
It may be easier to think about it this way: Say you have a filter of length N. Linear convolution of this filter with M input samples actually returns M+N-1 output samples. However, the circular convolution done in the FFT domain results in the same number of input and output samples, M. The extra N-1 samples from linear convolution have "wrapped" around and corrupted the first N-1 output samples.
Here's an example (matlab or octave):
a = [1,2,3,4,5,6];
b = [1,2,1];
conv(a,b) %linear convolution
1 4 8 12 16 20 17 6
ifft(fft(a,6).*fft(b,6)) %circular convolution
18 10 8 12 16 20
Notice that the last 2 samples have wrapped around and added to the first 2 samples in the circular case.
The overlap-add/overlap-save methods are basically methods of handling this wraparound. The overlap of FFT buffers is needed since circular convolution returns fewer uncorrupted output samples than the number of input samples.
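For reference, here is a compact NumPy sketch of overlap-add: each block is zero-padded to an FFT length of block_len + len(h) - 1 so nothing wraps around, and the overlapping tails are summed back in. The block length is an arbitrary choice, and a real-time VST version would keep the tail in a state buffer between process calls:

import numpy as np

def overlap_add(x, h, block_len=256):
    n_fft = block_len + len(h) - 1         # long enough to avoid circular wraparound
    H = np.fft.rfft(h, n_fft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        yb = np.fft.irfft(np.fft.rfft(block, n_fft) * H, n_fft)
        end = min(start + n_fft, len(y))
        y[start:end] += yb[:end - start]   # overlapping tails add up to the linear convolution
    return y

# With the example above, np.allclose(overlap_add([1, 2, 3, 4, 5, 6], [1, 2, 1]),
#                                     [1, 4, 8, 12, 16, 20, 17, 6]) holds.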
When you do a convolution (with a finite impulse response filter) by taking the inverse discrete Fourier transform of the product of the discrete Fourier transforms of two input signals, you are really implementing circular convolution. I'll hereby call this "convolution computed in the frequency domain." (If you don't know what a circular convolution is, look at this link. It's basically a convolution where you assume the domain is circular, i.e., shifting the signal off the sides makes it "wrap around" to the other side of the domain.)
You generally want to perform convolution by using fast Fourier transforms for large signals because it's computationally more efficient.
Overlap-add (and its cousin overlap-save) are methods that work around the fact that convolutions done in the frequency domain are really circular convolutions, whereas in reality we rarely ever want circular convolution; we typically want linear convolution.
Overlap-add does this by "zero-padding" chunks of the input signal and then appropriately combining the portions of the circular convolutions (that were done in the frequency domain). Overlap-save does it by only keeping the portion of the signal that corresponds to linear convolution and tossing the part that was "corrupted" by the circular shifts.
Here are the Wikipedia links for both methods:
Overlap-add : This one has a nice figure explaining what's going on.
Overlap-save
This book by Orfanidis explains it well. See section 9.9.2. It's not the "de facto" standard on signal processing, but it's extremely well written and is a better introduction than other books, in my opinion.
First, understand that convolution in the time domain is equivalent to multiplication in the frequency domain. Direct convolution costs roughly O(n*m), where n is the FIR length and m is the number of samples to be filtered. In the frequency domain, using the FFT, you are running at O(n log n). For large enough n, the cost of filtering is substantially less when doing it in the frequency domain. If n is relatively small, however, the benefits decrease to the point where it's simpler to filter in the time domain. This breakpoint is subjective; however, figure 50 to 100 taps as the point where you might switch.
Yes, a convolution filter will "work", in terms of changing the frequency response. But this multiplication in the frequency domain will also contaminate time-domain data at one end with data from the other end, and vice versa. Overlap-add/save extends the FFT size and chops off the "contaminated" end, then uses that end data to fix the beginning of the subsequent FFT window.