What exactly is Pairwise Matching and How it works? - c++

I'm working on multiple image stitching and came across the term Pairwise Matching. I have searched almost every site but cannot find a clear description of what exactly it is and how it works.
I'm working in Visual Studio 2012 with OpenCV. I have modified stitching_detailed.cpp to fit my requirements and have managed to keep the quality while significantly reducing the run time, except for the pairwise matching step. I'm using ORB to find feature points. BestOf2NearestMatcher is used in stitching_detailed.cpp for pairwise matching.
What I know about Pairwise Matching and BestOf2NearestMatcher:
(Correct me if I'm wrong somewhere)
1) Pairwise Matching works similarly to other matchers such as Brute Force Matcher, Flann Based Matcher, etc.
2) Pairwise Matching works with multiple images unlike the above matchers. You have to go one by one if you want to use them for multiple images.
3) In Pairwise Matching, the features of one image are matched with every other image in the data set.
4) BestOf2NearestMatcher finds two best matches for each feature and leaves the best one only if the ratio between descriptor distances is greater than the threshold match_conf.
What I want to know:
1) I want to know more details about pairwise matching, if I'm missing some on it.
2) I want to know HOW pairwise matching works, the actual flow of it in detail.
3) I want to know HOW BestOf2NearestMatcher works, the actual flow of it in detail.
4) Where can I find code for BestOf2NearestMatcher? OR Where can I get similar code to BestOf2NearestMatcher?
5) Is there any alternative I can use for pairwise matching (or BestOf2NearestMatcher) which takes less time than the current one?
Why I want to know and what I'd do with it:
1) As I stated in the introduction part, I want to reduce the time pairwise matching takes. If I'm able to understand what actually pairwise matching is and how it works, I can create my own according to my requirement or I can modify the existing one.
I posted a separate question about reducing the run time of the entire program: here. I'm not asking the same question again; I'm asking about specifics. There I wanted to know how to reduce the time spent in pairwise matching as well as in other code sections, and here I want to know what pairwise matching is and how it works.
Any help is much appreciated!
EDIT: I found the code for pairwise matching in matchers.cpp. I created my own function in the main code to optimize the time. It works well.
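For readers who want the general shape of the idea without reading matchers.cpp, here is a minimal sketch in Python. It is not OpenCV's code: descriptors are modelled as plain integers compared by Hamming distance (as one would for binary ORB descriptors), the function names are made up for illustration, and the acceptance rule `best < (1 - match_conf) * second` is my assumption about one common form of the 2-nearest-neighbour test.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def ratio_test_matches(query, train, match_conf):
    """For each query descriptor, find its two nearest train descriptors and
    keep the best one only if it is clearly better than the runner-up.

    The rule best < (1 - match_conf) * second is an assumption made for this
    sketch, not a statement about any particular library.
    """
    matches = []
    if len(train) < 2:
        return matches  # the test needs a second-best candidate
    for qi, q in enumerate(query):
        ranked = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        (best_dist, best_idx), (second_dist, _) = ranked[0], ranked[1]
        if best_dist < (1.0 - match_conf) * second_dist:
            matches.append((qi, best_idx))
    return matches


def pairwise_match(all_descriptors, match_conf=0.3):
    """Run the ratio test on every unordered pair of images (i < j)."""
    return {
        (i, j): ratio_test_matches(all_descriptors[i], all_descriptors[j], match_conf)
        for i, j in combinations(range(len(all_descriptors)), 2)
    }
```

With N images this visits N*(N-1)/2 pairs, which is why the step grows quickly with the size of the data set; restricting the pairs to images that are expected to overlap is one way to cut the time.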

Related

What are hp.Discrete and hp.RealInterval? Can I include more values in hp.RealInterval instead of just 2?

I am doing hyperparameter tuning with the HParams Dashboard in TensorFlow 2.0-beta0, as suggested here: https://www.tensorflow.org/tensorboard/r2/hyperparameter_tuning_with_hparams
I am confused by step 1 and could not find a better explanation. My questions relate to the following lines:
HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
My question:
I want to try more dropout values instead of just two (0.1 and 0.2). If I pass more values, it throws an error: 'maximum 2 arguments can be given'. I looked for documentation but could not find anything, not even where these hp.Discrete and hp.RealInterval functions come from.
Any help would be appreciated. Thank you!
Good question. The notebook tutorial is lacking in many respects. At any rate, here is how you do it at a certain resolution res:
for dropout_rate in tf.linspace(
        HP_DROPOUT.domain.min_value,
        HP_DROPOUT.domain.max_value,
        res):
    ...  # run one trial with this dropout_rate
Looking at the implementation, it does not really seem to be grid search but Monte Carlo/random search (note: this is not 100% correct, please see my edit below).
On every iteration, a random float from that real interval is chosen.
If you want grid search behavior, just use "Discrete". That way you can even mix and match grid search with random search, pretty cool!
Edit, 27th of July '22 (based on the comment of @dpoiesz):
Just to make it a little clearer: because values are sampled from the intervals, concrete values are returned. Those are then added to the grid dimension, and grid search is performed using them.
RealInterval is a (min, max) pair from which the hparam picks a number.
Here is a link to the implementation for better understanding.
The thing is that, as currently implemented, there does not seem to be any difference between the two unless you call the sample_uniform method.
Note that tf.linspace breaks the sample code mentioned above when the current value is saved.
See https://github.com/tensorflow/tensorboard/issues/2348
In particular, see OscarVanL's comment about his quick & dirty workaround.
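One way to get more than two dropout values without tf.linspace is to build the grid as plain Python floats and hand that list to hp.Discrete. The helper below is a sketch of that idea; `grid_values` is a made-up name, and I am assuming (not confirming from the linked issue) that plain floats are what the logging code expects.

```python
def grid_values(min_value, max_value, res):
    """Return `res` evenly spaced plain Python floats from min_value to
    max_value inclusive (hypothetical helper, not part of any library)."""
    if res < 2:
        raise ValueError("res must be at least 2")
    step = (max_value - min_value) / (res - 1)
    # Rounding hides float noise such as 0.15000000000000002 in the dashboard.
    return [round(min_value + i * step, 10) for i in range(res)]


dropout_grid = grid_values(0.1, 0.2, 3)
# The list could then be used as hp.Discrete(dropout_grid), iterating over
# HP_DROPOUT.domain.values in the training loop.
```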

Levenshtein distance, multiple paths

Edit: TL;DR version: how do I get all possible backtraces for the Damerau–Levenshtein distance between two words? I'm using https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm to compute the distance, and a trivial backtrace algorithm (illustrated below) to reconstruct the list of corrections.
More details below:
I got stuck with optimal string alignment (a sort of Damerau–Levenshtein distance) while trying to get the complete set of possible alignments.
The goal is to align 2 strings for further comparison in an auto-suggestion algorithm. In particular, I'd like to ignore insertions past the end of the 1st word.
The problem is that in some cases multiple "optimal" alignments are possible, e.g.
align("goto", "go to home")
1) go to
go to home
2) go t o
go to home
Unfortunately, my implementation finds only the second variant, while I need both or the 1st one.
I've tried to perform some kind of A* or BFS path finding, but it looks like the cost matrix is "tuned" for variant (2) only. In the screenshot below I can find the red path, but it looks like there is no green path:
However, someone made a web demo which implements exactly what I want:
What am I missing here?
My implementation is probably too long to post here, so here is a link to GitHub: https://github.com/victor-istomin/incrementalSpellCheck/blob/f_improvement/spellCheck.hpp
The distance implementation is located in the optimalStringAlignementDistance() and optimalStringAlignmentBacktrace() methods.
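To make the question concrete, here is a minimal Python sketch of what "all backtraces" could look like for the optimal string alignment variant. It is not the code from the linked repository, and the function names are invented for illustration. The idea is to fill the usual Wagner–Fischer matrix and then, instead of following a single predecessor, recurse into every predecessor cell that explains the current cost.

```python
def osa_matrix(a, b):
    """Wagner-Fischer matrix for optimal string alignment (adjacent
    transpositions allowed, each counted as one edit)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d


def all_backtraces(a, b):
    """Return (distance, list of every optimal edit script)."""
    d = osa_matrix(a, b)
    scripts = []

    def walk(i, j, ops):
        if i == 0 and j == 0:
            scripts.append(ops[::-1])  # ops were collected back to front
            return
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + cost:
                kind = "match" if cost == 0 else "substitute"
                walk(i - 1, j - 1, ops + [(kind, a[i - 1], b[j - 1])])
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            walk(i - 1, j, ops + [("delete", a[i - 1], "")])
        if j > 0 and d[i][j] == d[i][j - 1] + 1:
            walk(i, j - 1, ops + [("insert", "", b[j - 1])])
        if (i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]
                and d[i][j] == d[i - 2][j - 2] + 1):
            walk(i - 2, j - 2, ops + [("transpose", a[i - 2:i], b[j - 2:j])])

    walk(len(a), len(b), [])
    return d[len(a)][len(b)], scripts
```

Calling `all_backtraces("goto", "go to home")` yields more than one script of equal cost, so a backtrace that returns only one of them has simply committed to a fixed tie-breaking order. The number of scripts can grow exponentially for long, dissimilar strings, so this is only practical for short inputs.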

Similarity of a group of text documents

I am looking for an algorithm that checks
1) the similarity of sentences (around 5000) with each other in a document
2) the similarity of multiple documents (around 5000) with respect to each other
I need this because I'm trying to evaluate whether the text documents/sentences that fall under a particular category are in any way similar to each other. Are there any existing methods for doing this?
The standard approach is to use cosine similarity with TF-IDF normalization.
There are many variants of this; you will need to experiment to find what works best for you.
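As a rough illustration of that approach, here is a small pure-Python sketch. The whitespace tokenizer and the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` are choices made for this example; they are one variant among the many mentioned above, and the function names are made up.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Turn each text into a sparse {term: weight} TF-IDF vector."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(texts):
    """All-pairs cosine similarity of the TF-IDF vectors."""
    vectors = tfidf_vectors(texts)
    return [[cosine(u, v) for v in vectors] for u in vectors]
```

For around 5000 items the all-pairs loop is about 25 million comparisons, so in practice a sparse-matrix library would be a better fit than this plain-Python version; the sketch is only meant to show the computation.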

How to normalize sequence of numbers?

I am working on a user behavior project. Based on user interaction, I have collected some data. It forms a nice sequence which smoothly increases and decreases over time, but there are small discrepancies, which are very bad. Please refer to the graph below:
You can also find the data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I want to smooth out this sequence. The technique should be able to eliminate values like X and Y, i.e. errors in an otherwise monotonically increasing or decreasing run.
If it cannot eliminate them, the technique should be able to shift them so that the series is not affected by the errors.
What I have tried and failed:
I tried testing the difference between values. In some special cases it works, but for a sequence like the one presented here the distance between numbers is not large enough to cut out the errors.
I tried applying a counter: a change is accepted only once it has persisted for some count X, otherwise the point is mapped to the previous point. Here I have great trouble deciding on the value of X, because the data is based on user interaction and I do not really control it. If the user interaction is such that its plot is a zigzag pattern, I end up in a 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: The data made available in this example is a particular case. There is no typical pattern in which the numbers are going to occur, but we expect some range to be continuous in all the examples. The solution I am seeking is generic.
I do not know how much effort you want to put into this problem, but if you want theoretical guarantees, topological persistence seems well suited to it, imho.
Basically, with that method you can filter local maxima/minima by fixing a scale, and there are theoretical proofs that if your sampling is close to your function, then persistence extracts the correct number of maxima.
You can see these slides (mainly pages 7-9) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a water level starting at the maximum height and decreasing, you get some peaks.
Every peak has a time at which it is born, which is when it emerges, and a time at which it dies, which is when it merges with a higher peak. A persistence diagram plots a point for every peak, whose x/y coordinates are its times of birth/death (by assumption the first peak does not die and is not shown).
If a peak is a global maximum, it will lie further from the diagonal in the persistence diagram than a local maximum peak. To remove local maxima, you have to remove the peaks close to the diagonal. There are four local maxima in your example, as you can see in the persistence diagram of your data (thanks for providing the data btw), and two global ones (the first peak is not pictured in a persistence diagram):
If you add noise to your data like this:
You will still get a very decent persistence diagram that lets you filter local maxima as you want:
Please ask if you want more details or references.
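For a 1-D sequence the birth/death bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are made up, samples are added from highest to lowest, and a union-find structure tracks which emerged samples belong to the same peak. A peak's persistence is its height at birth minus the height at which it merges into a higher peak.

```python
def peak_persistence(values):
    """Map the index of each local maximum to its persistence.
    The highest peak never dies and gets persistence infinity."""
    n = len(values)
    parent = [None] * n   # None: sample not yet above the water level
    peak = {}             # component root -> index of its highest sample
    persistence = {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda k: values[k], reverse=True):
        parent[i] = i
        peak[i] = i
        for nb in (i - 1, i + 1):
            if 0 <= nb < n and parent[nb] is not None:
                keep, die = find(i), find(nb)
                if keep == die:
                    continue
                if values[peak[keep]] <= values[peak[die]]:
                    keep, die = die, keep
                # A sample that merely joins a neighbour was never a peak.
                if peak[die] != i:
                    persistence[peak[die]] = values[peak[die]] - values[i]
                parent[die] = keep
    if n:
        persistence[peak[find(0)]] = float("inf")
    return persistence


def significant_peaks(values, min_persistence):
    """Indices of the peaks whose persistence reaches the chosen scale."""
    found = peak_persistence(values)
    return sorted(i for i, p in found.items() if p >= min_persistence)
```

Choosing `min_persistence` plays the role of "fixing a scale": small bumps get a small persistence and are dropped, while the large rises and falls survive. The same routine applied to the negated sequence gives the minima.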
Since you cannot decide on a cutoff frequency, or even on the filter you want to use, I would implement several and let the user set the parameters.
The first thing I thought of is a running average, and even there you can see that there are many things to set, each giving a different output.
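A running average could be sketched like this in Python. The centered window and the way the window shrinks at both ends are choices made for this example; they are exactly the kind of parameters that would be exposed to the user.

```python
def moving_average(values, window):
    """Centered running average; the window shrinks near both ends so the
    output has the same length as the input."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

A larger window removes more of the small discrepancies but also flattens the genuine turning points, so it is worth trying a few sizes on real data.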

How to detect and delete noise in rapidminer?

I am new to RapidMiner 5 and just want to know how to find noise in my data, show it in a chart, and delete it.
A complex problem because it depends what you mean by noise.
If you mean finding individual attributes whose values are plain wrong then you could plot a histogram view and work out some sort of limits on what constitutes a valid value. You could then impose that rule by using Filter Examples to remove them.
If you mean finding attributes that have some sort of random jitter applied to them it would be difficult to detect these. Only by knowing beforehand what the expected shape of the distribution is could you compare with observation and do something about it. However, the action to take is by no means obvious.
If you mean finding examples within an example set that are obviously different from other examples then you could consider using the various outlier functions. The simplest one to get started is Detect Outlier (Distances). This finds a set number of outliers (default 10) based on a distance calculation that uses all the attributes for examples. It creates a new attribute called outlier that is set to true or false. You could then use the Filter Examples operator to remove those that are set to true.
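To show what a distance-based outlier flag of this kind does conceptually, here is a small Python sketch. It is not RapidMiner's implementation: scoring each example by the distance to its k-th nearest neighbour and the function name `detect_outliers` are assumptions made for illustration.

```python
import math


def detect_outliers(examples, n_outliers=10, k=10):
    """Flag the n_outliers examples that lie farthest from their k-th
    nearest neighbour. Returns one boolean per example, mirroring an
    'outlier' attribute set to true or false."""
    if len(examples) < 2:
        return [False] * len(examples)
    scores = []
    for i, p in enumerate(examples):
        dists = sorted(math.dist(p, q) for j, q in enumerate(examples) if j != i)
        scores.append(dists[min(k, len(dists)) - 1])
    ranked = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(len(examples))]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
flags = detect_outliers(points, n_outliers=1, k=2)
# Keeping only the examples that are not flagged corresponds to the
# filtering step described above.
cleaned = [p for p, is_outlier in zip(points, flags) if not is_outlier]
```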
Hope that helps at least as a start.