Shortest Path in a Trie - c++

For a Data Structures project, I must find the shortest path between two words, like "cat" and "dog", but I'm only allowed to change one letter at a time. I'm trying to do it by implementing a trie, and can't seem to implement a shortest path search.
cat -> cot -> cog -> dog
All the words will be of the same length, and I am populating them from a dictionary file.
We must move from word to word, so every word in between must be a valid word.
I think it's not really possible using a trie, but does anyone have any ideas?

You want to use a VP-tree, with Levenshtein distance as the distance metric.
A C implementation can be found here (the code is far too long to post as an answer):
C VP-Tree

A better data structure for this kind of problem is a graph.
The problem is known as a word ladder; you can look it up here: http://en.wikipedia.org/wiki/Word_ladder.

What you are looking for is a simple BFS (breadth-first search). Each word is a graph vertex, but there is no need to build the graph explicitly; you can solve it using an array of words:
words = {"cat", "dog", "dot", "cot"}
mark = {0, 0, 0, 0}
distance = {0, 0, 0, 0}
queue Q
start_word_index = 0; // words[0] -> "cat"
destination_word_index = 1; // words[1] -> "dog"
mark[start_word_index] = 1 // never revisit the start word
Q.push(start_word_index)
while (Q is not empty) {
    word_index = Q.pop()
    for each `words[j]` {
        if (`words[word_index]` and `words[j]` differ in exactly 1 character) AND
           (`mark[j]` is not 1) {
            mark[j] = 1
            Q.push(j)
            distance[j] = distance[word_index] + 1
        }
    }
}
if mark[destination_word_index] is 0 {
    print "Not reachable"
} else {
    print "Distance is ", distance[destination_word_index]
}

Related

How to find if a substring from a string vector is contained in another group/vector/list c++

Let me give an example: I have three groups of strings, each of a fixed size, and I thought of using lists. Let's say I name them Red, Green and Blue:
std::string Red[] = {"apple", "rose mary", "watermelon"};
std::string Green[] = {"cucumber", "avocado", "pine tree"};
std::string Blue[] = {"sea", "lake"};
I have found examples here that search for one item in each of these lists and find it if the string matches exactly; for example, in that case I would have:
std::string myinput = "watermelon";
if (std::find(std::begin(Red), std::end(Red), myinput) != std::end(Red))
{
std::cout << "found " << myinput << " in Red" << std::endl;
}
OK so far, but I want something different:
I want to scan a vector with 1000 elements, where the string I'm looking for is part of one of those elements, which I access like this:
for (std::size_t j = 0; j < Vector.size(); j++) {
    if (/* Vector[j].message contains a string from one of the groups */) {
        std::cout << Vector[j].message << std::endl;
    }
}
Each Vector[j].message is a string with this format:
"flag: She adores watermelon"
"flag: They visited the lake"
"flag: He has cucumber for salad"
"flag: Guacamole made of avocado"
You can see that the substring "flag" is common to all the strings in the vector. However, "watermelon" doesn't exist in any other group of strings.
The goal is to scan each group and find that the vector element
"flag: She adores watermelon" belongs to group Red. It should not be listed in the group yellow just because of the substring "flag".
Also, I want the match to contain the whole string stated in the group. For example, if the Vector contains an element like "flag: the plant has many pines", it should not be listed in the Green group; it should be uncategorized.
Then these messages should be categorized and printed in different colours: the Red ones first, in red, and so on.
First, do you agree with the idea of using lists? Do you suggest a more efficient way? What do you suggest for the substring search?
Excuse the lame examples, and the title if it is unclear. I am new to this and looking for ideas.

You can use regular expressions. Instead of this:
{"apple", "rose mary", "watermelon"}
Use this:
std::regex red("apple|rose mary|watermelon");
Then for each input line:
if (std::regex_search(line, red)) {
// it's red
}
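You can then create a std::vector<std::pair<std::regex, std::string>> and name each pattern. Because the std::regex constructor taking a string is explicit, each pattern has to be written as a std::regex explicitly:
std::vector<std::pair<std::regex, std::string>> patterns = {
    {std::regex("apple|rose mary|watermelon"), "red"},
    {std::regex("cucumber|avocado|pine tree"), "green"},
    {std::regex("sea|lake"), "blue"},
};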
This way you can easily iterate over all the patterns and get the color for any match.

Getting the index of a slice

I want to do some processing on a string in Scala. The first stage of that is finding the index of articles such as: "A ", " A ", "a ", " a ". I am trying to do that like this:
"A house is in front of us".indexOfSlice("\\s+[Aa] ")
I think this should return 0, as the substring is first matched in the first position of the string.
However, this returns -1.
Why does it return -1? Is the regex I am using incorrect?

The other answers (as I type this) are missing the point. Your problem is that indexOfSlice doesn't take a regex, but a sub-sequence to search for in the sequence. So fixing the regex won't help at all.
Try this:
val pattern = "\\b[Aa]\\b".r.unanchored
for (mo <- pattern.findAllMatchIn("A house is in front of us, a house is in front of us all")) {
println("pattern starts at " + mo.start)
}
//> pattern starts at 0
//| pattern starts at 27
(with a fixed regex, too)
Edit: counter-example for the popular but wrong suggestion of "\\s*[Aa] "
val pattern2 = "\\s*[Aa] ".r.unanchored
for (mo <- pattern2.findAllMatchIn("The agenda is hidden")) {
println("pattern starts at " + mo.start)
}
//> pattern starts at 9

I see a mistake in your regex. Your regex is searching for
at least one whitespace character (\s+),
followed by a letter (either A or a),
followed by a space.
But the string you are matching doesn't start with a space; that's why it returns -1 instead of 0.
You could write your regex as "^\\s*[Aa] "
Here is an example:
import java.util.regex.Pattern

val text = "A house is in front of us"
val matcher = Pattern.compile("^\\s*[Aa] ").matcher(text)
var idx = -1 // stays -1 if there is no match
if (matcher.find()) {
  idx = matcher.start()
}
println(idx)
It should print 0, as expected.

Fuzzy, but not too fuzzy string matching with agrep

I have a string like this:
text <- c("Car", "Ca-R", "My Car", "I drive cars", "Chars", "CanCan")
I would like to match a pattern so that it is matched only once and with at most one substitution/insertion. The result should look like this:
> "Car"
I tried the following to match my pattern only once with at most one substitution/insertion etc., and got this:
> agrep("ca?", text, ignore.case = T, max = list(substitutions = 1, insertions = 1, deletions = 1, all = 1), value = T)
[1] "Car" "Ca-R" "My Car" "I drive cars" "CanCan"
Is there a way to exclude the strings which are n-characters longer than my pattern?

An alternative which replaces agrep with adist:
text[which(adist("ca?", text, ignore.case=TRUE) <= 1)]
adist gives the number of insertions/deletions/substitutions required to convert one string into another, so keeping only the elements with an adist of at most one should give you what you want, I think.
This answer is probably less appropriate if you really want to exclude things "n-characters longer" than the pattern (with n being variable), rather than just match whole words (where n is always 1 in your example).

You can use nchar to limit the strings based on their length:
pattern <- "ca?"
matches <- agrep(pattern, text, ignore.case = T, max = list(substitutions = 1, insertions = 1, deletions = 1, all = 1), value = T)
n <- 4
matches[nchar(matches) < n+nchar(pattern)]
# [1] "Car" "Ca-R" "My Car" "CanCan"

Scala Map a list of items to a value

I have a list of bigrams of sentences and another, original list of relevant bigrams. I want to check whether any of the relevant bigrams are present in a sentence, and if so return that sentence. I was thinking of implementing it as follows: map each of the bigrams in the list to the sentence it comes from, then search on the key and return the value.
example:
relevantbigrams = (This is, is not, not what)
bigrams = List(List(This of, of no, no the), List(not what, what is))
So each list holds the bigrams of a separate sentence. Here "not what" from the second sentence matches, so I would like to return the second sentence. I am planning to have a map like Map("This of" -> "This of no the", "of no" -> "This of no the", "not what" -> "not what is"), etc., and return the sentences that match on a relevant bigram, so here I return "not what is".
This is my code:
val bigram = usableTweets.map(x =>Tokenize(x).sliding(2).flatMap{case Vector(x,y) => List(x+" "+y)}.map(z => z, x))
for(i<- 0 to relevantbigram.length)
if(bigram.contains(relevantbigram(i)))) bigram.get(relevantbigram(i))
else useableTweets.head

You got the order of flatMap and map the wrong way around:
val bigramMap = usableTweets.flatMap { x =>
  x.split(" ").sliding(2).
    map(bg => bg.mkString(" ") -> x)
}.toMap
Then you can do your search like this:
relevantbigrams collect { case rb if bigramMap contains rb => bigramMap(rb) }
Or
val found =
  for {
    rb <- relevantbigrams
    sentence <- bigramMap get rb
  } yield sentence
Both should give you a list, but from your code it appears you want to default to the first sentence if your search found nothing:
found.headOption.getOrElse(usableTweets.head)

Damerau-Levenshtein php

I'm searching for an implementation of the Damerau–Levenshtein algorithm in PHP, but it seems I can't find anything with my friend Google. So far I have to either use PHP's built-in Levenshtein (without the Damerau transposition, which is very important) or get the original source code (in C, C++, C#, Perl) and translate it to PHP.
Does anybody know of a PHP implementation?
I'm using soundex and double metaphone for a "Did you mean:" extension on my corporate intranet, and I want to implement the Damerau–Levenshtein algorithm to help me sort the results better. Something similar to this idea: http://www.briandrought.com/blog/?p=66; my implementation is similar to its first 5 steps.

I had a stab at a recursive solution a while back.
/*
 * Naïve implementation of Damerau-Levenshtein distance
 * (Does not work when there are neighbouring transpositions)!
 */
function DamerauLevenshtein($S1, $S2)
{
    $L1 = strlen($S1);
    $L2 = strlen($S2);
    if ($L1 == 0 || $L2 == 0) {
        // Trivial case: one string is 0-length
        return max($L1, $L2);
    }
    else {
        // The cost of substituting the last character
        $substitutionCost = ($S1[$L1-1] != $S2[$L2-1]) ? 1 : 0;
        // {H1,H2} are {S1,S2} with the last character chopped off
        $H1 = substr($S1, 0, $L1-1);
        $H2 = substr($S2, 0, $L2-1);
        if ($L1 > 1 && $L2 > 1 && $S1[$L1-1] == $S2[$L2-2] && $S1[$L1-2] == $S2[$L2-1]) {
            // The last two characters are transposed: also try swapping them (cost 1)
            return min (
                DamerauLevenshtein($H1, $S2) + 1,
                DamerauLevenshtein($S1, $H2) + 1,
                DamerauLevenshtein($H1, $H2) + $substitutionCost,
                DamerauLevenshtein(substr($S1, 0, $L1-2), substr($S2, 0, $L2-2)) + 1
            );
        }
        return min (
            DamerauLevenshtein($H1, $S2) + 1,
            DamerauLevenshtein($S1, $H2) + 1,
            DamerauLevenshtein($H1, $H2) + $substitutionCost
        );
    }
}

Have a look at our implementation (with tests and documentation).

How about just using the built-in PHP function?
http://php.net/manual/en/function.levenshtein.php
int levenshtein ( string $str1 , string $str2 )
int levenshtein ( string $str1 , string $str2 , int $cost_ins , int $cost_rep , int $cost_del )
