Here is the word ladder problem:
Given two words (beginWord and endWord), and a dictionary's word list, find the length of the shortest transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time.
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Now, along with substituting a letter, we are also allowed to delete a letter or insert one.
We have to find the minimum number of steps, if possible, to convert string1 into string2.
This problem has a nice BFS structure. Let's illustrate this using the example in the problem statement.
beginWord = "hit",
endWord = "cog",
wordList = "hot","dot","dog","lot","log","cog"
Since only one letter can be changed at a time, if we start from "hit", we can only change to those words which have exactly one letter different from it (in this case, "hot"). Putting it in graph-theoretic terms, "hot" is a neighbor of "hit". The idea is simply to start from the beginWord, then visit its neighbors, then the non-visited neighbors of its neighbors, until we arrive at the endWord. This is a typical BFS structure.
But now that we are also allowed to add or delete a letter, how should I proceed further?
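For what it's worth, here is a minimal BFS sketch in C++ of the extended problem, under the assumption that the word list fits in a hash set over lowercase words; the names neighbors and ladderLength are illustrative, not from any reference solution. The only change from the classic word ladder is that the neighbor set also includes one-letter insertions and deletions:

    #include <queue>
    #include <string>
    #include <unordered_set>
    #include <utility>
    #include <vector>

    // All dictionary words reachable from `w` by one substitution, one
    // deletion, or one insertion of a single letter (lowercase assumed).
    std::vector<std::string> neighbors(const std::string& w,
                                       const std::unordered_set<std::string>& dict)
    {
        std::vector<std::string> out;
        for (size_t i = 0; i < w.size(); ++i) {          // substitutions
            std::string t = w;
            for (char c = 'a'; c <= 'z'; ++c) {
                t[i] = c;
                if (t != w && dict.count(t)) out.push_back(t);
            }
        }
        for (size_t i = 0; i < w.size(); ++i) {          // deletions
            std::string t = w.substr(0, i) + w.substr(i + 1);
            if (dict.count(t)) out.push_back(t);
        }
        for (size_t i = 0; i <= w.size(); ++i)           // insertions
            for (char c = 'a'; c <= 'z'; ++c) {
                std::string t = w.substr(0, i) + c + w.substr(i);
                if (dict.count(t)) out.push_back(t);
            }
        return out;
    }

    int ladderLength(const std::string& begin, const std::string& end,
                     const std::unordered_set<std::string>& dict)
    {
        std::queue<std::pair<std::string, int>> q;   // (word, sequence length)
        std::unordered_set<std::string> visited{begin};
        q.push({begin, 1});
        while (!q.empty()) {
            auto [word, depth] = q.front();
            q.pop();
            if (word == end) return depth;
            for (const auto& next : neighbors(word, dict))
                if (visited.insert(next).second)     // enqueue unseen words only
                    q.push({next, depth + 1});
        }
        return 0;                                    // no transformation sequence
    }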
I have a txt file with 100,000,000 words, one per line.
I want to write a function that takes a word as input and checks whether that word is present in the txt file.
I have tried this with the map and trie methods, but I'm getting a std::bad_alloc error due to the large number of words. Can anyone suggest how to solve the issue?
Data structures are quite important when programming. If possible, I would recommend something like a binary tree. This would require sorting the text file, though. If you cannot sort the text file, the best option would be to iterate over the text file until you find the word you want. Also, your question should contain more information to allow us to diagnose your problem more easily.
I assume you want to search this word list over and over, because for a small number of searches you can just scan linearly through the file.
Parsing the word list into a suffix tree takes about 20 times the size of the file, more if not optimized. Since you ran out of memory constructing a trie of the word list, I assume it's really big. So let's not keep it in memory, but preprocess it a bit so you can search faster.
The solution I would propose is to do a dictionary search.
So first turn every whitespace character into a newline so you have one word per line instead of multiple lines with multiple words, then sort the file and store it. While you are at it, you can remove duplicates. That is our dictionary. Remember the length of the longest word (L) while you do that.
To access the dictionary you need a helper function that reads the word at offset X, which may point into the middle of some word. The function should seek to offset X - L and read 2 * L bytes into a buffer. Then, from the middle of the buffer, search backward and forward to find the word at offset X.
Now to search, you open the dictionary and read the words at offset left = 0 and offset right = size_of_file, i.e. the first and last words. If your search term is less than the first word or greater than the last word, you are done: the word is not present. If either one equals the search term, you are done too.
Next, in a binary search you would take the std::midpoint of left and right, read the word at that offset, check whether the search term is less or greater, and recurse into that interval. This requires O(log n) reads to find the word or determine it's not present.
A dictionary search can do better. Instead of using the midpoint, you can approximate where the word should be in the dictionary. Say your dictionary goes from "Aal" to "Zoo" and you are searching for "Zebra". Would you open the dictionary in the middle? No, you would open it near the end, because "Zebra" is much closer to "Zoo" than to "Aal". So you need a function that gives you a value M between 0 and 1 for where a search term is located relative to the left and right words. Your "midpoint" for the search is then left + (right - left) * M. Then, as with binary search, determine whether the search term is in the left or right interval and recurse.
A dictionary search takes only O(log log n) reads on average if the word list has a reasonably uniform distribution.
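To make the idea concrete, here is a rough C++ sketch of the interpolation step. For brevity it searches an in-memory sorted vector rather than the file, so indexing stands in for the read-word-at-offset helper described above, and the fraction M is approximated from the first few bytes of each word:

    #include <algorithm>
    #include <string>
    #include <vector>

    // Estimate where `s` falls between `lo` and `hi`, as a fraction in (0, 1),
    // by mapping the first few bytes of each string to a base-256 number.
    double frac(const std::string& s, const std::string& lo, const std::string& hi)
    {
        auto val = [](const std::string& x) {
            double v = 0, scale = 1;
            for (size_t i = 0; i < 6; ++i)
                v += (i < x.size() ? (unsigned char)x[i] : 0) * (scale /= 256);
            return v;
        };
        double a = val(lo), b = val(hi);
        if (b - a < 1e-12) return 0.5;
        return std::min(0.999, std::max(0.001, (val(s) - a) / (b - a)));
    }

    // Interpolation ("dictionary") search over a sorted word list.
    bool dictSearch(const std::vector<std::string>& words, const std::string& term)
    {
        if (words.empty()) return false;
        size_t left = 0, right = words.size() - 1;
        while (left < right) {
            if (term < words[left] || words[right] < term) return false;
            size_t mid = left + (size_t)((right - left) * frac(term, words[left], words[right]));
            if (words[mid] == term) return true;
            if (words[mid] < term) left = mid + 1;
            else right = mid - 1;   // mid > left here, so no underflow
        }
        return words[left] == term;
    }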
You are given a string, for example "Iamastudent", without any spaces. You will be provided with a predefined dictionary function which verifies whether a given word is present in the dictionary or not. Using this function, you have to insert spaces into the string and print it as "I am a student".
This was my interview question, and I was told to solve it in C++. I solved it using dynamic programming, but the interviewer was not satisfied.
The solution I gave is the same as in the question below:
Given a phrase without spaces add spaces to make proper sentence
He asked me to do it using a trie or suffix array, but I wasn't able to figure out the solution. Can anyone help me?
Find words and put spaces after them
The answer is to use the Trie data structure. Create a Trie of the possible words and keep traversing; with the Trie you can generate many different candidate words.
Now, from "iamastudent", with the Trie you could generate these words:
i, a, am, a, as, student
Now you have to make a proper sentence out of these words. Here a possible tool is a Markov chain: a structure that holds, for each word, the likely next words. So the Markov chain will be:
"i" : [ "am", "did", "went" ...],
"a" : [ "tree", "dog" ..]
"am" : [ "a" ...]
Now you have these candidates in sequence:
[i], [a, am], [a, as], [student]
Note: I grouped all elements which start with the same character into one list.
start with "i"
next word is "a". but in markov chain "a" is not there. so go for next word. like this you can continue.
from here onwards it is a dfs search for a valid sentence. well, it was a nice and tricky question.
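Leaving the Markov-chain ranking aside (it needs the probability table), the generate-and-DFS part could look roughly like this in C++; inDictionary is a hypothetical stand-in for the interviewer's predefined dictionary function:

    #include <string>
    #include <unordered_set>
    #include <vector>

    // Stand-in for the interviewer's predefined dictionary function.
    bool inDictionary(const std::string& w)
    {
        static const std::unordered_set<std::string> dict{"i", "a", "am", "as", "student"};
        return dict.count(w) > 0;
    }

    // DFS over split positions: try every dictionary word that is a prefix
    // of the remaining input, recurse on the rest, backtrack on failure.
    bool splitWords(const std::string& s, size_t pos, std::vector<std::string>& out)
    {
        if (pos == s.size()) return true;
        for (size_t len = 1; pos + len <= s.size(); ++len) {
            std::string w = s.substr(pos, len);
            if (inDictionary(w)) {
                out.push_back(w);
                if (splitWords(s, pos + len, out)) return true;
                out.pop_back();   // dead end: undo and try a longer prefix
            }
        }
        return false;
    }

Calling splitWords("iamastudent", 0, result) fills result with i, am, a, student: the split on "a" after "i" dead-ends at "mastudent" and is backtracked in favor of "am".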
If there is a unique way of splitting the sentence, then doing it with a trie is simple:
1. If there are characters left in the input string, start walking down from the root, consuming characters from the string; otherwise terminate.
2. If it is a compressed trie, you will find a mark whenever a prefix is a complete word; otherwise, reaching a leaf is when you output a space.
3. Go back to 1 (walking down from the root), starting from the current position in the string.
You are done when there are no more characters in the string (you may want to check that at this point you are not in the middle of traversing the tree).
If the split is not unique, then whenever you reach the end of the string and you are not at a mark or a leaf in the tree, you need to backtrack to the previous space you emitted. You need a stack of positions in the input string for that.
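A compact C++ sketch of this walk, assuming an uncompressed trie over lowercase letters where an isWord flag plays the role of the mark; the recursion acts as the stack of emitted-space positions:

    #include <array>
    #include <memory>
    #include <string>
    #include <vector>

    struct TrieNode {
        std::array<std::unique_ptr<TrieNode>, 26> child{};
        bool isWord = false;   // the "mark": a complete word ends here
    };

    void insert(TrieNode& root, const std::string& w)
    {
        TrieNode* n = &root;
        for (char c : w) {
            auto& slot = n->child[c - 'a'];
            if (!slot) slot = std::make_unique<TrieNode>();
            n = slot.get();
        }
        n->isWord = true;
    }

    // Walk down from the root; at every mark, tentatively emit a space and
    // restart from the root. The recursion is the stack of space positions.
    bool segment(const TrieNode& root, const std::string& s, size_t pos,
                 std::vector<size_t>& spaces)
    {
        if (pos == s.size()) return true;       // consumed everything exactly
        const TrieNode* n = &root;
        for (size_t i = pos; i < s.size(); ++i) {
            n = n->child[s[i] - 'a'].get();
            if (!n) return false;               // fell off the trie: dead end
            if (n->isWord) {
                spaces.push_back(i + 1);        // space after position i
                if (segment(root, s, i + 1, spaces)) return true;
                spaces.pop_back();              // backtrack to previous space
            }
        }
        return false;                           // string ended mid-word
    }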
I'm solving a matching problem with two vectors of a class:
class matching
{
public:
    int n;       // index of the matched vertex on the other side (-1 if unmatched)
    char match;  // 'M' = matched, 'U' = unmatched
};
This is the algorithm I'm trying to implement:
int augment(vector<matching> &left, vector<matching> &right)
{
    while (/* there's no augmenting path */)
        if (/* condition for matching */)
            /* augment */;
    return /* number of matchings */;
}
For the initial rough matching: if left[i] matches with right[j], then left[i].n = j, left[i].match = 'M', right[j].n = i and right[j].match = 'M', while the unmatched ones have n = -1 and match = 'U'.
While finding the augmenting paths, if one exists for another pair (i, j), then we change the match member of the vertex being unmatched from 'M' to 'U' and set its n = -1, while the two vertices matched via the augmenting path have their match members changed to 'A' and their n members set according to their indices.
I don't know if this is the right approach; this is my first attempt at maximum matching, and although I've read a lot of articles and watched tutorials online, I can't get my code to function appropriately.
I do not need code; I can write my own. I just want to understand this algorithm step by step. If someone could give me an algorithm like the one I was attempting above, I would appreciate it. Also, if I have been going in the wrong direction, please correct me.
I am not sure you are finding the augmenting paths correctly. I suggest the following approach:
1. Find an initial matching greedily: travel through every vertex on the left side and try to match it with some free (unmatched) vertex on the right side.
2. Try to find an augmenting path P in the graph. For this, do a breadth-first search starting from all the free vertices on the left side, alternating between unmatched and matched edges (i.e. the second level contains all the right-side vertices adjacent to level-1 vertices, the third level contains all the left-side vertices matched to level-2 vertices, the fourth level contains all the right-side vertices adjacent to level-3 vertices, and so on). Stop the search when you visit a free vertex at any level, and compute the augmenting path P from the breadth-first search tree built so far.
3. If an augmenting path P was found in the previous step, change the matched and unmatched edges along P to unmatched and matched edges respectively, and go to step 2. Otherwise, the matching obtained is maximum.
This algorithm requires one breadth-first search per augmentation, so its worst-case complexity is O(nm). Although the Hopcroft-Karp algorithm can perform multiple augmentations per breadth-first search and has a better worst-case complexity, it seems (from the Wikipedia article) that it isn't faster in practice.
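For illustration, here is a standard C++ sketch of this augment loop (Kuhn's algorithm). It finds each augmenting path with a depth-first search rather than the breadth-first search described above, but the edge-flipping logic and the O(nm) bound are the same:

    #include <vector>

    // Kuhn's algorithm for maximum bipartite matching. adj[u] lists the
    // right-side vertices adjacent to left-side vertex u; matchR[v] holds
    // the left partner of right vertex v, or -1 when v is free (the 'U'
    // state in the question's encoding).
    bool tryAugment(int u, const std::vector<std::vector<int>>& adj,
                    std::vector<int>& matchR, std::vector<char>& used)
    {
        for (int v : adj[u]) {
            if (used[v]) continue;
            used[v] = 1;
            // v is free, or its current partner can be re-matched elsewhere:
            // this flips matched/unmatched edges along the augmenting path.
            if (matchR[v] == -1 || tryAugment(matchR[v], adj, matchR, used)) {
                matchR[v] = u;
                return true;
            }
        }
        return false;
    }

    int maxMatching(const std::vector<std::vector<int>>& adj, int nRight)
    {
        std::vector<int> matchR(nRight, -1);
        int matched = 0;
        for (int u = 0; u < (int)adj.size(); ++u) {
            std::vector<char> used(nRight, 0);  // right vertices seen this round
            if (tryAugment(u, adj, matchR, used)) ++matched;
        }
        return matched;
    }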
I am given a set of N words and an integer k. Two words are in the same group if they have exactly the first k letters and the last k letters identical. If they have more than k letters identical, or fewer than k, then the words are not in the same group. For example, for k = 3:
"abcdefg" and "abczefg" are in the same group
"abcddefg" and "abcdzefg" are not in the same group (the first k+1 letters are identical)
"abc" and "abc" are in the same group
A word can be in more than one group. For example (k = 3):
"abczefg" and "abcefg" form a group
"abczaefg" and "abcefg" form a group
"abczaefg" and "abczefg" are not in the same group (the first k+1 letters are identical)
The problem asks me to find the number of groups which contain the maximum number of words.
I thought about using a Trie (or prefix tree), and I assume this is the right data structure for this problem, but I don't know how to adapt it here, because the part where two words with more than k identical letters are not in the same group confuses me. My idea has complexity O(N*N*K), and considering that N <= 10,000 and K <= 100, I don't think it is fast enough. I would like to explain my idea to you, but it is not yet clear even to me and I don't even know if it is correct, so I will skip this part.
My question is whether there is a faster algorithm for this problem, and if there is, I kindly ask you to explain it a little. Thank you in advance, and I am sorry for the grammatical mistakes and if I didn't explain the problem clearly!
First, group all the words that share the same first k letters and last k letters. Your largest group must sit inside one of these buckets, since there's no way two words that differ at their starts or ends can be in the same solution.
So, within each of these buckets (words that share the same k letters at their start and end), you need to find a maximal set of words such that no two share the (k+1)-th letter from the start, nor the (k+1)-th letter from the end.
Construct a graph whose vertices are the (de-duplicated) pairs of letters found at position k+1 from each end of the words in one of these buckets, with an edge between (a, b) and (c, d) whenever a = c or b = d.
You need to find a largest set of vertices with no edges between them. This reduced problem is an instance of the maximum independent set problem, which is NP-hard, so you'll need to solve it with a search and hope the set of words you're given isn't too nasty. Perhaps there's something special about the graphs arising here that permits a faster solution, but I don't see it.
The solution to the entire problem is the largest solution among the reduced problems described above.
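A rough C++ sketch of this reduction, with details such as duplicate words and words shorter than k glossed over; the independent-set search is a plain include/exclude recursion, exponential in the worst case as expected for an NP-hard subproblem:

    #include <algorithm>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Include/exclude search for a maximum independent set over the pairs
    // of (k+1)-th letters; two pairs conflict when they share either letter.
    int mis(size_t i, std::vector<size_t>& picked,
            const std::vector<std::pair<char, char>>& v)
    {
        if (i == v.size()) return (int)picked.size();
        int best = mis(i + 1, picked, v);            // branch 1: skip vertex i
        bool free = true;                            // branch 2: take i if possible
        for (size_t j : picked)
            if (v[j].first == v[i].first || v[j].second == v[i].second)
                free = false;
        if (free) {
            picked.push_back(i);
            best = std::max(best, mis(i + 1, picked, v));
            picked.pop_back();
        }
        return best;
    }

    int largestGroup(const std::vector<std::string>& words, size_t k)
    {
        // Step 1: bucket the words by their first k plus last k letters.
        std::unordered_map<std::string, std::vector<std::pair<char, char>>> buckets;
        for (const auto& w : words) {
            if (w.size() < k) continue;              // cannot share k letters
            std::string key = w.substr(0, k) + '|' + w.substr(w.size() - k);
            char a = w.size() > k ? w[k] : '\0';     // (k+1)-th letter from front
            char b = w.size() > k ? w[w.size() - k - 1] : '\0'; // ... from back
            buckets[key].push_back({a, b});
        }
        // Step 2: in each bucket, the biggest group is a maximum independent set.
        int best = 0;
        for (auto& kv : buckets) {
            std::vector<size_t> picked;
            best = std::max(best, mis(0, picked, kv.second));
        }
        return best;
    }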
Hope this helps!
I have some homework and I am stuck at one point. I am given facts like these:
word([h,e,l,l,o]).
word([m,a,n]).
word([w,o,m,a,n]).
etc., and I have to write a rule such that the user inputs one list of letters, and I compare the list with the words I have and correct any possible mistakes. Here is the code I am using when the first letter is in the correct place:
mistake_letter([], []).
mistake_letter([X|L1], [X|L2]) :-
    word([X|_]),
    mistake_letter(L1, L2).
The problem is that I don't know how to move on to the next letter of the word fact. The next time backtracking runs, it will again use the head of a word, while I would like to use the second letter of the list. Any ideas on how to solve this?
I am sorry for any grammatical mistakes, and I appreciate your help.
In order to move to the next letter in the word fact, you need to make the word from the fact a third argument and take it along for the ride. In your mistake_letter/2, you pick words one by one and call mistake_letter/3, passing along the word you picked, like this:
mistake_letter(L1, L2) :-
    word(W),
    mistake_letter(L1, L2, W).
Then you'll need to change your base case to do something when the letters of the word being corrected run out before the letters of the word that you picked. What you do depends on your assignment: you could fail and backtrack with mistake_letter([],[],[])., declare a match with mistake_letter([],[],_)., attach the word's tail to the correction with mistake_letter([],W,W)., or do something else.
You also need an easy case to cover the situation when the first letter of the word being corrected matches the first letter of the word that you picked:
mistake_letter([X|L1], [X|L2], [X|WT]) :-
    mistake_letter(L1, L2, WT).
Finally, you need the most important case: what to do when the initial letters do not match. This is probably the bulk of your assignment: the rest is just boilerplate recursion. In order to get it right, you may need to change mistake_letter/3 to mistake_letter/4 to be able to calculate the number of matches, and later compare it to the number of letters in the original word. This would let you drop "corrections" like [w,o,r,l,d] --> [h,e,l,l,o] as having only 20% of matching letters.