This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Write a function that returns the longest palindrome in a given string
I have a C++ assignment that asks me to write a program that finds the longest palindrome in a given text. For example, if the text is asdqerderdiedasqwertunut, my program should find tunut at index 19. However, if the input is changed to astunutsaderdiedasqwertunut, it should find astunutsa at index 0 instead of tunut at index 22.
That is my problem. I am a beginner, though: I only know the string class, loops, and if statements. It would be great if you could help me with this.
Thanks in advance.
The idea is very simple:
Write a function is_palindrome(string) that takes a string and returns true if it is a palindrome and false if it is not.
With that function in hand, write two nested loops cutting out different substrings from the original string. Pass each substring to is_palindrome(string), and pick the longest one among the strings returning true.
You can further optimize your program by examining longer substrings before shorter ones. If you examine substrings from longest to shortest, you'll be able to return as soon as you find the first palindrome.
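The steps above can be sketched roughly as follows. This is only a minimal illustration, not the answerer's own code: the name longest_palindrome and the start output parameter are assumptions made for the example. It tries lengths from longest to shortest and, within one length, scans left to right, so the first hit is the answer.

```cpp
#include <cstddef>
#include <string>

// Returns true if s reads the same forwards and backwards.
bool is_palindrome(const std::string& s) {
    if (s.empty()) return true;
    std::size_t left = 0, right = s.size() - 1;
    while (left < right) {
        if (s[left] != s[right]) return false;
        ++left;
        --right;
    }
    return true;
}

// Hypothetical helper: tries substrings from longest to shortest, and from
// left to right within one length, so the first palindrome found is the
// longest one (the leftmost if several share that length).
// Stores the start index in `start`; returns "" for an empty text.
std::string longest_palindrome(const std::string& text, std::size_t& start) {
    for (std::size_t len = text.size(); len > 0; --len) {
        for (std::size_t i = 0; i + len <= text.size(); ++i) {
            const std::string candidate = text.substr(i, len);
            if (is_palindrome(candidate)) {
                start = i;
                return candidate;
            }
        }
    }
    start = 0;
    return "";
}
```

With the inputs from the question, this should report tunut at index 19 and astunutsa at index 0. It checks on the order of n^2 substrings, each in up to n steps, which is fine for short assignment inputs.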
Dasblinkenlight's idea is pretty good, but it's faster this way:
A palindrome has either an even or an odd number of letters, so there are two cases. Start with the even case: find two consecutive identical letters, then check whether the letter just before the pair is identical to the letter just after it, and keep moving outwards for as long as the two letters match. The odd case works the same way, except that you start from a single letter instead of a pair. I don't speak English that well, so I hope you understood. :)
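A rough sketch of this "grow outwards from a centre" idea, assuming nothing beyond std::string. The names expand and longest_palindrome_centered are made up for the example.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>

// Grows a palindrome outwards from the centre [left, right] while the two
// outer characters match. Use left == right for an odd-length centre and
// right == left + 1 for an even-length centre.
// Returns the length found (0 if the centre itself does not match) and
// writes the start index of that palindrome to `start`.
std::size_t expand(const std::string& s, std::size_t left, std::size_t right,
                   std::size_t& start) {
    std::size_t best = 0;
    while (right < s.size() && s[left] == s[right]) {
        best = right - left + 1;
        start = left;
        if (left == 0) break;  // cannot move further left
        --left;
        ++right;
    }
    return best;
}

// Tries every position as an odd centre and as the left half of an even
// centre, and keeps the longest result (the leftmost one on ties).
std::string longest_palindrome_centered(const std::string& s,
                                        std::size_t& best_start) {
    std::size_t best_len = 0;
    best_start = 0;
    for (std::size_t c = 0; c < s.size(); ++c) {
        for (std::size_t right : {c, c + 1}) {  // odd centre, then even centre
            std::size_t start = 0;
            const std::size_t len = expand(s, c, right, start);
            if (len > best_len) {
                best_len = len;
                best_start = start;
            }
        }
    }
    return s.substr(best_start, best_len);
}
```

Each centre is expanded in at most n steps, so the whole search takes on the order of n^2 character comparisons instead of testing every substring separately.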
I'm trying to get better performance from a pattern check on a really large list of strings.
I need the first 5 occurrences that match a given pattern.
I was wondering whether
list.where(pattern in string).take(5)
is lazily computed and stops after 5 occurrences are found, or
whether it computes the whole where and then takes the first 5? (In that case, is there a whereXfirstOccurences method, where X is a number?)
Thank you,
Edit:
I did some investigation:
myList.where((element) {
  bool isSuggestion = the conditions ;
  if (isSuggestion) index++;
  return isSuggestion;
})
.take(x)
.toList();
print(index);
The index is always at most equal to x, so I guess it's lazy evaluation, as mentioned below. Thank you :)
Iterables are lazy.
If you do list.where(computation).take(5), it:
Doesn't do anything at all, until you start iterating.
It doesn't do anything except when you call moveNext on the iterator.
And it stops doing anything once moveNext has returned false, which it does after five elements here, because of the take(5).
If you just use for (var v in list.where(...).take(5)) ... you won't see those steps, but they are still there. The loop stops after finding five values, and no further elements are looked at than the ones needed to find the first five satisfying the where condition.
That might still be a lot of strings looked at, if the condition is very picky. If there are only four matching strings in the input, you will go through all of the input when looking for the first five matches.
Optimizing the pattern itself can definitely be valuable as well.
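For comparison, the same early-exit behaviour can be written as an explicit loop. The sketch below is in C++ and is not Dart's Iterable API; first_matches and its examined counter are hypothetical names used only to show that scanning stops once enough matches have been collected.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects at most `limit` strings containing `pattern`.
// `examined` reports how many input strings were actually looked at.
std::vector<std::string> first_matches(const std::vector<std::string>& items,
                                       const std::string& pattern,
                                       std::size_t limit,
                                       std::size_t& examined) {
    std::vector<std::string> found;
    examined = 0;
    for (const std::string& item : items) {
        if (found.size() == limit) break;  // enough matches: stop scanning
        ++examined;
        if (item.find(pattern) != std::string::npos) found.push_back(item);
    }
    return found;
}
```

If fewer than limit strings match, the loop necessarily walks the whole input, which mirrors the "only four matching strings" case described above.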
I've tried googling algorithms for a program that outputs the result indicated in the question. Mostly, all I found were algorithms that satisfy the first constraint but do not take the second part (ignoring the casing of letters) into account. Conventional functions such as strcmpi (I'm using C++) require constant characters, which makes it impossible to incorporate them into the algorithms alluded to above. In essence, I just need an idea of how to go about creating such a program.
First, write a program that finds the longest palindromic substring using your own compare function. In that compare function, return true if the two characters are the same, and also return true if their ASCII values differ by exactly 32, which is the gap between an uppercase letter and its lowercase form. Leave the rest as it is. Note that the difference-of-32 test is only valid when both characters are letters (for example, '1' and 'Q' also differ by 32), so check that first or compare the results of tolower instead.
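A minimal sketch of such a compare function plugged into a palindrome search. It uses std::tolower rather than the raw difference of 32, which is a substitution on my part and not what the answer literally describes. The function names are illustrative.

```cpp
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <string>

// Compares two characters while ignoring letter case.
// The cast to unsigned char avoids undefined behaviour for negative values.
bool same_ignoring_case(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Longest palindromic substring under the comparison above.
// Expands around every odd and even centre; the leftmost result wins ties.
std::string longest_palindrome_ignore_case(const std::string& s) {
    std::size_t best_start = 0, best_len = 0;
    for (std::size_t c = 0; c < s.size(); ++c) {
        for (std::size_t r : {c, c + 1}) {
            std::size_t left = c, right = r;
            while (right < s.size() && same_ignoring_case(s[left], s[right])) {
                if (right - left + 1 > best_len) {
                    best_len = right - left + 1;
                    best_start = left;
                }
                if (left == 0) break;
                --left;
                ++right;
            }
        }
    }
    return s.substr(best_start, best_len);
}
```

The result keeps the original casing of the input, since only the comparison ignores case.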
Possible Duplicate:
string comparison with the most similar string
I was wondering what the best way is to compare two strings for similarity (above a certain percentage). For example, String 1 is "I really like to eat pie," and String 2 is "I really like to eat cheese," and a function would return "true" because more than 50% of the characters are similar.
I was thinking that I could see if each character in one string is somewhere in the other, but there's probably a more precise way to go about things. Any suggestions?
Levenshtein distance might be suitable. It tells how many single-character insertions, deletions or replacements must be made in order to transform one string into the other. You can also assign different costs to the three operations.
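As a sketch, the distance can be computed with the usual two-row dynamic programme. All three operations cost 1 here. The similarity function, which turns the distance into a score between 0 and 1, is an assumption added for the "more than 50%" test in the question and is not a standard definition.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance with unit costs for insert, delete and replace.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t replace_cost =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1,      // delete from a
                                curr[j - 1] + 1,  // insert into a
                                replace_cost});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical score: 1.0 for identical strings, lower as more edits are
// needed, normalised by the length of the longer string.
double similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;
    return 1.0 - static_cast<double>(levenshtein(a, b)) /
                     static_cast<double>(longest);
}
```

A check such as similarity(s1, s2) > 0.5 would then play the role of the "true" result asked for in the question.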
For a fuzzy compare like this you could split each string up into words (using strtok()) and compare the two word arrays case-insensitively using stricmp() (a non-standard function; it is _stricmp() on Windows and strcasecmp() on POSIX systems). There is also the SOUNDEX algorithm to compare words to see if they sound the same.
I am working on a spell checker in C++ and I'm stuck at a certain step in the implementation.
Let's say we have a text file with correctly spelled words and an inputted string we would like to check for spelling mistakes. If that string is a misspelled word, I can easily find its correct form by checking all words in the text file and choosing the one that differs from it by the fewest letters. For that type of input, I've implemented a function that calculates the Levenshtein edit distance between 2 strings. So far so good.
Now, the tough part: what if the inputted string is a combination of misspelled words? For example, "iloevcokies". Taking into account that "i", "love" and "cookies" are words that can be found in the text file, how can I use the already-implemented Levenshtein function to determine which words from the file are suitable for a correction? Also, how would I insert blanks in the correct positions?
Any idea is welcome :)
Spelling correction for phrases can be done in a few ways. One way requires having an index of word bi-grams and tri-grams. These of course could be immense. Another option would be to try permutations of the word with spaces inserted, then do a lookup on each word in the resulting phrase. Take a look at a simple implementation of a spell checker by Peter Norvig from Google. Either way, consider using an n-gram index for better performance; there are libraries available in C++ for reference.
Google and other search engines are able to do spelling correction on phrases because they have a large index of queries and associated result sets, which allows them to calculate a statistically good guess. Overall, the spelling correction problem can become very complex with methods like context-sensitive correction and phonetic correction. Because trying permutations of possible sub-terms can become expensive, you can use certain types of heuristics, although this can get out of scope quickly.
You may also consider using an existing spelling library, such as aspell.
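One way to make the "insert spaces, then look up each word" option concrete is a small dynamic programme over split points: every piece of the input is matched to its closest dictionary word, and the split with the smallest total edit distance wins. This is a sketch under simplifying assumptions (plain Levenshtein costs, a tiny in-memory dictionary, no word frequencies); correct_phrase and edit_distance are illustrative names.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Plain Levenshtein distance (unit costs), two-row version.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1,
                                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical helper: splits `input` into consecutive pieces, replaces each
// piece by its closest dictionary word, and returns the words of the split
// with the smallest total edit distance.
std::vector<std::string> correct_phrase(const std::string& input,
                                        const std::vector<std::string>& dictionary) {
    if (dictionary.empty()) return {};
    const std::size_t n = input.size();
    const std::size_t inf = std::numeric_limits<std::size_t>::max();
    std::vector<std::size_t> cost(n + 1, inf);  // best cost for input[0, end)
    std::vector<std::size_t> cut(n + 1, 0);     // start of the last piece
    std::vector<std::string> word(n + 1);       // word chosen for the last piece
    cost[0] = 0;
    for (std::size_t end = 1; end <= n; ++end) {
        for (std::size_t begin = 0; begin < end; ++begin) {
            const std::string piece = input.substr(begin, end - begin);
            for (const std::string& candidate : dictionary) {
                const std::size_t total =
                    cost[begin] + edit_distance(piece, candidate);
                if (total < cost[end]) {
                    cost[end] = total;
                    cut[end] = begin;
                    word[end] = candidate;
                }
            }
        }
    }
    // Walk the recorded cut points backwards to recover the words in order.
    std::vector<std::string> result;
    for (std::size_t end = n; end > 0; end = cut[end]) result.push_back(word[end]);
    std::reverse(result.begin(), result.end());
    return result;
}
```

For the example from the question, the dictionary {"i", "love", "cookies"} and the input "iloevcokies" should come back as i, love, cookies. With a real word list this brute-force loop gets slow, which is where the n-gram index or heuristics mentioned above come in.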
A starting point for an idea: one of the top hits of your L-distance for "iloevcokies" should be "cookies". If you can change your L-distance function to also track and return a min-index and max-index (i.e., this match is best starting from character 5 and going to character 10) then you could remove that substring and re-check L-distance for the string before it and after it, then concatenate those for a suggestion....
Just a thought, good luck....
I will suppose that you have an existing index on which you run your Levenshtein distance (for example, a Trie, but any sorted index generally works well).
You can treat the addition of a white-space as a regular edit operation, with one twist: after inserting it, you need to go back to the root of your index for the next word.
This way you get the same index, almost the same route, approximately the same traversal, and it should not even impact your running time that much.
Possible Duplicate:
Using Regex to generate Strings rather than match them
How would you go about creating a random alpha-numeric string that matches a certain regular expression?
This is specifically for creating initial passwords that fulfill regular password requirements.
Welp, just musing, but the general question of generating random inputs that match a regex sounds doable to me for a sufficiently relaxed definition of random and a sufficiently tight definition of regex. I'm thinking of the classical formal definition, which allows only ()|* and alphabet characters.
Regular expressions can be mapped to formal machines called finite automata. Such a machine is a directed graph with a particular node called the final state, a node called the initial state, and a letter from the alphabet on each edge. A word is accepted by the regex if it's possible to start at the initial state and traverse one edge labeled with each character through the graph and end at the final state.
One could build the graph, then start at the final state and traverse random edges backwards, keeping track of the path. In a standard construction, every node in the graph is reachable from the initial state, so you do not need to worry about making irrecoverable mistakes and needing to backtrack. If you reach the initial state, stop, and read off the path going forward. That's your match for the regex.
There's no particular guarantee about when or if you'll reach the initial state, though. One would have to figure out in what sense the generated strings are 'random', and in what sense you are hoping for a random element from the language in the first place.
Maybe that's a starting point for thinking about the problem, though!
Now that I've written that out, it seems to me that it might be simpler to repeatedly resolve choices to simplify the regex pattern until you're left with a simple string. Find the first non-alphabet character in the pattern. If it's a *, replicate the preceding item some number of times and remove the *. If it's a |, choose which of the OR'd items to preserve and remove the rest. For a left paren, do the same, but looking at the character following the matching right paren. This is probably easier if you parse the regex into a tree representation first that makes the paren grouping structure easier to work with.
To the person who worried that deciding if a regex actually matches anything is equivalent to the halting problem: Nope, regular languages are quite well behaved. You can tell if any two regexes describe the same set of accepted strings. You basically make the machine above, then follow an algorithm to produce a canonical minimal equivalent machine. Do that for two regexes, then check if the resulting minimal machines are equivalent, which is straightforward.
String::Random in Perl will generate a random string from a subset of regular expressions:
#!/usr/bin/perl
use strict;
use warnings;
use String::Random qw/random_regex/;
print random_regex('[A-Za-z]{3}[0-9][A-Z]{2}[!@#$%^&*]'), "\n";
If you have a specific problem, you probably have a specific regular expression in mind. I would take that regular expression, work out what it means in simple human terms, and work from there.
I suspect it's possible to create a general regex random match generator, but it's likely to be much more work than just handling a specific case - even if that case changes a few times a year.
(Actually, it may not be possible to generate random matches in the most general sense - I have a vague memory that the problem of "does any string match this regex" is the halting problem in disguise. With a very cut-down regex language you may have more luck though.)
I have written Parsley, which consists of a Lexer and a Generator.
The Lexer converts a regular expression-like string into a sequence of tokens.
The Generator uses these tokens to produce a defined number of codes.
$generator = new \Gajus\Parsley\Generator();
/**
 * Generate a set of random codes based on Parsley pattern.
 * Codes are guaranteed to be unique within the set.
 *
 * @param string $pattern Parsley pattern.
 * @param int $amount Number of codes to generate.
 * @param int $safeguard Number of additional codes generated in case there are duplicates that need to be replaced.
 * @return array
 */
$codes = $generator->generateFromPattern('FOO[A-Z]{10}[0-9]{2}', 100);
The above example will generate an array containing 100 codes, each prefixed with "FOO", followed by 10 characters from the "ABCDEFGHKMNOPRSTUVWXYZ23456789" haystack and 2 digits from the "0123456789" haystack.
This PHP library looks promising: ReverseRegex
Like all of these, it only handles a subset of regular expressions but it can do fairly complex stuff like UK Postcodes:
([A-PR-UWYZ]([0-9]([0-9]|[A-HJKSTUW])?|[A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?) ?[0-9][ABD-HJLNP-UW-Z]{2}|GIR0AA)
Outputs
D43WF
B6 6SB
MP445FR
P9 7EX
N9 2DH
GQ28 4UL
NH1 2SL
KY2 9LS
TE4Y 0AP
You'd need to write a string generator that can parse regular expressions and generate random members of character ranges for random lengths, etc.
Much easier would be to write a random password generator with certain rules (starts with a lower case letter, has at least one punctuation, capital letter and number, at least 6 characters, etc) and then write your regex so that any passwords created with said rules are valid.
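A minimal sketch of such a rule-based generator, using the example rules from this answer (starts with a lowercase letter; contains at least one capital letter, digit and punctuation mark; at least 6 characters). The function name and the punctuation set are assumptions. Note that std::mt19937 is not a cryptographically secure source, so real password code should draw its randomness elsewhere.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical generator: builds a password that satisfies the rules by
// construction instead of generating from a regex.
// NOTE: std::mt19937 is used only for illustration; it is not secure.
std::string make_password(std::size_t length, std::mt19937& rng) {
    const std::string lower = "abcdefghijklmnopqrstuvwxyz";
    const std::string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const std::string digits = "0123456789";
    const std::string punct = "!@#$%^&*";  // assumed set of allowed symbols
    const std::string all = lower + upper + digits + punct;

    auto pick = [&rng](const std::string& pool) {
        std::uniform_int_distribution<std::size_t> index(0, pool.size() - 1);
        return pool[index(rng)];
    };

    length = std::max<std::size_t>(length, 6);  // enforce the minimum length

    // One character from each required class, then random filler.
    std::string tail;
    tail += pick(upper);
    tail += pick(digits);
    tail += pick(punct);
    while (tail.size() + 1 < length) tail += pick(all);

    // Shuffle so the required characters do not sit in fixed positions.
    std::shuffle(tail.begin(), tail.end(), rng);

    return pick(lower) + tail;  // the first character is always lowercase
}
```

Any validation regex then only has to accept what this function can produce, for example a lowercase first character followed by at least five characters from the combined set.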
Presuming you have both a minimum length and 3-of-4* (or similar) requirement, I'd just be inclined to use a decent password generator.
I've built a couple in the past (both web-based and command-line), and have never had to skip more than one generated string to pass the 3-of-4 rule.
* 3-of-4: must have at least three of the following characteristics: lowercase, uppercase, number, symbol
It is possible (for example, the Haskell regexp module has a test suite that automatically generates strings that ought to match certain regexes).
However, for a simple task at hand you might be better off taking a simple password generator and filtering its output with your regexp.
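The generate-and-filter approach can be sketched with std::regex as the filter. The helper name, the retry limit and the sample rule in the usage note are assumptions for illustration, and std::mt19937 again stands in for a proper secure random source.

```cpp
#include <cstddef>
#include <random>
#include <regex>
#include <string>

// Hypothetical helper: draws random candidates of `length` characters from
// `alphabet` and returns the first one that `rule` accepts in full.
// Returns an empty string if nothing matched within `max_tries` attempts.
std::string generate_matching(const std::regex& rule,
                              const std::string& alphabet,
                              std::size_t length,
                              std::mt19937& rng,
                              int max_tries = 1000) {
    if (alphabet.empty()) return "";
    std::uniform_int_distribution<std::size_t> index(0, alphabet.size() - 1);
    for (int attempt = 0; attempt < max_tries; ++attempt) {
        std::string candidate;
        for (std::size_t i = 0; i < length; ++i) candidate += alphabet[index(rng)];
        if (std::regex_match(candidate, rule)) return candidate;  // passes the filter
    }
    return "";
}
```

A rule such as std::regex("(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{8,}") with an alphabet of letters and digits and a length of 10 usually succeeds within a few attempts. This only works well when a reasonable share of random candidates satisfies the regex, so a very strict pattern calls for one of the constructive approaches instead.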