Keep case with regex find and replace - regex

My question is pretty straightforward: using only a regular expression find and replace, is it possible to keep the case of the original words?
So if I have the string: "Pretty pretty is so pretty"
How can I turn it into: "Lovely lovely is so lovely"
All I have so far is find /(P|p)retty/g and replace with $1ovely, but I don't know how to replace capital P with L and lowercase p with l.
I am not interested in accomplishing this in any particular language, I want to know if it is possible to do with pure regex.

Regex alone cannot choose a replacement letter based on the case of the letter that was captured. It is possible with a regex plus a language's built-in functions.
In PHP, I would do it like this:
$str = "Pretty pretty is so pretty";
echo preg_replace_callback('~([pP])retty~', function ($m)
{
if($m[1] == "P") {
return "Lovely"; }
else { return "lovely"; }
}, $str);
Output:
Lovely lovely is so lovely
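For comparison, the same callback idea can be sketched in Python with re.sub and a replacement function. The name keep_case_replace is made up for this illustration, and, like the PHP version, it only looks at the case of the first letter.

```python
import re

def keep_case_replace(text):
    # The callback receives each match object and picks the replacement
    # according to the case of the captured first letter.
    def repl(m):
        return "Lovely" if m.group(1) == "P" else "lovely"
    return re.sub(r"([Pp])retty", repl, text)

print(keep_case_replace("Pretty pretty is so pretty"))
# Lovely lovely is so lovely
```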

Related

Perl string manipulation and find

I am currently working on a phonebook program for a class and I am having a little bit of trouble with the regex part in order to format my text and find what I'm looking for. Firstly, I am having trouble editing my phone number text to what I want. I am able to find the text that has 7 numbers in a row (777777) but I am unable to substitute it to (1-701-777-777).
if($splitIndex[1] =~ m/^(\d{3}\d{4})/) {
$splitIndex[1] =~ s/([\d{3}][\d{4}])/1-701-[$1]-[$2]/;
print "Updated: $splitIndex[1]";
}
When I run this code, the output is not what I expect (I can't embed an image here; the output is at https://imgur.com/a/8HtW7xm).
Secondly, I am having trouble doing the actual regex part for the searching. I save all the possible letter combinations in $letofSearch and the number order combination in $numOfSearch. Through playing around in regex I have figured out if I do [$numOfSearch]+[$numOfSearch[-1]...[$numOfSearch[1] it gives me the correct find for the numbers but I am unable to write it properly in my code.
#If user input is only numbers
if($searchValue =~ m/(\D)/) {
#print "Not a number\n";
if($splitIndex[1] =~ m/([$numOfSearch]+)/) {
if($found == 0) {
print "$splitIndex[0]:$splitIndex[1]\n";
$found = 1;
}
}
if($splitIndex[0] =~ m/([$letOfSearch])/i) {
if($found == 0) {
print "$splitIndex[0]:$splitIndex[1]\n";
$found = 1;
}
}
$found = 0;
} else {
#If it is a number search for that number combo immediately
if($splitIndex[1] =~ m/([$numOfSearch]+)/) {
if($found == 0) {
print "$splitIndex[0]:$splitIndex[1]\n";
$found = 1;
}
}
if($splitIndex[0] =~ m/([$letOfSearch])/i) {
if($found == 0) {
print "$splitIndex[0]:$splitIndex[1]\n";
$found = 1;
}
}
$found = 0;
}
}
}
Instead of:
if($splitIndex[1] =~ m/^(\d{3}\d{4})/) {
$splitIndex[1] =~ s/([\d{3}][\d{4}])/1-701-[$1]-[$2]/;
print "Updated: $splitIndex[1]";
}
try this:
if ($splitIndex[1] =~ s/(\d{3})(\d{4})/1-701-$1-$2/)
{
print "Updated: $splitIndex[1]";
}
In regular expressions, a set of square brackets ([ and ]) will match one and only one character, regardless of what's between the brackets. So when you write [\d{3}][\d{4}], that will match exactly two characters, because you are using two sets of []. And those two characters will be one of \d (any digit), {, 3, 4, or }, because that's what you wrote inside the brackets.
The order doesn't matter inside of the square brackets of a regular expression, so [\d{3}] is the same as [}1527349806{3]. As you can see, that's probably not what you wanted.
What you meant to do was capture the \d{3} and \d{4} strings, and you do that with a regular set of capturing parentheses, like this: (\d{3})(\d{4})
Since you had only one set of parentheses (that is, you had ([\d{3}][\d{4}])) and it contained exactly two []s, it was putting exactly two characters into $1, and nothing at all into $2. That's why, when you attempted to use $2 in the second half of your s///, it was complaining about an uninitialized value in $2. You were attempting to use a value ($2) that simply wasn't set.
(Also, you were doing two sets of matches: One for the m//, and one for the s///. I simply removed the m// match and kept the s/// match, using its return value to determine if we need to print() anything.)
The second part of the s/// does not use regular expressions, so any [, ], {, }, (, or ) will show up literally as that character. So if you don't want square brackets in the final phone number, don't use them. That's why I used s/.../1-701-$1-$2/; instead of s/.../1-701-[$1]-[$2]/;.
So when you wrote s/([\d{3}][\d{4}])/1-701-[$1]-[$2]/, the ([\d{3}][\d{4}]) part was putting two characters into $1, and nothing into $2. That's why you got a result that contained [77] (which was $1 surrounded by brackets) and [] (which was $2 (an uninitialized value) surrounded by brackets).
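To see the difference between the two patterns outside of Perl, here is a small Python sketch. The sample number "7777777" is an assumption chosen for the illustration; the patterns are the ones discussed above.

```python
import re

number = "7777777"

# Each character class matches exactly ONE character,
# so this pattern consumes only two digits.
two_chars = re.match(r"[\d{3}][\d{4}]", number).group()
print(two_chars)  # 77

# Quantified \d inside capturing parentheses does what was intended.
# subn returns the new string and the number of substitutions made,
# which plays the role of the s/// return value.
updated, count = re.subn(r"(\d{3})(\d{4})", r"1-701-\1-\2", number)
if count:
    print("Updated:", updated)  # Updated: 1-701-777-7777
```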
As for the second part of your post, I notice that you use a lot of capturing parentheses in your regular expressions, but you never actually use what you capture. That is, you never use $1 (or $2). For example, you write:
if($searchValue =~ m/(\D)/) {
which has m/(\D)/, yet you never use $1 anywhere in that code. I wonder: What's the point of capturing that non-digit character if you don't use it anywhere in your code?
I've seen programmers get confused and mix up the purpose of parentheses and square brackets. When using regular expressions, square brackets ([ and ]) match (not capture) exactly one character. What they match is not put in $1, $2, or any other $n.
Parentheses, on the other hand, capture whatever they match, by setting $1 (or $2, $3, etc.) to what was matched. In general, you shouldn't use parentheses unless you plan on capturing and using that match later. (The main exception to this rule is if you need to group a set of matches, like this: m/I have a (cat|dog|bird)/.)
Many programmers confuse square brackets and parentheses in regular expressions, and try to use them interchangeably. They'll write something like m/I have a [cat|dog|bird]/ and not realize that it's the same as m/I have a [abcdgiort|]/ (which doesn't capture anything, since there are no parentheses), and wonder why their program complains that $1 is an uninitialized value.
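The bracket-versus-parenthesis difference is easy to check interactively. A minimal Python sketch, using an example sentence invented for the purpose:

```python
import re

sentence = "I have a bird"

# Brackets form a character class: only ONE character after "a " is matched,
# and nothing is captured.
m_class = re.search(r"I have a [cat|dog|bird]", sentence)
print(m_class.group())   # I have a b
print(m_class.groups())  # ()

# Parentheses group the alternatives and capture the whole word.
m_group = re.search(r"I have a (cat|dog|bird)", sentence)
print(m_group.group(1))  # bird
```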
This is a common mistake, so don't feel bad if you didn't know the difference. Now you know, and hopefully you can figure out what needs to be corrected in the second part of your code.
I hope this helps.

Regex to upper case not surrounded by single quotes

hello 'this' is my'str'ing
If I have string like this, I'd like to make it all upper case if not surrounded by single quote.
hello 'this' is my'str'ing=>HELLO 'this' IS MY'str'ING
Is there a easy way I can achieve this in node perhaps using regex?
You can use the following regular expression:
'[^']+'|(\w)
Here is a live example:
var subject = "hello 'this' is my'str'ing";
var regex = /'[^']+'|(\w)/g;
var replaced = subject.replace(regex, function(m, group1) {
if (!group1) {
return m;
}
else {
return m.toUpperCase();
}
});
document.write(replaced);
Credit of this answer goes to zx81. For more information see the original answer of zx81.
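The same "match what you want to skip, capture what you want to change" trick carries over to other languages. A rough Python equivalent, with the function name upper_outside_quotes chosen only for this sketch:

```python
import re

def upper_outside_quotes(text):
    # Either a whole quoted chunk (left untouched) or a single word
    # character captured in group 1 (upper-cased).
    def repl(m):
        return m.group(0).upper() if m.group(1) else m.group(0)
    return re.sub(r"'[^']+'|(\w)", repl, text)

print(upper_outside_quotes("hello 'this' is my'str'ing"))
# HELLO 'this' IS MY'str'ING
```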
Since Javascript doesn't support lookbehinds, we have to use \B which matches anything a word boundary doesn't match.
In this case, \B' makes sure that ' isn't to the right of anything in \w ([a-zA-Z0-9_]). Likewise, '\B does a similar check to the left.
(?:(.*?)(?=\B'.*?'\B)(?:(\B'.*?'\B))|(.*?)$) (regex demo)
Use a callback function and check whether the length of capture 1 or 3 is > 0; if it is, return the match in upper case.
**The sample uses \U and \L just to uppercase and lowercase the related matches. Your callback need not ever affect $2's case, so "Adam" can stay "Adam", etc.
Unrelated, but a note to anyone who might be trying to do this in reverse: it's much easier to do the REVERSE of this:
(\B'.+?'\B) regex demo

How can I check if a Perl string contains letters?

In Perl, what regex should I use to find if a string of characters has letters or not?
Example of a string used: Thu Jan 1 05:30:00 1970
Would this be fine?
if ($l =~ /[a-zA-Z]/)
{
print "string ";
}
else
{
print "number ";
}
try this:
/[a-zA-Z]/
or
/[[:alpha:]]/
otherwise, you should give examples of the strings you want to match.
also read perldoc perlrequick
Edit: @OP, you have provided an example string, but I am not really sure what you want to do with it, so I am assuming you want to check whether a word is all letters, all numbers or something else. Here's something to start with. All from perldoc perlrequick (and perlretut), so please read them.
sub check{
my $str = shift;
if ($str =~ /^[a-zA-Z]+$/){
return $str." all letters";
}
if ($str =~ /^[0-9]+$/){
return $str." all numbers";
}else{
return $str." a mix of numbers/letters/others";
}
}
$string = "99932";
print check ($string)."\n";
$string = "abcXXX";
print check ($string)."\n";
$string = "9abd99_32";
print check ($string)."\n";
output
$ perl perl.pl
99932 all numbers
abcXXX all letters
9abd99_32 a mix of numbers/letters/others
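For readers more at home in Python, the same three-way classification might be sketched like this. It mirrors the Perl sub above; one small difference is that re.fullmatch does not tolerate a trailing newline the way Perl's $ does.

```python
import re

def check(s):
    # All ASCII letters
    if re.fullmatch(r"[a-zA-Z]+", s):
        return s + " all letters"
    # All ASCII digits
    if re.fullmatch(r"[0-9]+", s):
        return s + " all numbers"
    return s + " a mix of numbers/letters/others"

for sample in ("99932", "abcXXX", "9abd99_32"):
    print(check(sample))
```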
If you want to match Unicode characters rather than just ASCII ones, try this:
#!/usr/bin/perl
while (<>) {
if (/[\p{L}]+/) {
print "letters\n";
} else {
print "no letters\n";
}
}
If you're looking for any kind of letter from any language, you should go with
\p{L}
Take a look on this full reference: Unicode Character Properties
Using /[A-Za-z]/ is a US-centric way to do it. To accept any letter, use one of
/[[:alpha:]]/
/\p{L}/
/[^\W\d_]/
The third one employs a double-negative: not not-a-letter, not a digit, and not an underscore.
Whichever you choose, those who maintain your code will certainly appreciate it if you stick with one consistently!
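The double-negative class is handy in engines that lack \p{L}. As far as I know, Python's built-in re module is one of them, so a sketch of "does this string contain a letter" could look like the following (has_letter is a name invented here; str patterns are Unicode-aware by default in Python 3):

```python
import re

# Not a non-word character, not a digit, not an underscore: a letter.
LETTER = re.compile(r"[^\W\d_]")

def has_letter(s):
    return LETTER.search(s) is not None

print(has_letter("Thu Jan 1 05:30:00 1970"))  # True
print(has_letter("05:30:00 1970"))            # False
```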
If you're looking to detect whether something looks like a number for the purposes of manipulating it in Perl, you'll want Scalar::Util::looks_like_number (core since perl 5.7.3). From perlapi:
looks_like_number
Test if the content of an SV looks
like a number (or is a number). Inf
and Infinity are treated as numbers
(so will not issue a non-numeric
warning), even if your atof() doesn't
grok them.
[^\W0-9_]
# or
[[:alpha:]]
See perldoc perlre

Is there a way, using regular expressions, to match a pattern for text outside of quotes?

As stated in the title, is there a way, using regular expressions, to match a text pattern for text that appears outside of quotes. Ideally, given the following examples, I would want to be able to match the comma that is outside of the quotes, but not the one in the quotes.
This is some text, followed by "text, in quotes!"
or
This is some text, followed by "text, in quotes" with more "text, in quotes!"
Additionally, it would be nice if the expression would respect nested quotes as in the following example. However, if this is technically not feasible with regular expressions then it wold simply be nice to know if that is the case.
The programmer looked up from his desk, "This can't be good," he exclaimed, "the system is saying 'File not found!'"
I have found some expressions for matching something that would be in the quotes, but nothing quite for something outside of the quotes.
Easiest is matching both commas and quoted strings, and then filtering out the quoted strings.
/"[^"]*"|,/g
If you really can't have the quotes matching, you could do something like this:
/,(?=[^"]*(?:"[^"]*"[^"]*)*\Z)/g
This could become slow, because for each comma, it has to look at the remaining characters and count the number of quotes. \Z matches the end of the string. Similar to $, but will never match line ends.
If you don't mind an extra capture group, it could be done like this instead:
/\G((?:[^"]*"[^"]*")*?[^"]*?)(,)/g
This will only scan the string once. It counts the quotes from the beginning of the string instead. \G will match the position where last match ended.
The last pattern could use an example.
Input String: 'This is, some text, followed by "text, in quotes!" and more ,-as'
Matches:
1. ['This is', ',']
2. [' some text', ',']
3. [' followed by "text, in quotes!" and more ', ',']
It matches the string leading up to the comma, as well as the comma.
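The first suggestion (match both quoted strings and commas, then throw the quoted strings away) is probably the easiest to port. A minimal Python sketch, assuming simple double-quoted strings with no escapes; the function name is made up:

```python
import re

def commas_outside_quotes(text):
    # The alternation consumes whole quoted strings first, so a comma
    # inside quotes can never be matched by the second branch.
    return [m.start()
            for m in re.finditer(r'"[^"]*"|,', text)
            if m.group() == ","]

sample = 'This is some text, followed by "text, in quotes" with more "text, in quotes!"'
print(commas_outside_quotes(sample))  # [17]
```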
This can be done with modern regexes due to the massive number of hacks to regex engines that exist, but let me be the one to post the "Don't Do This With Regular Expressions" answer.
This is not a job for regular expressions. This is a job for a full-blown parser. As an example of something you can't do with (classical) regular expressions, consider this:
()(())(()())
No (classical) regex can determine if those parentheses are matched properly, but doing so without a regex is trivial:
/* C code */
#include <stdio.h>
int main(void)
{
char string[] = "()(())(()())";
int parens = 0;
/* walk the string until the terminating NUL character */
for(char *tmp = string; *tmp; tmp++)
{
if(*tmp == '(') parens++;
if(*tmp == ')') parens--;
}
if(parens > 0)
{
printf("%d too many open parenthesis.\n", parens);
}
else if(parens < 0)
{
printf("%d too many closing parenthesis.\n", -parens);
}
else
{
printf("Parenthesis match!\n");
}
return 0;
}
# Perl code
my $string = "()(())(()())";
my $parens = 0;
for(split(//, $string)) {
$parens++ if $_ eq "(";
$parens-- if $_ eq ")";
}
die "Too many open parenthesis.\n" if $parens > 0;
die "Too many closing parenthesis.\n" if $parens < 0;
print "Parenthesis match!";
See how simple it was to write some non-regex code to do the job for you?
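A Python rendering of the same counter, for comparison. One addition that is not in the snippets above: this sketch also stops as soon as the count goes negative, so an input such as ")(" is rejected rather than accepted on its net count of zero.

```python
def check_parens(text):
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            # A closing parenthesis with nothing open can never be repaired.
            if depth < 0:
                return "Too many closing parentheses."
    if depth > 0:
        return "Too many open parentheses."
    return "Parentheses match!"

print(check_parens("()(())(()())"))  # Parentheses match!
```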
EDIT: Okay, back from seeing Adventureland. :) Try this (written in Perl, commented to help you understand what I'm doing if you don't know Perl):
# split $string into a list, split on the double quote character
my @temp = split(/"/, $string);
# iterate through a list of the number of elements in our list
for(0 .. $#temp) {
# skip odd-numbered elements - only process $temp[0], $temp[2], etc.
# the reason is that, if we split on "s, every other element is a string
next if $_ & 1;
if($temp[$_] =~ /regex/) {
# do stuff
}
}
Another way to do it:
my $bool = 0;
my $str;
my $match;
# loop through the characters of a string
for(split(//, $string)) {
if($_ eq '"') {
$bool = !$bool;
if($bool) {
# regex time!
$match += $str =~ /regex/;
$str = "";
}
}
if(!$bool) {
# add the current character to our test string
$str .= $_;
}
}
# get trailing string match
$match += $str =~ /regex/;
(I give two because, in another language, one solution may be easier to implement than the other, not just because There's More Than One Way To Do It™.)
Of course, as your problems grow in complexity, there will arise certain benefits of constructing a full-blown parser, but that's a different horse. For now, this will suffice.
As mentioned before, a regexp cannot match arbitrarily nested patterns, since nested structures form a context-free language, not a regular one.
So if you have any nested quotes, you are not going to solve this with a regex.
(Except with the "balancing group" feature of a .Net regex engine - as mentioned by Daniel L in the comments - , but I am not making any assumption of the regex flavor here)
Except if you add further specification, like a quote within a quote must be escaped.
In that case, the following:
text before string "string with \escape quote \" still
within quote" text outside quote "within quote \" still inside" outside "
inside" final outside text
would be matched successfully with:
(?ms)((?:\\(?=")|[^"])+)(?:"((?:[^"]|(?<=\\)")+)(?<!\\)")?
group1: text preceding a quoted text
group2: text within double quotes, even if \" are present in it.
Here is an expression that gets the match, but it isn't perfect, as the first match it gets is the whole string, removing the final ".
[^"].*(,).*[^"]
I have been using my Free RegEx tester to see what works.
Test Results
Group Match Collection # 1
Match # 1
Value: This is some text, followed by "text, in quotes!
Captures: 1
Match # 2
Value: ,
Captures: 1
You would be better off building yourself a simple parser (pseudo-code):
quoted := False
FOR char IN string DO
IF char = '"'
quoted := !quoted
ELSE
IF char = "," AND !quoted
// not quoted comma found
ENDIF
ENDIF
ENDFOR
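The pseudo-code translates almost line for line into Python. In this sketch the function collects the positions of unquoted commas; what to do with them (and the name unquoted_commas) is an assumption, since the pseudo-code leaves that open.

```python
def unquoted_commas(text):
    quoted = False
    positions = []
    for i, ch in enumerate(text):
        if ch == '"':
            # Every double quote flips the inside/outside state.
            quoted = not quoted
        elif ch == "," and not quoted:
            positions.append(i)
    return positions

sample = 'This is some text, followed by "text, in quotes" with more "text, in quotes!"'
print(unquoted_commas(sample))  # [17]
```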
This really depends on if you allow nested quotes or not.
In theory, with nested quotes you cannot do this (regular languages can't count)
In practice, you might manage if you can constrain the depth. It will get increasingly ugly as you add complexity. This is often how people get into grief with regular expressions (trying to match something that isn't actually regular in general).
Note that some "regex" libraries/languages have added non-regular features.
If this sort of thing gets complicated enough, you'll really have to write/generate a parser for it.
You need more in your description. Do you want any set of possible quoted strings and non-quoted strings like this ...
Lorem ipsum "dolor sit" amet, "consectetur adipiscing" elit.
... or simply the pattern you asked for? This is pretty close I think ...
(?<outside>.*?)(?<inside>(?=\"))
It does capture the "'s however.
Maybe you could do it in two steps?
First you replace the quoted text:
("[^"]*")
and then you extract what you want from the remaining string
,(?=(?:[^"]*"[^"]*")*[^"]*\z)
Regexes may not be able to count, but they can determine whether there's an odd or even number of something. After finding a comma, the lookahead asserts that, if there are any quotation marks ahead, there's an even number of them, meaning the comma is not inside a set of quotes.
This can be tweaked to handle escaped quotes if needed, though the original question didn't mention that. Also, if your regex flavor supports them, I would add atomic groups or possessive quantifiers to keep backtracking in check.
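The parity lookahead can be tried in Python roughly as follows. Python's re module spells the absolute end-of-string anchor \Z, so the sketch uses that instead of \z; the function name is invented for the example, and escaped quotes are not handled.

```python
import re

# A comma followed, up to the end of the string, by an even number of quotes.
OUTSIDE_COMMA = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*\Z)')

def commas_with_even_quotes_ahead(text):
    return [m.start() for m in OUTSIDE_COMMA.finditer(text)]

sample = 'This is some text, followed by "text, in quotes" with more "text, in quotes!"'
print(commas_with_even_quotes_ahead(sample))  # [17]
```

Because the lookahead rescans the rest of the string for every comma, this is likely to be slower on long inputs than a single left-to-right pass.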

How to check that a string is a palindrome using regular expressions?

That was an interview question that I was unable to answer:
How to check that a string is a palindrome using regular expressions?
p.s. There is already a question "How to check if the given string is palindrome?" and it gives a lot of answers in different languages, but no answer that uses regular expressions.
The answer to this question is that "it is impossible". More specifically, the interviewer is wondering if you paid attention in your computational theory class.
In your computational theory class you learned about finite state machines. A finite state machine is composed of nodes and edges. Each edge is annotated with a letter from a finite alphabet. One or more nodes are special "accepting" nodes and one node is the "start" node. As each letter is read from a given word we traverse the given edge in the machine. If we end up in an accepting state then we say that the machine "accepts" that word.
A regular expression can always be translated into an equivalent finite state machine. That is, one that accepts and rejects the same words as the regular expression (in the real world, some regexp languages allow for arbitrary functions, these don't count).
It is impossible to build a finite state machine that accepts all palindromes. The proof relies on the fact that we can easily build a string that requires an arbitrarily large number of nodes, namely the string
a^x b a^x (eg., aba, aabaa, aaabaaa, aaaabaaaa, ....)
where a^x is a repeated x times. This requires at least x nodes because, after seeing the 'b' we have to count back x times to make sure it is a palindrome.
Finally, getting back to the original question, you could tell the interviewer that you can write a regular expression that accepts all palindromes that are smaller than some finite fixed length. If there is ever a real-world application that requires identifying palindromes then it will almost certainly not include arbitrarily long ones, thus this answer would show that you can differentiate theoretical impossibilities from real-world applications. Still, the actual regexp would be quite long, much longer than the equivalent 4-line program (easy exercise for the reader: write a program that identifies palindromes).
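One possible answer to the "easy exercise", sketched in Python. It compares the string with its reverse and makes no attempt to ignore case, spaces or punctuation; that choice is an assumption of this sketch.

```python
def is_palindrome(s):
    # A string is a palindrome if it reads the same backwards.
    return s == s[::-1]

print(is_palindrome("aaabaaa"))  # True
print(is_palindrome("aaabaa"))   # False
```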
While the PCRE engine does support recursive regular expressions (see the answer by Peter Krauss), you cannot use a regex on the ICU engine (as used, for example, by Apple) to achieve this without extra code. You'll need to do something like this:
This detects any palindrome, but does require a loop (which will be required because regular expressions can't count).
$a = "teststring";
while(length $a > 1)
{
$a =~ /(.)(.*)(.)/;
die "Not a palindrome: $a" unless $1 eq $3;
$a = $2;
}
print "Palindrome";
It's not possible. Palindromes aren't defined by a regular language. (See, I DID learn something in computational theory)
With Perl regex:
/^((.)(?1)\2|.?)$/
Though, as many have pointed out, this can't be considered a regular expression if you want to be strict. Regular expressions do not support recursion.
Here's one to detect 4-letter palindromes (e.g.: deed), for any type of character:
\(.\)\(.\)\2\1
Here's one to detect 5-letter palindromes (e.g.: radar), checking for letters only:
\([a-z]\)\([a-z]\)[a-z]\2\1
So it seems we need a different regex for each possible word length.
This post on a Python mailing list includes some details as to why (Finite State Automata and pumping lemma).
Depending on how confident you are, I'd give this answer:
I wouldn't do it with a regular
expression. It's not an appropriate
use of regular expressions.
Yes, you can do it in .Net!
(?<N>.)+.?(?<-N>\k<N>)+(?(N)(?!))
You can check it here! It's a wonderful post!
StackOverflow is full of answers like "Regular expressions? nope, they don't support it. They can't support it.".
The truth is that regular expressions have nothing to do with regular grammars anymore. Modern regular expressions feature functions such as recursion and balancing groups, and the availability of their implementations is ever growing (see Ruby examples here, for instance). In my opinion, hanging onto old belief that regular expressions in our field are anything but a programming concept is just counterproductive. Instead of hating them for the word choice that is no longer the most appropriate, it is time for us to accept things and move on.
Here's a quote from Larry Wall, the creator of Perl itself:
(…) generally having to do with what we call “regular expressions”, which are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I’m not going to try to fight linguistic necessity here. I will, however, generally call them “regexes” (or “regexen”, when I’m in an Anglo-Saxon mood).
And here's a blog post by one of PHP's core developers:
As the article was quite long, here a summary of the main points:
The “regular expressions” used by programmers have very little in common with the original notion of regularity in the context of formal language theory.
Regular expressions (at least PCRE) can match all context-free languages. As such they can also match well-formed HTML and pretty much all other programming languages.
Regular expressions can match at least some context-sensitive languages.
Matching of regular expressions is NP-complete. As such you can solve any other NP problem using regular expressions.
That being said, you can match palindromes with regexes using this:
^(?'letter'[a-z])+[a-z]?(?:\k'letter'(?'-letter'))+(?(letter)(?!))$
...which obviously has nothing to do with regular grammars.
More info here: http://www.regular-expressions.info/balancing.html
As a few have already said, there's no single regexp that'll detect a general palindrome out of the box, but if you want to detect palindromes up to a certain length, you can use something like
(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1
You can also do it without using recursion:
\A(?:(.)(?=.*?((?(2)\1\2|\1))\z))*?.?\2\z
to allow a single character:
\A(?:(?:(.)(?=.*?((?(2)\1\2|\1))\z))*?.?\2|.)\z
Works with Perl, PCRE
demo
For Java:
\A(?:(.)(?=.*?(\1\2\z|(?<!(?=\2\z).{0,1000})\1\z)))*?.?\2\z
demo
It can be done in Perl now. Using recursive reference:
if($istr =~ /^((\w)(?1)\g{-1}|\w?)$/){
print $istr," is palindrome\n";
}
Adapted from an example near the end of http://perldoc.perl.org/perlretut.html
In ruby you can use named capture groups. so something like this will work -
def palindrome?(string)
$1 if string =~ /\A(?<p>| \w | (?: (?<l>\w) \g<p> \k<l+0> ))\z/x
end
try it, it works...
1.9.2p290 :017 > palindrome?("racecar")
=> "racecar"
1.9.2p290 :018 > palindrome?("kayak")
=> "kayak"
1.9.2p290 :019 > palindrome?("woahitworks!")
=> nil
Recursive Regular Expressions can do it!
A simple and self-evident pattern to detect a string that contains a palindrome:
(\w)(?:(?R)|\w?)\1
At rexegg.com/regex-recursion the tutorial explains how it works.
It works in any language whose regex engine supports recursion; here is an example adapted from the same source (link) as a proof of concept, using PHP:
$subjects=['dont','o','oo','kook','book','paper','kayak','okonoko','aaaaa','bbbb'];
$pattern='/(\w)(?:(?R)|\w?)\1/';
foreach ($subjects as $sub) {
echo $sub." ".str_repeat('-',15-strlen($sub))."-> ";
if (preg_match($pattern,$sub,$m))
echo $m[0].(($m[0]==$sub)? "! a palindrome!\n": "\n");
else
echo "sorry, no match\n";
}
outputs
dont ------------> sorry, no match
o ---------------> sorry, no match
oo --------------> oo! a palindrome!
kook ------------> kook! a palindrome!
book ------------> oo
paper -----------> pap
kayak -----------> kayak! a palindrome!
okonoko ---------> okonoko! a palindrome!
aaaaa -----------> aaaaa! a palindrome!
bbbb ------------> bbb
Comparing
The regular expression ^((\w)(?:(?1)|\w?)\2)$ does the same job, but as a yes/no test of the whole string instead of "contains". PS: it uses a definition where "o" is not a palindrome and the hyphenated "able-elba" is not a palindrome, but "ableelba" is. Call this definition1. When "o" and "able-elba" are palindromes, call it definition2.
Comparing with other "palindrome regexes":
^((.)(?:(?1)|.?)\2)$ the base regex above without the \w restriction, accepting "able-elba".
^((.)(?1)?\2|.)$ (@LilDevil) Uses definition2 (accepts "o" and "able-elba", so it also differs in the recognition of the "aaaaa" and "bbbb" strings).
^((.)(?1)\2|.?)$ (@Markus) Does not detect "kook" or "bbbb".
^((.)(?1)*\2|.?)$ (@Csaba) Uses definition2.
NOTE: to compare, you can add more words to $subjects and a line for each compared regex,
if (preg_match('/^((.)(?:(?1)|.?)\2)$/',$sub)) echo " ...reg_base($sub)!\n";
if (preg_match('/^((.)(?1)?\2|.)$/',$sub)) echo " ...reg2($sub)!\n";
if (preg_match('/^((.)(?1)\2|.?)$/',$sub)) echo " ...reg3($sub)!\n";
if (preg_match('/^((.)(?1)*\2|.?)$/',$sub)) echo " ...reg4($sub)!\n";
Here's my answer to Regex Golf's 5th level (A man, a plan). It works for up to 7 characters with the browser's Regexp (I'm using Chrome 36.0.1985.143).
^(.)(.)(?:(.).?\3?)?\2\1$
Here's one for up to 9 characters
^(.)(.)(?:(.)(?:(.).?\4?)?\3?)?\2\1$
To increase the max number of characters it'd work for, you'd repeatedly replace .? with (?:(.).?\n?)?.
It's actually easier to do it with string manipulation rather than regular expressions:
bool isPalindrome(String s1)
{
String s2 = s1.reverse;
return s2 == s1;
}
I realize this doesn't really answer the interview question, but you could use it to show how you know a better way of doing a task, and you aren't the typical "person with a hammer, who sees every problem as a nail."
Regarding the PCRE expression (from MizardX):
/^((.)(?1)\2|.?)$/
Have you tested it? On my PHP 5.3 under Win XP Pro it fails on: aaaba
Actually, I modified the expression slightly, to read:
/^((.)(?1)*\2|.?)$/
I think what is happening is that while the outer pair of characters are anchored, the remaining inner ones are not. This is not quite the whole answer because while it incorrectly passes on "aaaba" and "aabaacaa", it does fail correctly on "aabaaca".
I wonder whether there is a fix for this, and also,
Does the Perl example (by JF Sebastian / Zsolt) pass my tests correctly?
Csaba Gabor from Vienna
/\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/
it is valid for Oniguruma engine (which is used in Ruby)
taken from Pragmatic Bookshelf
In Perl (see also Zsolt Botykai's answer):
$re = qr/
. # single letter is a palindrome
|
(.) # first letter
(??{ $re })?? # apply recursively (not interpolated yet)
\1 # last letter
/x;
while(<>) {
chomp;
say if /^$re$/; # print palindromes
}
As pointed out by ZCHudson, determining whether something is a palindrome cannot be done with a usual regexp, as the set of palindromes is not a regular language.
I totally disagree with Airsource Ltd when he says that "it's not possible" is not the kind of answer the interviewer is looking for. In my interviews, I ask this kind of question when I face a good candidate, to check whether he can find the right argument when we propose that he do something wrong. I do not want to hire someone who will try to do something the wrong way if he knows a better one.
something you can do with perl: http://www.perlmonks.org/?node_id=577368
I would explain to the interviewer that the language consisting of palindromes is not a regular language but instead context-free.
The regular expression that would match all palindromes would be infinite. Instead I would suggest he either restrict himself to a maximum size of palindromes to accept; or, if all palindromes are needed, use at minimum some type of NPDA (nondeterministic pushdown automaton), or just use the simple string reversal/equals technique.
The best you can do with regexes, before you run out of capture groups:
/(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\9\8\7\6\5\4\3\2\1/
This will match all palindromes up to 19 characters in length.
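Rather than typing such a pattern by hand, it can be generated. The following Python sketch builds the same shape of pattern for a given number of groups (kept to 9 or fewer here so every backreference is a single digit) and applies it to the whole string with fullmatch; the helper names are made up for the illustration.

```python
import re

def bounded_palindrome_regex(groups):
    # groups=2 yields (.?)(.?).?\2\1, groups=9 yields the pattern above.
    head = "(.?)" * groups
    tail = "".join("\\%d" % i for i in range(groups, 0, -1))
    return re.compile(head + ".?" + tail)

PATTERN = bounded_palindrome_regex(9)

def is_short_palindrome(s):
    # fullmatch anchors the pattern at both ends of the string.
    return PATTERN.fullmatch(s) is not None

print(is_short_palindrome("racecar"))  # True
print(is_short_palindrome("abc"))      # False
```

Strings longer than 2 * groups + 1 characters are always rejected, palindrome or not, which is the price of the fixed bound.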
Programmatically solving for all lengths is trivial:
str == str.reverse ? true : false
I don't have the rep to comment inline yet, but the regex provided by MizardX, and modified by Csaba, can be modified further to make it work in PCRE. The only failure I have found is the single-char string, but I can test for that separately.
/^((.)(?1)?\2|.)$/
If you can make it fail on any other strings, please comment.
#!/usr/bin/perl
use strict;
use warnings;
print "Enter your string: ";
chomp(my $a = scalar(<STDIN>));
# number of character pairs that must mirror each other
my $m = int(length($a) / 2);
# build e.g. (.)(.).?\g{2}\g{1} for a 4- or 5-character string
my $r = "(.)" x $m;
$r .= ".?";
foreach my $i (reverse 1 .. $m) {
$r .= "\\g{$i}";
}
# match against the generated pattern, anchored at both ends
if ( $a =~ /^$r$/ ){
print "$a is a palindrome\n";
}
else {
print "$a not a palindrome\n";
}
From automata theory, it is impossible to match a palindrome of arbitrary length (because that requires an unbounded amount of memory). But it IS possible to match palindromes of a fixed length.
That is, it is possible to write a regex that matches all palindromes of length <= 5 or <= 6, etc., but not one for lengths >= 5, where the upper bound is unspecified.
In Ruby you can use \b(?'word'(?'letter'[a-z])\g'word'\k'letter+0'|[a-z])\b to match palindrome words such as a, dad, radar, racecar, and redivider. ps : this regex only matches palindrome words that are an odd number of letters long.
Let's see how this regex matches radar. The word boundary \b matches at the start of the string. The regex engine enters the capturing group "word". [a-z] matches r which is then stored in the stack for the capturing group "letter" at recursion level zero. Now the regex engine enters the first recursion of the group "word". (?'letter'[a-z]) matches and captures a at recursion level one. The regex enters the second recursion of the group "word". (?'letter'[a-z]) captures d at recursion level two. During the next two recursions, the group captures a and r at levels three and four. The fifth recursion fails because there are no characters left in the string for [a-z] to match. The regex engine must backtrack.
The regex engine must now try the second alternative inside the group "word". The second [a-z] in the regex matches the final r in the string. The engine now exits from a successful recursion, going one level back up to the third recursion.
After matching \g'word' the engine reaches \k'letter+0'. The backreference fails because the regex engine has already reached the end of the subject string. So it backtracks once more. The second alternative now matches the a. The regex engine exits from the third recursion.
The regex engine has again matched \g'word' and needs to attempt the backreference again. The backreference specifies +0 or the present level of recursion, which is 2. At this level, the capturing group matched d. The backreference fails because the next character in the string is r. Backtracking again, the second alternative matches d.
Now, \k'letter+0' matches the second a in the string. That's because the regex engine has arrived back at the first recursion during which the capturing group matched the first a. The regex engine exits the first recursion.
The regex engine is now back outside all recursion. At this level, the capturing group stored r. The backreference can now match the final r in the string. Since the engine is not inside any recursion any more, it proceeds with the remainder of the regex after the group. \b matches at the end of the string. The end of the regex is reached and radar is returned as the overall match.
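As far as I know, Python's built-in re module has no recursion syntax, so the walkthrough cannot be reproduced there with a pattern. The structure of the group "word" (a letter, then "word" again, then the same letter; or else a single letter) can still be mirrored by an ordinary recursive function, which may help in following the trace above. The function name is invented, and the sketch tests a whole word rather than searching inside a longer text.

```python
def is_odd_palindrome(word):
    # Second alternative of the group: a single letter [a-z].
    if len(word) == 1:
        return "a" <= word <= "z"
    # First alternative: [a-z], the group itself, then the same letter again
    # (the job done by the backreference \k'letter+0').
    if len(word) >= 3 and "a" <= word[0] <= "z" and word[0] == word[-1]:
        return is_odd_palindrome(word[1:-1])
    return False

print(is_odd_palindrome("radar"))  # True
print(is_odd_palindrome("abba"))   # False, even length
```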
here is PL/SQL code which tells whether given string is palindrome or not using regular expressions:
create or replace procedure palin_test(palin in varchar2) is
tmp varchar2(100);
i number := 0;
BEGIN
tmp := palin;
for i in 1 .. length(palin)/2 loop
if length(tmp) > 1 then
if regexp_like(tmp,'^(^.).*(\1)$') = true then
tmp := substr(palin,i+1,length(tmp)-2);
else
dbms_output.put_line('not a palindrome');
exit;
end if;
end if;
if i >= length(palin)/2 then
dbms_output.put_line('Yes ! it is a palindrome');
end if;
end loop;
end palin_test;
my $pal='malayalam';
while($pal=~/^((.)(.*)\2)$/){ #strip a matching first/last letter pair (anchored so inner repeats don't count)
$pal=$3;
}
if ($pal=~/^.?$/i){ #matches single letter or no letter
print"palindrome\n";
}
else{
print"not palindrome\n";
}
This regex will detect palindromes up to 22 characters ignoring spaces, tabs, commas, and quotes.
\b(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*\11?[ \t,'"]*\10|\10?)[ \t,'"]*\9|\9?)[ \t,'"]*\8|\8?)[ \t,'"]*\7|\7?)[ \t,'"]*\6|\6?)[ \t,'"]*\5|\5?)[ \t,'"]*\4|\4?)[ \t,'"]*\3|\3?)[ \t,'"]*\2|\2?))?[ \t,'"]*\1\b
Play with it here: https://regexr.com/4tmui
I wrote an explanation of how I got that here: https://medium.com/analytics-vidhya/coding-the-impossible-palindrome-detector-with-a-regular-expressions-cd76bc23b89b
A slight refinement of Airsource Ltd's method, in pseudocode:
WHILE string.length > 1
IF /^(.)(.*)\1$/ matches string
string = \2
ELSE
REJECT
ACCEPT
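A runnable version of that loop, sketched in Python. It uses re.fullmatch so the first and last characters of the whole string are the ones compared; the function name is an assumption.

```python
import re

def is_palindrome_by_peeling(s):
    while len(s) > 1:
        # First character, anything in between, then the same character again.
        m = re.fullmatch(r"(.)(.*)\1", s, re.DOTALL)
        if m is None:
            return False   # REJECT
        s = m.group(2)     # keep only the middle part
    return True            # ACCEPT

print(is_palindrome_by_peeling("malayalam"))  # True
print(is_palindrome_by_peeling("abcb"))       # False
```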