Regex to check that a character in range doesn't repeat - regex

I want to match against strings such as AhKs & AdKs (i.e. two cards; Ah = Ace of Hearts). I want to match two off-suit cards with a regex. What I currently have is "^([AKQJT2-9][hscd]){2}$", but this also matches hands such as AhKh (suited) and AhAh. Is there a way to use backreferences to say that the second [hscd] cannot be the same as the first (and similarly for [AKQJT2-9])?

Not perfectly elegant, but works:
^[AKQJT2-9]([hscd])[AKQJT2-9](?!\1)[hscd]$

Try this regular expression:
^[AKQJT2-9]([hscd])[AKQJT2-9](?!\1)[hscd]$
Here a negative look-ahead assertion (?!…) is used to disallow the fourth character to be the same as the second (match of first grouping).
But if the regular expression implementation does not support look-around assertions, you will probably need to expand it to this:
^[AKQJT2-9](h[AKQJT2-9][scd]|s[AKQJT2-9][hcd]|c[AKQJT2-9][hsd]|d[AKQJT2-9][hsc])$
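As a quick sanity check, here is a minimal sketch in Python (an assumption; the question does not name a language) that compares the lookahead pattern with the expanded one over every rank/suit/rank/suit combination. The names is_offsuit, ALL_HANDS and AGREE are made up for the illustration.

```python
import re
from itertools import product

RANKS = "AKQJT23456789"
SUITS = "hscd"

# Lookahead version: (?!\1) rejects a second suit equal to the captured first suit.
LOOKAHEAD = re.compile(r"^[AKQJT2-9]([hscd])[AKQJT2-9](?!\1)[hscd]$")

# Expanded version for engines without look-around support.
EXPANDED = re.compile(
    r"^[AKQJT2-9](h[AKQJT2-9][scd]|s[AKQJT2-9][hcd]"
    r"|c[AKQJT2-9][hsd]|d[AKQJT2-9][hsc])$"
)


def is_offsuit(hand):
    """Return True if the two cards in `hand` have different suits."""
    return LOOKAHEAD.match(hand) is not None


# Every possible four-character hand, used to compare the two patterns.
ALL_HANDS = ["".join(p) for p in product(RANKS, SUITS, RANKS, SUITS)]
AGREE = all(
    (LOOKAHEAD.match(h) is None) == (EXPANDED.match(h) is None)
    for h in ALL_HANDS
)

print(is_offsuit("AhKs"), is_offsuit("AhKh"), AGREE)
```

Note that both patterns only compare suits, so a pair such as AhAs is still accepted.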

A negative lookahead comes to the rescue:
/^[AKQJT2-9]([hscd])[AKQJT2-9](?!\1)[hscd]$/

Yes. Use back-reference together with a negative look-ahead.
^([AKQJT2-9])([hscd])(?!\1)(?!.\2)[AKQJT2-9][hscd]$
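A small sketch of how this variant behaves, assuming Python's re module (the helper name differs_in_rank_and_suit is hypothetical). Both lookaheads run at the start of the second card: (?!\1) compares the rank, and (?!.\2) skips one character and compares the suit.

```python
import re

# Group 1 is the first rank, group 2 the first suit.
DISTINCT = re.compile(r"^([AKQJT2-9])([hscd])(?!\1)(?!.\2)[AKQJT2-9][hscd]$")


def differs_in_rank_and_suit(hand):
    """Return True if the second card shares neither rank nor suit with the first."""
    return DISTINCT.match(hand) is not None


for hand in ("AhKs", "AhKh", "AhAs", "AhAh"):
    print(hand, differs_in_rank_and_suit(hand))
```

Unlike the suit-only patterns, this one also rejects pairs such as AhAs.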

Related

Non Greedy Regex from Left

I have a string like this:
\24s904dS\24sr4d2\24x\\y\\12z:234F\\3dRl\24o980\24
I want to match the bold part only:
x\\y\\12z:234F\\3dRl
I can handle the non-greedy match on the right side with this regex:
\\24(.*:.*?)\\24
But I still can't work out how to make the left side non-greedy.
Modify your pattern as follows:
.*\\24(.*:.*?)\\24
You can use this negative lookahead based regex:
\\24((?:.(?!\\24))*:.*?)\\24
The important part is the lookahead-based subpattern (?:.(?!\\24))*, which matches a character only if it is not immediately followed by \24. That essentially makes sure the match starts at the \24 nearest to the left of the colon.
Output Match:
x\\y\\12z:234F\\3dRl
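To compare the patterns side by side, here is a rough Python sketch. It assumes the backslashes in the sample string are literal characters (hence the raw strings); the helper first_group is a hypothetical name.

```python
import re

# The sample string from the question, taken literally.
TEXT = r"\24s904dS\24sr4d2\24x\\y\\12z:234F\\3dRl\24o980\24"

ORIGINAL = r"\\24(.*:.*?)\\24"             # starts at the first \24
WITH_PREFIX = r".*\\24(.*:.*?)\\24"         # greedy prefix pushes the start right
TEMPERED = r"\\24((?:.(?!\\24))*:.*?)\\24"  # no \24 allowed before the colon


def first_group(pattern, text=TEXT):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


for name, pattern in (("original", ORIGINAL),
                      ("with prefix", WITH_PREFIX),
                      ("tempered", TEMPERED)):
    print(name, "->", first_group(pattern))
```

The original pattern captures everything from the first \24 onward, while the other two narrow the capture to the segment that contains the colon.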
Rather than modifying the greediness, it's better to just write a more-precise regex:
\\24([a-zA-Z0-9]+:[a-zA-Z0-9]+)\\24
(It's relatively rare that non-greedy modifiers are really the best approach to a problem.)

regular expression - excluding specific chars

I am trying to write a regular expression matching a set without some chars.
For example, it matches [ a-zA-Z]* but excludes i,o,q,I,O,Q.
So: "A fat cat" matches, "Boy" doesn't.
Looks like it can be [ a-hj-npr-zA-HJ-NPR-Z]*.
Is there a simpler version for this?
Btw, I'm using it in PostgreSQL, but I think it should be a standard expression.
You can use a negative lookahead for this, as PostgreSQL supports lookaheads:
(?![ioqIOQ])[A-Za-z ]
To make it match a complete line, use:
^(?:(?![ioqIOQ])[A-Za-z ])+$
Based on @Anubhava's answer, but extending to an entire string rather than just one character:
^(?=[^ioqIOQ]*$)[ A-Za-z]*$
The (?=...) is a positive lookahead -- the opposite of the negative lookahead in Anubhava's answer. We are requiring all matches to also match the constraint [^ioqIOQ].
You could also implement the repetition over the entire string with
^((?![ioqIOQ])[ A-Za-z])*$
but it seems a lot less efficient. (I have not performed any timings, though.)
You don't need fancy lookaheads or lookbehinds; just use more, but smaller, character ranges.
You'll want something like ^[a-hj-npr-zA-HJ-NPR-Z ]*$.
I added a space so that it matches sentences.
You can test this online at Debuggex.
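For comparison, a short sketch that runs the three whole-string patterns from this thread against the examples in the question. It uses Python's re module purely for illustration (the question targets PostgreSQL), and the names PATTERNS and verdicts are hypothetical.

```python
import re

PATTERNS = {
    "per-character lookahead": r"^(?:(?![ioqIOQ])[A-Za-z ])+$",
    "whole-string lookahead": r"^(?=[^ioqIOQ]*$)[ A-Za-z]*$",
    "explicit ranges": r"^[a-hj-npr-zA-HJ-NPR-Z ]*$",
}


def verdicts(text):
    """Map each pattern name to whether it accepts `text`."""
    return {name: re.search(pattern, text) is not None
            for name, pattern in PATTERNS.items()}


for sample in ("A fat cat", "Boy"):
    print(repr(sample), verdicts(sample))
```

One difference to keep in mind: the first pattern uses + and so rejects an empty string, while the other two use * and accept it.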

regex negative look-ahead for exactly 3 capital letters around a char

I'm trying to write a regex that finds all the characters that have
exactly 3 capital letters on both sides.
The following regex finds all the characters that have exactly 3 capital letters on the left side of the char, and 3 (or more) on the right:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})'
When trying to limit the right side to no more than 3 capitals using the regex:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})(?![A-Z])'
I get no results; adding (?![A-Z]) to the first regex seems to make it fail.
Can someone explain the problem and suggest a way to solve it?
Thanks.
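Both lookaheads are tested at the same position, right after the captured character, so one demands a capital letter there while the other forbids it. You need to put the negative lookahead inside the positive one: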
(?<![A-Z])[A-Z]{3}.(?=[A-Z]{3}(?![A-Z]))
You can do that with the lookbehind, too:
(?<=(?<![A-Z])[A-Z]{3}).(?=[A-Z]{3}(?![A-Z]))
It doesn't violate the "fixed-length lookbehind" rule because lookarounds themselves don't consume any characters.
EDIT (about fixed-length lookbehind): Of all the flavors that support lookbehind, Python is the most inflexible. In most flavors (e.g. Perl, PHP, Ruby 1.9+) you could use:
(?<=^[A-Z]{3}|[^A-Z][A-Z]{3}).
...to match a character preceded by exactly three uppercase ASCII letters. The first alternative - ^[A-Z]{3} - starts looking three positions back, while the second - [^A-Z][A-Z]{3} - goes back exactly four positions. In Java, you can reduce that to:
(?<=(^|[^A-Z])[A-Z]{3}).
...because it does a little extra work at compile time to figure out that the maximum lookbehind length will be four positions. And in .NET and JGSoft, anything goes; if it's legal anywhere, it's legal in a lookbehind.
But in Python, a lookbehind subexpression has to match a single, fixed number of characters. If you've butted your head against that limitation a few times, you might not expect something like this to work:
(?<=(?<![A-Z])[A-Z]{3}).
At least I didn't. It's even more concise than the Java version; how can it work in Python? But it does work, in Python and in every other flavor that supports lookbehind.
And no, there are no similar restrictions on lookaheads, in any flavor.
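Here is a minimal Python sketch of the nested-lookaround version next to the pattern from the question. The sample words and the helper name middle_chars are made up for the illustration, and each word is tested on its own so that separators cannot match the dot.

```python
import re

# Lookbehind: exactly three capitals, themselves not preceded by a capital.
# Lookahead: exactly three capitals, themselves not followed by a capital.
EXACTLY_THREE = re.compile(r"(?<=(?<![A-Z])[A-Z]{3}).(?=[A-Z]{3}(?![A-Z]))")

# The pattern from the question: its two lookaheads contradict each other.
CONTRADICTORY = re.compile(r"(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})(?![A-Z])")


def middle_chars(word):
    """Return every character with exactly three capitals on each side."""
    return EXACTLY_THREE.findall(word)


for word in ("ABCdDEF", "HHHhhhHHHH", "AAAaAAAbAAA", "jjJJJJjJJJ", "JJJjJJJ"):
    print(word, middle_chars(word), CONTRADICTORY.findall(word))
```

Because the lookarounds consume nothing, overlapping hits such as the a and b in AAAaAAAbAAA are both found.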
Taking out the positive lookahead worked for me.
(?<![A-Z])[A-Z]{3}(.)([A-Z]{3})(?![A-Z])
'ABCdDEF' 'ABCfDEF' 'HHHhhhHHHH' 'jjJJjjJJJ' JJJjJJJ
matches
ABCdDEF
ABCfDEF
JJJjJJJ
I'm not sure how the regexp engines should work with multiple lookahead assertions, but the one you're using may have its own opinion on that.
You could as well use a single assertion as follows:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3}[^A-Z])'
The same with lookbehind:
'(?<=[^A-Z][A-Z]{3})(.)(?=[A-Z]{3}[^A-Z])'
This will have a problem matching the pattern in the beginning and in the end of the line.
I can't think of a proper solution, but there can be a dirty trick: for instance, add a space (or something else) in the beginning and the end of the whole line, then perform the matching.
$ echo 'ABCdDEF ABCfDEF HHHhhhHHHH AAAaAAAbAAA jjJJJJjJJJ JJJjJJJ' | sed 's/.*/ & /' | grep -oP '(?<=[^A-Z][A-Z]{3})(\S)(?=[A-Z]{3}[^A-Z])'
d
f
a
b
j
Note that I changed (.) to (\S) in the middle, change it back if you want the space to match.
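The same padding trick can be sketched in Python, assuming its re module (which accepts this fixed-width lookbehind); the helper names with_padding and without_padding are hypothetical.

```python
import re

# Same pattern as in the grep command: one non-capital on each outer side.
PADDED = re.compile(r"(?<=[^A-Z][A-Z]{3})(\S)(?=[A-Z]{3}[^A-Z])")

LINE = "ABCdDEF ABCfDEF HHHhhhHHHH AAAaAAAbAAA jjJJJJjJJJ JJJjJJJ"


def with_padding(line):
    """Add a space at both ends so matches at the line edges are found."""
    return PADDED.findall(" " + line + " ")


def without_padding(line):
    """Apply the pattern as-is; matches touching the line edges are missed."""
    return PADDED.findall(line)


print(with_padding(LINE))
print(without_padding(LINE))
```

Without the padding, the d at the start of the line and the j at the end are lost, because [^A-Z] needs an actual character to match.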
P.S. Are you solving The Python Challenge? :)
Since the look ahead pattern is the same as the look behind pattern, you could also use the continue anchor \G:
/(?:[A-Z]{3}|\G[A-Z]*)(.)[A-Z]{3}/
A match is returned if three capitals precede a single character or where the last match left off (optionally followed by other capitals).

Regular Expression Positive Lookahead substring

I am fairly new to regular expressions and the more and more I use them, the more I like them. I am working on a regular expression that must meet the following conditions:
Must start with an Alpha character
Out of the next three characters, at least one must be an Alpha character.
Anything after the first four characters is an automatic match.
I currently have the following regex: ^[a-zA-Z](?=.*[a-zA-Z]).{1}.*$
The issue I am running into is that my positive lookahead (?=.*[a-zA-Z]).{1} is not constrained to the next three characters following the alpha character.
I feel as if I am missing a concept here. What am I missing from this expression?
Thanks all.
The .* in your lookahead is doing that. You should limit the range here like
^[a-zA-Z](?=.{0,2}[a-zA-Z]).{1}.*$
Edit: If you want to make sure that there are at least 4 characters in the string, you could use another lookahead like this:
^[a-zA-Z](?=.{3})(?=.{0,2}[a-zA-Z]).{1}.*$
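To see the effect of bounding the lookahead, here is a small Python sketch; the test strings other than those in the question are made-up examples.

```python
import re

# Pattern from the question: .* lets the lookahead search the whole string.
UNBOUNDED = re.compile(r"^[a-zA-Z](?=.*[a-zA-Z]).{1}.*$")

# Bounded version: a letter must appear within the next three characters,
# and (?=.{3}) requires that those three characters exist.
BOUNDED = re.compile(r"^[a-zA-Z](?=.{3})(?=.{0,2}[a-zA-Z]).{1}.*$")

for text in ("a2a3", "a33a", "a333", "a333b", "ab"):
    print(text, bool(UNBOUNDED.match(text)), bool(BOUNDED.match(text)))
```

The string a333b shows the difference: the unbounded lookahead finds the trailing b and accepts it, while the bounded one rejects it.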
What do you want lookahead for? Why not just use
^[a-zA-Z](..[a-zA-Z]|.[a-zA-Z].|[a-zA-Z]..)
and be happy?
You'll probably have to do a workaround. Something like:
^[a-z](?=([a-z]..|.[a-z].|..[a-z])).{3}.*
First char [a-z]
Positive lookahead, either first, or second, or third char is a-z ([a-z]..|.[a-z].|..[a-z])
Other stuff
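Change the * in your lookahead to ? to get m/^[a-zA-Z](?=.?[a-zA-Z]).{1}.*$/
If I am understanding your criteria, that fixes it because the lookahead can no longer skip arbitrarily far ahead. Note that .? only looks at the next two characters; to cover all three, use .{0,2} instead.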
These are correctly matched:
a2a3-match
2aaa-no match
Aaaa-match
a333-no match
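A quick check of those four cases in Python, together with one extra made-up input (a33a) where the letter sits in the third position after the first character. The pattern names are hypothetical.

```python
import re

# Lookahead with .? : sees at most the next two characters.
ONE_OPTIONAL = re.compile(r"^[a-zA-Z](?=.?[a-zA-Z]).{1}.*$")

# Lookahead with .{0,2} : sees the next three characters.
UP_TO_TWO = re.compile(r"^[a-zA-Z](?=.{0,2}[a-zA-Z]).{1}.*$")

for text in ("a2a3", "2aaa", "Aaaa", "a333", "a33a"):
    print(text, bool(ONE_OPTIONAL.match(text)), bool(UP_TO_TWO.match(text)))
```

Both patterns agree on the four listed cases; they differ only on a33a, which the .? version rejects.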

How can I "inverse match" with regex?

I'm processing a file, line-by-line, and I'd like to do an inverse match. For instance, I want to match lines where there is a string of six letters, but only if these six letters are not 'Andrea'. How should I do that?
I'm using RegexBuddy, but still having trouble.
(?!Andrea).{6}
Assuming your regexp engine supports negative lookaheads...
...or maybe you'd prefer to use [A-Za-z]{6} in place of .{6}
Note that lookaheads and lookbehinds are generally not the right way to "inverse" a regular expression match. Regexps aren't really set up for doing negative matching; they leave that to whatever language you are using them with.
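For Python/Java,
^(.(?!(some text)))*$
http://www.lisnichenko.com/articles/javapython-inverse-regex.html
Note that this tests what follows each character, so a line that begins with "some text" still matches; placing the lookahead before the dot, as in ^((?!some text).)*$, avoids that.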
In PCRE and similar variants, you can actually create a regex that matches any line not containing a value:
^(?:(?!Andrea).)*$
This is called a tempered greedy token. The downside is that it doesn't perform well.
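As an illustration, here is a minimal sketch that uses this pattern to filter lines. Python's re module also accepts the syntax; the sample text and the helper name lines_without_andrea are made up.

```python
import re

# At every position, (?!Andrea) must succeed before the dot consumes a character.
NOT_CONTAINING_ANDREA = re.compile(r"^(?:(?!Andrea).)*$")

SAMPLE = "Andrea called\nRobert called\nI met Andrea\n"


def lines_without_andrea(text):
    """Return the lines of `text` that do not contain 'Andrea'."""
    return [line for line in text.splitlines()
            if NOT_CONTAINING_ANDREA.match(line)]


print(lines_without_andrea(SAMPLE))
```

An empty line also matches, since the group may repeat zero times.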
The capabilities and syntax of the regex implementation matter.
You could use look-ahead. Using Python as an example,
import re
not_andrea = re.compile(r'(?!Andrea)\w{6}', re.IGNORECASE)
To break that down:
(?!Andrea) means 'match if the next 6 characters are not "Andrea"'; if so then
\w means a "word character": letters, digits, and the underscore. For ASCII text this is equivalent to the class [a-zA-Z0-9_]
\w{6} means exactly six word characters.
re.IGNORECASE means that you will exclude "Andrea", "andrea", "ANDREA" ...
Another way is to use your program logic - use all lines not matching Andrea and put them through a second regex to check for six characters. Or first check for at least six word characters, and then check that it does not match Andrea.
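The two-step approach could look roughly like this; it is only a sketch, and the function name keep and the sample lines are hypothetical. It also shows why the lookahead alone can surprise you when used with an unanchored search.

```python
import re

SIX_WORD_CHARS = re.compile(r"\w{6}")
ANDREA = re.compile(r"Andrea", re.IGNORECASE)


def keep(line):
    """Keep lines that contain six word characters in a row but no 'Andrea'."""
    return (SIX_WORD_CHARS.search(line) is not None
            and ANDREA.search(line) is None)


for line in ("Robert was here", "Andrea was here", "short"):
    print(repr(line), keep(line))

# The lookahead-only pattern, searched without anchors, skips one character
# and still finds six word characters inside "Andreas".
SLIPPED = re.search(r"(?!Andrea)\w{6}", "Andreas").group()
print(SLIPPED)
```

Splitting the check into two plain searches keeps each regex simple and makes the exclusion explicit in the program logic.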
Negative lookahead assertion
(?!Andrea)
This is not exactly an inverted match, but it's the best you can directly do with regex. Not all platforms support them though.
If you want to do this in RegexBuddy, there are two ways to get a list of all lines not matching a regex.
On the toolbar on the Test panel, set the test scope to "Line by line". When you do that, an item List All Lines without Matches will appear under the List All button on the same toolbar. (If you don't see the List All button, click the Match button in the main toolbar.)
On the GREP panel, you can turn on the "line-based" and the "invert results" checkboxes to get a list of non-matching lines in the files you're grepping through.
I just came up with this method which may be hardware intensive but it is working:
You can replace all characters which match the regex by an empty string.
This is a oneliner:
notMatched = re.sub(regex, "", string)
I used this because I was forced to use a very complex regex and couldn't figure out how to invert every part of it within a reasonable amount of time.
This will only return you the string result, not any match objects!
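A tiny sketch of that substitution approach, with a made-up pattern and input (the helper name remove_matches is hypothetical):

```python
import re


def remove_matches(pattern, text):
    """Delete every match of `pattern`, leaving only the unmatched text."""
    return re.sub(pattern, "", text)


LEFTOVER = remove_matches(r"Andrea", "Andrea met Robert and Andrea")
print(repr(LEFTOVER))
```

The unmatched pieces come back joined into a single string, so positions and boundaries between them are lost.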
(?!...) is useful in practice, although strictly speaking lookahead is not part of regular expressions as defined mathematically.
You can write an inverted regular expression manually.
Here is a program to calculate the result automatically.
Its result is machine generated, which is usually much more complex than hand writing one. But the result works.
If you have the possibility to do two regex matches for the inverse and join them together you can use two capturing groups to first capture everything before your regex
^((?!yourRegex).)*
and then capture everything behind your regex
(?<=yourRegex).*
This works for most regexes. One problem I discovered was when I had a quantifier like {2,4} at the end. Then you gotta get creative.
In Perl you can do:
process($line) if ($line !~ /Andrea/);