A simple Perl regex guaranteed to never match a string? [duplicate] - regex

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
A Regex that will never be matched by anything
I have a script that takes a regex as a parameter. By default I want to set the regex to something that will never match any string, so I can simply say
if ($str =~ $regex)
without e.g. having to check defined($regex) first.
I came up with
qr/[^\s\S]/
but don't know if this will match some utf8 character that is neither a space nor a non-space.

/(?!)/
http://perl.plover.com/yak/regex/samples/slide049.html

Combine a negative lookahead for an arbitrary character followed by a match for that character, e.g.
/(?!x)x/
Works on all the test cases I threw at it. Here are some tests on rubular.

/ ^/ seems to work, and is the shortest so far.
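The candidates above can be sanity-checked with a small script. This is only a sketch in Python's re module rather than Perl, so it assumes the two engines treat these constructs the same way; the sample strings are made up for illustration.

```python
import re

# Candidate "never match" patterns taken from the question and answers above.
NEVER_MATCH = [r"(?!)", r"(?!x)x", r"[^\s\S]", r" ^"]

# Hypothetical sample inputs: empty, whitespace, newlines and non-ASCII text.
SAMPLES = ["", "x", "xx", " ", " x", "\n", "a \nb", "\u00a0", "\u00e9\u2603"]


def any_match(pattern, samples=SAMPLES):
    """Return True if the pattern matches anywhere in any sample string."""
    regex = re.compile(pattern)
    return any(regex.search(s) for s in samples)


for pattern in NEVER_MATCH:
    verdict = "matched something" if any_match(pattern) else "never matched"
    print(repr(pattern), verdict)
```

A passing run only shows that none of the samples match; it is not a proof for every possible input.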

Related

Tricky Regular Expression with a Alphanumeric pattern in uppercase [duplicate]

This question already has answers here:
Can you make just part of a regex case-insensitive?
(5 answers)
Closed 3 years ago.
Okay, this might not be tricky at all for some, but at the moment it is really messing with my head.
First of all, I don't know what engine I am dealing with, but it doesn't seem to distinguish uppercase from lowercase.
I have a string for example
Circuit Ref
Service Type
A End Address
Z End Address
52GD J32SD41 O2AE EVC001
Evolve Internet
And I am only trying to extract the string "52GD J32SD41 O2AE EVC001". I have already tried quite a few combinations like
[0-9A-Z]{4}\s[0-9A-Z]+\s[0-9A-Z]+\s[0-9A-Z]+
[A-Z0-9]{4}\s\W+\s\W+\s\W+
[A-Z0-9]{4}\s[A-Z0-9\s]*[A-Z0-9\s]*[A-Z0-9\s]*
Nothing seems to work. I want to keep the expression fairly flexible, because the order of the letters and digits can change, but the pattern is mostly the same. Any nudge in the right direction will be greatly appreciated.
Thanks
This is a wild guess, but please try the following:
put (?-i) in front of the regex (Related question, regular-expressions.info, net page about regex)
enclose the regex in (?-i: ... )
enclose the regex in (?I: ... )
BTW, regarding the 2nd pattern that you tried: [A-Z0-9]{4}\s\W+\s\W+\s\W+.
It seems that you tried to use \W as "upper case word character", but that is not what it means.
\W means anything that is not \w, that is, any non-word character.
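To illustrate the guess above, here is a minimal sketch in Python. It assumes the unknown engine behaves like Python's re with case-insensitive matching switched on, which the question does not confirm. Python accepts the scoped form (?-i: ... ) but not a bare leading (?-i).

```python
import re

TEXT = (
    "Circuit Ref\nService Type\nA End Address\nZ End Address\n"
    "52GD J32SD41 O2AE EVC001\nEvolve Internet"
)
PATTERN = r"[0-9A-Z]{4}\s[0-9A-Z]+\s[0-9A-Z]+\s[0-9A-Z]+"

# Simulate an engine that ignores case by default: [A-Z] now also accepts a-z.
insensitive = re.search(PATTERN, TEXT, re.IGNORECASE)

# (?-i: ... ) switches case-insensitivity off again, inside the group only.
sensitive = re.search(r"(?-i:" + PATTERN + r")", TEXT, re.IGNORECASE)

# \W matches characters that are NOT letters, digits or underscore.
non_word = re.findall(r"\W", "O2AE_x -")

print(insensitive.group(0))
print(sensitive.group(0))
print(non_word)
```

If the engine really is case-insensitive by default, the first search latches onto ordinary words earlier in the text, which would explain the symptoms described in the question.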

RegEx to check if specified word is not in string [duplicate]

This question already has answers here:
How to negate specific word in regex? [duplicate]
(12 answers)
Closed 7 years ago.
I am trying to learn RegEx and build a regular expression that checks whether a specified word is NOT in the provided string. So far I have tried Regular Expression Info and RexEgg, testing everything on Regular Expression Online, but I did not find the answer to my question.
I have tried conditionals and lookarounds. Let's say I want to build an expression that tests for the word myword and passes only when the word is NOT in the string. I used the expression
(?(?!myword).*)
but the RegEx passes regardless of the word myword, meaning that both strings, This is the text and This is myword the text, pass the test.
My reasoning was that the negative lookahead, used as the condition, is true when myword does not follow. The lookahead is also zero-length, and therefore .* would return the whole string.
Hope someone can help :)
^(?(?!\bmyword\b).)*$
You can try this. See the demo. Also use \b to match exactly myword and not mywords.
https://regex101.com/r/hI0qP0/7
You should use anchors and negative lookahead:
^(?!.*?myword).*$
(?!.*?myword) is a negative lookahead that will fail the match if myword is found anywhere in the input string.
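A quick way to compare the two suggestions is the Python sketch below. Python's re does not accept a lookahead as a conditional test, so the first answer's pattern is rewritten here with a plain non-capturing group, (?: ... ). That substitution is an assumption of this sketch, not the original answer's exact regex.

```python
import re

# Anchored negative lookahead: fail if "myword" occurs anywhere in the line.
LOOKAHEAD = re.compile(r"^(?!.*?myword).*$")

# Per-character check: consume a character only if "myword" (as a whole
# word) does not start at that position.
TEMPERED = re.compile(r"^(?:(?!\bmyword\b).)*$")


def lacks_word(regex, text):
    """Return True if the regex accepts the text, i.e. the word is absent."""
    return regex.search(text) is not None


for sample in ("This is the text", "This is myword the text",
               "This is mywords the text"):
    print(sample, lacks_word(LOOKAHEAD, sample), lacks_word(TEMPERED, sample))
```

The two patterns differ only on the third sample: with \b the word mywords is not treated as an occurrence of myword.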

need a regexp to match positive integers separated by commas [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regular expression to only allow whole numbers and commas in a string
I need a regexp to match a sequence of POSITIVE integers separated by comma. Space is also allowed.
For example
706101, 700102, 700295 should match, but 0, 1, 2, 3 should not.
I tried to use /^\s*(\d+(\s*,\s*\d+)*)?\s*$/ but it seems to accept zeros as well.
Replace \d+ with [1-9]\d* and it will work. For example:
/^\s*[1-9]\d*(?:\s*,\s*[1-9]\d*)*$/
This regex will fail on an empty string (while the one in the original post won't), but I assume that is actually the intention. If not, just make the whole number list optional.
Something like this would work. The main change is basically switching from \d ([0-9]) to [1-9].
This regex also allows you to type digits such as 0001.
/^(?:0*[1-9]\d*\s*(?:,|$)\s*)+$/gm
As you have not specified language, the flags may change. This is PCRE.
Demo+explanation: http://regex101.com/r/kN2tW0
Try:
[1-9][0-9]*( *, *[1-9][0-9]*)*
Try this
^([1-9]\d*[\s,]*)+$
This RE will completely eliminate strings with a standalone '0' anywhere in the string.
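As a rough check of the first answer's pattern, here is a minimal Python sketch. The helper name is_positive_int_list is made up for illustration, and re.ASCII is added so that \d means [0-9] only, which the original answer does not specify.

```python
import re

# First answer's pattern: every number must start with 1-9.
POSITIVE_LIST = re.compile(r"^\s*[1-9]\d*(?:\s*,\s*[1-9]\d*)*$", re.ASCII)


def is_positive_int_list(text):
    """Return True for a comma-separated list of positive integers."""
    return POSITIVE_LIST.match(text) is not None


for sample in ("706101, 700102, 700295", "0, 1, 2, 3", "5", "10, 0", ""):
    print(repr(sample), is_positive_int_list(sample))
```

Note that this variant rejects leading zeros such as 0001; whether that is desired depends on the input you expect.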

My regular expression matches too much. How can I tell it to match the smallest possible pattern? [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I have this RegEx:
('.+')
It has to match character literals as in C. For example, if I have 'a' b 'a', it should match each a together with the quotes around it.
However, it also matches the b (it should not), probably because it is, strictly speaking, also between quotes.
Here is a screenshot of how it goes wrong (I use this for syntax highlighting):
I'm fairly new to regular expressions. How can I tell the regex not to match this?
It is being greedy and matching the first apostrophe and the last one and everything in between.
This should match anything that isn't an apostrophe.
('[^']+')
Another alternative is to try non-greedy matches.
('.+?')
Have you tried a non-greedy version, e.g. ('.+?')?
There are usually two modes of matching (or two sets of quantifiers): maximal (greedy) and minimal (non-greedy). The first results in the longest possible match, the second in the shortest. You can read about it (although in a Perl context) in the Perl Cookbook (Section 6.15).
Try:
('[^']+')
The ^ at the start of the square brackets negates the class: it matches every character except the ones listed. This way, it won't match 'a' b 'a' as a whole because there's a ' in between, so instead it'll give both instances of 'a'.
You need to escape the quotes:
\'[^\']+\'
Edit: Hmm, well, I suppose this answer depends on what lang/system you're using.
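The difference between the three patterns is easy to see on the question's own input. This is a small Python sketch; the highlighter in the question may use a different engine, so treat it as an illustration only.

```python
import re

SOURCE = "'a' b 'a'"

# Greedy: .+ runs to the last quote, so the b is swallowed too.
greedy = re.findall(r"('.+')", SOURCE)

# Non-greedy: .+? stops at the first closing quote it can reach.
lazy = re.findall(r"('.+?')", SOURCE)

# Negated class: the body can never contain a quote at all.
negated = re.findall(r"('[^']+')", SOURCE)

print(greedy)
print(lazy)
print(negated)
```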

How to exclude a specific string constant? [duplicate]

This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 7 years ago.
Can a regular expression be used to match any string except a specific string constant (e.g. "ABC")?
Is it possible to exclude just one specific string constant?
You have to use a negative lookahead assertion.
(?!^ABC$)
You could for example use the following.
(?!^ABC$)(^.*$)
If this does not work in your editor, try this. It is tested to work in Ruby and JavaScript, but note that it rejects any line that contains ABC anywhere, not just the exact string ABC:
^((?!ABC).)*$
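The two patterns in this answer are not equivalent, which the following Python sketch tries to show. The sample strings are made up, and Python's re is assumed to treat these lookaheads like the engines named above.

```python
import re

# Rejects only the exact string "ABC".
NOT_EXACTLY_ABC = re.compile(r"(?!^ABC$)(^.*$)")

# Rejects any string that contains "ABC" anywhere.
NOT_CONTAINING_ABC = re.compile(r"^((?!ABC).)*$")

for sample in ("ABC", "AB", "ABCD", "xABCx", ""):
    print(repr(sample),
          NOT_EXACTLY_ABC.search(sample) is not None,
          NOT_CONTAINING_ABC.search(sample) is not None)
```

Pick the first form if only the exact constant should be excluded, and the second if the constant must not appear at all.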
In .NET you can use grouping to your advantage like this:
http://regexhero.net/tester/?id=65b32601-2326-4ece-912b-6dcefd883f31
You'll notice that:
(ABC)|(.)
This will grab everything except ABC in the 2nd group. Parentheses surround each group, so (ABC) is group 1 and (.) is group 2.
So you just grab the 2nd group like this in a replace:
$2
Or in .NET look at the Groups collection inside the Regex class for a little more control.
You should be able to do something similar in most other regex implementations as well.
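For comparison, the same grouping trick can be sketched in Python. The function name strip_abc is hypothetical, and a callback is used instead of a $2-style replacement string so that the unmatched group is handled explicitly.

```python
import re


def strip_abc(text):
    """Keep whatever group 2 captured and drop every literal ABC."""
    # m.group(2) is None whenever the (ABC) alternative matched.
    return re.sub(r"(ABC)|(.)", lambda m: m.group(2) or "", text)


print(strip_abc("xxABCyyAB"))
```

Because the alternation tries (ABC) first at each position, a partial AB at the end of the input is kept as ordinary characters.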
UPDATE: I found a much faster way to do this here:
http://regexhero.net/tester/?id=997ce4a2-878c-41f2-9d28-34e0c5080e03
It still uses grouping (I can't find a way that doesn't use grouping). But this method is over 10X faster than the first.
This isn't easy, unless your regexp engine has special support for it. The easiest way would be to use a negative-match option, for example:
$var !~ /^foo$/
or die "too much foo";
If not, you have to do something evil:
$var =~ /^(($)|([^f].*)|(fo?)|(f[^o].*)|(fo[^o].*)|(foo.+))$/
or die "too much foo";
That one basically says "if it starts with non-f, the rest can be anything; if it starts with f, non-o, the rest can be anything; otherwise, if it starts fo, the next character had better not be another o".
Try this regular expression:
^(.{0,2}|([^A]..|A[^B].|AB[^C])|.{4,})$
It describes three cases:
fewer than three arbitrary characters
exactly three characters, where either
  the first is not A, or
  the first is A but the second is not B, or
  the first is A, the second is B, but the third is not C
more than three arbitrary characters
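The case analysis above can be checked by brute force. The sketch below, in Python, enumerates every string up to length 4 over a small made-up alphabet and compares the regex with a plain inequality test; the alphabet and length limit are assumptions chosen to keep the search small.

```python
import re
from itertools import product

NOT_ABC = re.compile(r"^(.{0,2}|([^A]..|A[^B].|AB[^C])|.{4,})$")


def all_strings(alphabet="ABCx", max_len=4):
    """Yield every string over the alphabet with length 0..max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


# Collect strings where the regex disagrees with a direct comparison.
mismatches = [
    s for s in all_strings()
    if (NOT_ABC.match(s) is not None) != (s != "ABC")
]
print(mismatches)
```

An empty list means the regex agrees with s != "ABC" on every enumerated string.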
You could use negative lookahead, or something like this:
^([^A]|A([^B]|B([^C]|$)|$)|$).*$
Maybe it could be simplified a bit.