Regex negation?

I'm playing Regex Golf (http://regex.alf.nu/) and I'm doing the Abba hole. I have the following regex that matches the wrong side entirely (which is what I was trying to do):
(([\w])([\w])\3\2)
However, I'm trying to negate it now so it matches the other side. I can't seem to figure that part out. I tried:
(?!([\w])([\w])\3\2)
But that didn't work. Any tips from the regex masters?

You can make it much shorter (and get more points) by simply using . and removing unnecessary parens:
^(?!.*(.)(.)\2\1)
It just makes sure that there's no "abba" ("abba" here means 4 letters in that particular order we don't want to match) in any part of the string without having to match the whole word.
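As a rough sketch of how this pattern behaves, here it is in Python's `re` module (an assumption; the game itself runs in the browser). The sample words are made up for illustration.

```python
import re

# Pattern from the answer above: the negative lookahead, tested at the
# start of the string, rejects any string containing x, y, y, x in a row.
NO_ABBA = re.compile(r"^(?!.*(.)(.)\2\1)")

def has_no_abba(word: str) -> bool:
    """Return True when the word contains no 'abba'-style sequence."""
    return NO_ABBA.search(word) is not None

# Illustrative words, not the actual Regex Golf lists.
for word in ["abba", "anallagmatic", "banana", "regex"]:
    print(word, has_no_abba(word))
```

Note that "anallagmatic" is rejected because of the "alla" in the middle, even though the pattern never consumes a character.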

Using the explanation here: https://stackoverflow.com/a/406408/584663
I came up with: ^((?!((\w)(\w)\4\3)).)*$

The key here turns out to be the leading caret, ^, and the .*
(?! ...) is a look-ahead construct, and so does not advance the regex processing engine.
/(?! ...)/ on its own correctly fails at any position where the inner expression matches. However, an unanchored regex is tried at every position in the string, and because the look-ahead consumes nothing, the (empty) rest of the regex succeeds with a zero-width match at the first position where the inner expression does not match. (See this great answer).
Since every string has such a position (the very end, if nothing else), the bare look-ahead matches any string.
With the caret ^ present, the look-ahead is tested only at the start of the string, and the .* inside it lets it scan the rest of the string from there.
Thus the regex correctly fails to match whenever the inner expression matches anywhere in the string.
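A small sketch of the difference, using Python's `re` module as a stand-in for the Regex Golf engine (an assumption). The names `unanchored` and `anchored` are only for illustration.

```python
import re

ABBA = r"(\w)(\w)\2\1"

unanchored = re.compile(rf"(?!{ABBA})")
anchored = re.compile(rf"^(?!.*{ABBA})")

# Unanchored: position 0 is rejected (an abba starts there), but the
# engine simply moves on and finds a zero-width match at position 1.
m = unanchored.search("abba")
print(m.start(), repr(m.group()))

# Anchored: the lookahead is only tried at position 0, so there is
# no other position to fall back to and the search fails.
print(anchored.search("abba"))
```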

Related

Regex substring matching on capture group

I have an advanced regex question (unless I am overthinking this).
With my basic knowledge of regex, it is trivial to match a static capture group further down in the string.
P(.): D:\1
Correctly matches
Pb: D:b
Pa: D:a
and (correctly) does not match
Pa: D:b
So far so good. However, what I need to capture is a set of [a-z]+ after the P and match the one character. So that these should also match:
Pabc: D:c
Pabc: D:a
Pba: D:b
Pba: D:a
but not
Pabc: D:x
Pba: D:g
I started going down the path of writing separate patterns like so (spaces added around the alternation for clarity):
P(.): D:\1 | P(.)(.): D:(\1|\2) | P(.)(.)(.): D:(\1|\2|\3)
But I cannot make even this clumsy solution work in JavaScript regex.
Is there an elegant, correct way to do this? Can it be done with JavaScript's limited engine?

The following regex will do it:
P.*(.).*: D:\1
.*(.).* will match one or more characters, capturing one of them.
If the captured character matches the character after D:, then the regex matches.
If the captured character doesn't match, backtracking will ensure that it tries again with a different captured character, until all combinations have been tried.
See regex101.com for running example.
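To check the pattern against the examples from the question, here is a minimal sketch in Python (an assumption; the question targets JavaScript, where the same pattern syntax applies). `TIGHT` is a hypothetical variant that restricts the characters to [a-z], as the question describes.

```python
import re

# Pattern from the answer: backtracking lets (.) try each character
# between "P" and ":" until one equals the character after "D:".
LOOSE = re.compile(r"P.*(.).*: D:\1")
# Hypothetical stricter variant limited to lowercase letters.
TIGHT = re.compile(r"P[a-z]*([a-z])[a-z]*: D:\1")

should_match = ["Pabc: D:c", "Pabc: D:a", "Pba: D:b", "Pba: D:a"]
should_fail = ["Pabc: D:x", "Pba: D:g"]

for line in should_match + should_fail:
    print(line, bool(LOOSE.fullmatch(line)), bool(TIGHT.fullmatch(line)))
```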

Why is the character ^ required in the regex ^(?!.*?spam) to filter strings?

I'm trying to filter strings that don't contain the word "spam".
I use the regex from here!
But I can't understand why I need the symbol ^ at the start of the expression. I know that it marks the start of the string, but I don't understand why the regex doesn't work without ^ in my case.
UPD. All the answers below are very useful.
It's completely clear now. Thank you!

The regex (?!.*?spam) matches a position in a string that is not followed by something matching .*?spam.
Every single string has such a position, because if nothing else, the very end of the string is certainly not followed by anything matching .*?spam.
So every single string contains a match for the regex (?!.*?spam).
The anchor ^ in ^(?!.*?spam) restricts the regex, so that it only matches strings where the very beginning of the string isn't followed by anything matching .*?spam — i.e., strings that don't contain spam at all (or anywhere in the first line, at least, depending on whether . matches newlines).
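The point can be seen with a quick sketch in Python's `re` module (an assumption, since the question does not name an engine). The sample text is made up.

```python
import re

unanchored = re.compile(r"(?!.*?spam)")
anchored = re.compile(r"^(?!.*?spam)")

text = "buy spam now"

# Without ^, the engine skips every position that is still followed by
# "spam" and succeeds at the first position that is not (index 5,
# just after the "s" of "spam").
m = unanchored.search(text)
print(m.start())

# With ^, only position 0 is tested, so the string is rejected.
print(anchored.search(text))
print(anchored.search("buy ham now") is not None)
```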

The lookahead is a zero-width assertion (that is, it checks a condition at a position in your string without consuming characters). In your case it is a negative lookahead making sure that "zero or more characters, followed by the word spam" do not follow. This is true for several positions in your string, see a demo on regex101.com without the anchor.
With the anchor, the lookahead is only tested at the very beginning, so the whole string is analyzed, see the altered demo on regex101.com as well.

Negative lookahead to match server directories not properly working

Given the following 3 example paths representing server paths, I am trying to create a skiplist for my FTP client via PCRE regular expressions but can't seem to get the desired result.
/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1
/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2
/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3
I want to skip all folders (not paths) that do not end with
-Publisher1
I am trying to create a working pattern with the help of this online help and this regex tester but don't get any further than this negative lookahead pattern
.*-(?!Publisher1)
But with this pattern all lines match, because each of them contains a hyphen that is not followed by Publisher1.
/subdir/subdir/.../Author1_-_Title1-(1234) -Publisher1
/subdir/subdir/.../Author2_-_Title2_(5678) -PUBLiSHER2
/subdir/subdir/.../Author3_-_Title3-4951 -publisher3
What is my mistake, and what would the correct pattern be to match only the second and third lines as lines to be skipped, but keep the first line?
EDIT to make it clearer what to highlight and what not.
Everything from the beginning of the path to the last slash must be ignored (allowed).
Everything after the last slash that matches the defined regex must be skipped.
EDIT to present an advanced pattern matching only the red part
[^/]*(?<!-Publisher2)$
Debuggex Demo

The regex which you have used is:
.*-(?!Publisher1)
Here is what goes wrong with it.
This regex matches any line that contains a - not immediately followed by Publisher1. Notice that your text has other hyphens, between author and title and after the title, so every line satisfies this condition. The hyphen has to be part of what the lookaround checks, together with Publisher1.
So the next attempt moves the hyphen inside the parentheses:
^.*(?!-Publisher1)
But this does not work either: .* consumes the whole line, so when the lookahead runs there are no characters left ahead of it and the negative assertion trivially succeeds. A negative lookbehind is needed instead:
.*(?<!-Publisher1)
This still matches every line. Why?
A negative lookbehind looks back and succeeds if the current position is not preceded by -Publisher1.
This is a bit subtle, so bear with me. Suppose your string is
/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1
At the end of the string, just after the final 1, -Publisher1 is visible when looking back, so the negative lookbehind fails there. The engine then backtracks: .* gives up one character, and from that position only -Publisher is visible behind. The condition is now satisfied, and the regex matches everything except the last character.
So it is essential to bind the lookbehind to the end of the string, so that the engine cannot move one character to the left to find a match.
final regex:
.*(?<!-Publisher1)$
demo here : http://regex101.com/r/lE1vW2
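The three stages can be compared with a short sketch in Python's `re` module (an assumption; the question uses PCRE, which treats these particular patterns the same way). Only the folder names after the last slash are used, and `should_skip` is a hypothetical helper name.

```python
import re

folders = [
    "Author1_-_Title1-(1234)-Publisher1",
    "Author2_-_Title2_(5678)-PUBLiSHER2",
    "Author3_-_Title3-4951-publisher3",
]

LOOKAHEAD = re.compile(r".*-(?!Publisher1)")     # original attempt
LOOKBEHIND = re.compile(r".*(?<!-Publisher1)")   # unanchored lookbehind
FINAL = re.compile(r".*(?<!-Publisher1)$")       # lookbehind bound to the end

def should_skip(folder: str) -> bool:
    """True when the folder name does not end with -Publisher1."""
    return FINAL.search(folder) is not None

for name in folders:
    # The first two patterns match every folder; only FINAL discriminates.
    print(name,
          bool(LOOKAHEAD.search(name)),
          bool(LOOKBEHIND.search(name)),
          should_skip(name))

# The unanchored lookbehind matches all but the last character.
print(LOOKBEHIND.search(folders[0]).group())
```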

This should suit your needs:
^.*(?<!-Publisher1)$
Debuggex Demo

I want to skip all folders that do not end with -Publisher1
You can use this negative lookahead based regex:
^(?!.*?-Publisher1$).+$
Working Demo

You could use the following regex in order to exclude lines containing Publisher1:
^((?!Publisher1).)*$
Online demo: http://regex101.com/r/gD8jK0
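The last two patterns are not equivalent: one tests whether the name ends with -Publisher1, the other whether Publisher1 appears anywhere. A small Python sketch (an assumption; the question uses PCRE) shows the difference. The third folder name is hypothetical, added only to expose it.

```python
import re

# Matches (skip) unless the name ends with -Publisher1.
ENDS_WITH = re.compile(r"^(?!.*?-Publisher1$).+$")
# Matches (skip) unless the name contains Publisher1 anywhere.
CONTAINS = re.compile(r"^((?!Publisher1).)*$")

folders = [
    "Author1_-_Title1-(1234)-Publisher1",
    "Author2_-_Title2_(5678)-PUBLiSHER2",
    "Publisher1_-_Title4-0000-Other",  # hypothetical name
]

for name in folders:
    print(name, bool(ENDS_WITH.search(name)), bool(CONTAINS.search(name)))
```

For the third name, the first pattern reports a skip while the second does not, so which one fits depends on whether "ends with" or "contains" is the real requirement.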

How to negate the whole regex?

I have a regex, for example (ma|(t){1}). It matches ma and t and doesn't match bla.
I want to negate the regex, thus it must match bla and not ma and t, by adding something to this regex. I know I can write bla, the actual regex is however more complex.

Use negative lookaround: (?!pattern)
Positive lookarounds can be used to assert that a pattern matches. Negative lookarounds are the opposite: they are used to assert that a pattern DOES NOT match. Some flavors support these assertions; some put limitations on lookbehind, etc.
Links to regular-expressions.info
Lookahead and Lookbehind Zero-Width Assertions
Flavor comparison
See also
How do I convert CamelCase into human-readable names in Java?
Regex for all strings not containing a string?
A regex to match a substring that isn’t followed by a certain other substring.
More examples
These are attempts to come up with regex solutions to toy problems as exercises; they should be educational if you're trying to learn the various ways you can use lookarounds (nesting them, using them to capture, etc):
codingBat plusOut using regex
codingBat repeatEnd using regex
codingbat wordEnds using regex

Assuming you only want to disallow strings that match the regex completely (i.e., mmbla is okay, but mm isn't), this is what you want:
^(?!(?:m{2}|t)$).*$
(?!(?:m{2}|t)$) is a negative lookahead; it says "starting from the current position, the next few characters are not mm or t, followed by the end of the string." The start anchor (^) at the beginning ensures that the lookahead is applied at the beginning of the string. If that succeeds, the .* goes ahead and consumes the string.
FYI, if you're using Java's matches() method, you don't really need the ^ and the final $, but they don't do any harm. The $ inside the lookahead is required, though.
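A minimal sketch of this approach in Python (an assumption; the answer mentions Java, where `matches()` plays the role of `re.fullmatch` here). The helper names are hypothetical. When the calling code is under your control, negating the result of a plain full match is often simpler than negating inside the regex.

```python
import re

NEGATED = re.compile(r"^(?!(?:m{2}|t)$).*$")

def is_allowed(s: str) -> bool:
    """Negation expressed inside the regex."""
    return NEGATED.search(s) is not None

def is_allowed_in_code(s: str) -> bool:
    """Same result, with the negation done in code instead."""
    return re.fullmatch(r"m{2}|t", s) is None

for s in ["mm", "t", "mmbla", "bla", ""]:
    print(repr(s), is_allowed(s), is_allowed_in_code(s))
```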

\b(?=\w)(?!(ma|(t){1}))\b(\w*)
This is for the given regex.
The \b finds a word boundary.
The positive lookahead (?=\w) is there to avoid spaces.
The negative lookahead over the original regex prevents matches of it.
Finally, (\w*) catches all the words that are left.
The group that holds the words is group 3.
The simple (?!pattern) will not work, as any substring will match.
The simple ^(?!(?:m{2}|t)$).*$ will not work, as its granularity is full lines.
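Here is a hedged sketch of the word-level pattern in Python's `re` module (an assumption about the engine). The sample sentence is made up. One thing it shows is that the lookahead is not anchored to the end of the word, so words that merely start with ma or t are excluded as well.

```python
import re

WORDS = re.compile(r"\b(?=\w)(?!(ma|(t){1}))\b(\w*)")

text = "ma t bla table"

# Group 3 holds each surviving word. "table" is dropped too, because
# it starts with "t" and nothing in the lookahead requires a word end.
kept = [m.group(3) for m in WORDS.finditer(text)]
print(kept)
```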

This regexp matches your condition:
^.*(?<!ma|t)$
Look at how it works:
https://regex101.com/r/Ryg2FX/1

Apply this if you use Laravel.
Laravel has a not_regex rule: the field under validation must not match the given regular expression. It uses the PHP preg_match function internally.
'email' => 'not_regex:/^.+$/i'

Why do I get successful but empty regex matches?

I'm searching the pattern (.*)\\1 on the text blabl with regexec(). I get successful but empty matches in regmatch_t structures. What exactly has been matched?

The regex .* can successfully match a string of zero characters, i.e. the nothing that occurs between adjacent characters.
So your pattern is matching zero characters in the parens, and then matching zero characters immediately following that.
So if your regex was /f(.*)\1/, it would match just the "f" in "fab", with the group capturing the empty string between the 'f' and the 'a'.
You might try using .+ instead of .*, as that matches one or more instead of zero or more. (Using (.+)\1 you should match the 'oo' in 'foo'.)
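The empty match can be reproduced with a short sketch. Python's `re` module is used here instead of the POSIX regexec() from the question (an assumption), but the result for this pattern and text is the same kind of empty match.

```python
import re

# (.*) backtracks all the way down to the empty string, and \1 then
# repeats that empty string: a successful match of length zero.
star = re.search(r"(.*)\1", "blabl")
print(star.span(), repr(star.group(1)))

# With .+ the group must hold at least one character, and "blabl"
# contains no immediately repeated substring, so nothing matches.
plus = re.search(r"(.+)\1", "blabl")
print(plus)

# "foo" does contain a repeat: the group captures "o".
foo = re.search(r"(.+)\1", "foo")
print(foo.span(), repr(foo.group(1)))
```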

\1 is the backreference typically used for replacement later or when trying to further refine your regex by getting a match within a match. You should just use (.*), this will give you the results you want and will automatically be given the backreference number 1. I'm no regex expert but these are my thoughts based on my limited knowledge.
As an aside, I always revert back to RegexBuddy when trying to see what's really happening.

\1 is the "re-match" instruction. The question is, do you want to re-match immediately (e.g., BLABLA)
/(.+)\1/
or later (e.g., BLAahemBLA)
/(.+).*\1/
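A brief sketch of the two variants, again in Python's `re` module rather than POSIX regexec() (an assumption), using the example strings from this answer:

```python
import re

IMMEDIATE = re.compile(r"(.+)\1")   # the repeat must follow directly
LATER = re.compile(r"(.+).*\1")     # the repeat may come later

for text in ["BLABLA", "BLAahemBLA"]:
    imm = IMMEDIATE.search(text)
    lat = LATER.search(text)
    print(text,
          imm.group(1) if imm else None,
          lat.group(1) if lat else None)
```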
