Negative Lookahead Faults Regex - regex

I have a regular expression:
^\/admin\/(?!(e06772ed-7575-4cd4-8cc6-e99bb49498c5)).*$
My input string:
/admin/e06772ed-7575-4cd4-8cc6-e99bb49498c5
As I understand it, a negative lookahead should check whether the group (e06772ed-7575-4cd4-8cc6-e99bb49498c5) has a match, or am I incorrect?
Since the input string contains a match for the group, why does the negative lookahead not work? By that I mean that I expect the e06772ed-7575-4cd4-8cc6-e99bb49498c5 in my regex to match the e06772ed-7575-4cd4-8cc6-e99bb49498c5 in the input string.
Removing negative lookahead makes this regex work correctly.
Tested with regex101.com

The takeaway message of this question is: a lookaround matches a position, not a string.
(?!e06772ed-7575-4cd4-8cc6-e99bb49498c5)
will match any position that is not followed by e06772ed-7575-4cd4-8cc6-e99bb49498c5.
Which means that:
^\/admin\/(?!(e06772ed-7575-4cd4-8cc6-e99bb49498c5)).*$
will match:
/admin/abc
and even:
/admin/e99bb49498c5
but not:
/admin/e06772ed-7575-4cd4-8cc6-e99bb49498c5/daffdakjf;adjk;af
This also explains why there is a match once you get rid of the ?!: the group then has to match the ID literally, so the string matches exactly.
You can also drop the parentheses inside your lookahead; they serve no purpose here.
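To make the "position, not string" point concrete, here is a minimal sketch using Python's re module (the language is my choice; the question does not name one). The names UUID, with_lookahead, without_lookahead and paths are made up for illustration.

```python
import re

UUID = "e06772ed-7575-4cd4-8cc6-e99bb49498c5"

# Negative lookahead: the position right after "/admin/" must NOT be
# followed by the full UUID. The lookahead itself consumes nothing.
with_lookahead = re.compile(r"^/admin/(?!" + re.escape(UUID) + r").*$")

# Same pattern with "?!" removed: now the group has to consume the UUID.
without_lookahead = re.compile(r"^/admin/(" + re.escape(UUID) + r").*$")

paths = [
    "/admin/abc",
    "/admin/e99bb49498c5",
    "/admin/" + UUID,
    "/admin/" + UUID + "/daffdakjf;adjk;af",
]

# Map each path to whether the lookahead version matches it.
results = {p: bool(with_lookahead.match(p)) for p in paths}

for path, matched in results.items():
    print(path, "->", "match" if matched else "no match")
```

Only the first two paths match the lookahead version, while without_lookahead matches the last two instead.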

Related

How to match the closest pattern on a capture group excluding overlap? [duplicate]

Given an input string fooxxxxxxfooxxxboo I am trying to write a regex that matches fooxxxboo i.e. starting from the second foo till the last boo.
I tried the following
foo.*?boo matches the complete string fooxxxxxxfooxxxboo
foo.*boo also matches the complete string fooxxxxxxfooxxxboo
I read this Greedy vs. Reluctant vs. Possessive Quantifiers and I understand their difference, but I am trying to match the shortest string from the end which matches the regex i.e. something like the regex to be evaluated from back.
Is there any way I can match only the last portion?
Use negative lookahead assertion.
foo(?:(?!foo).)*?boo
DEMO
(?:(?!foo).)*? - Non-greedy match, zero or more times, of any character that does not start the string foo. That is, before matching each character, the engine checks that the character is not the letter f followed by two o's. Only if that check passes is the character matched.
Why the regex foo.*?boo matches the complete string fooxxxxxxfooxxxboo?
Because the engine tries start positions from left to right, the foo in your regex first matches the first foo in the string, and the following .*? then does a non-greedy match up to the string boo. There are two candidate matches, fooxxxxxxfooxxxboo and fooxxxboo, but because the second lies within the first, the regex engine reports only the first.
.*(foo.*?boo)
Try this. Grab the capture i.e $1 or \1.
See demo.
https://regex101.com/r/nL5yL3/9
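As a quick check of both answers, here is a small sketch in Python's re module (an assumption; the question does not say which engine is used). The variable names are illustrative only.

```python
import re

text = "fooxxxxxxfooxxxboo"

# Lazy dot: the match still starts at the leftmost foo.
lazy = re.search(r"foo.*?boo", text).group()

# Tempered dot: each character is consumed only if "foo" does not start there,
# so a match cannot run across a second foo.
tempered = re.search(r"foo(?:(?!foo).)*?boo", text).group()

# Greedy prefix: ".*" eats as much as it can, leaving the last foo...boo
# for capture group 1.
captured = re.search(r".*(foo.*?boo)", text).group(1)

print(lazy)      # fooxxxxxxfooxxxboo
print(tempered)  # fooxxxboo
print(captured)  # fooxxxboo
```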

Regular expression to search for specific Referer in HTTP Header

I need to create a regular expression to match everything except a specific URL for a given Referer. I currently have it to match but can't reverse it and create the negative for it.
What I currently have:
Referer:(http(s)?(:\/\/))?(www\.)?test.com(\/.*)?
In the list below:
Referer:http://www.test.online/
Referer:https://www.test.online/
Referer:https://www.test.tv/
Referer:https://www.blah.com/
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
It will match:
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
However, I would like it to match everything except for those.
This is for our WAF, so unfortunately we are restricted in what we can use: this can only be done by searching the HTTP header being passed back.
Try this regex:
^(?!.*Referer:(http(s)?(:\/\/))?(www\.)?test.com(\/.*)?).*$
A good way to negate your regex is to use negative lookahead.
Explanation:
The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point. Inside the lookahead [is any regex pattern].
Working example: https://regex101.com/r/QJfeBB/1
You could use an anchor ^ to assert the start of the string and use a negative lookahead to assert what is on the right is not what you want to match.
Note that you have to escape the dot to match it literally and you could omit the last part (\/.*)?.
If you don't need the capturing groups later, you could also turn them into non-capturing groups (?:) instead.
^(?!Referer:(https?(:\/\/))?(www\.)?test\.com).+$
regex101 demo
About the pattern
^ Start of the string
(?! Negative lookahead to assert what is on the right does not match
Referer:(https?(:\/\/))?(www\.)?test\.com Match your pattern
) Close negative lookahead
.+ Match any char except a newline 1+ times
$ Assert end of the string
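Here is a minimal sketch of the negated pattern run against the list from the question, written in Python for illustration (the WAF's own regex engine may behave differently). It uses the non-capturing variant suggested above; not_test_com, headers and allowed are made-up names.

```python
import re

# Anchored negative lookahead: reject lines that start with a test.com Referer.
not_test_com = re.compile(r"^(?!Referer:(?:https?://)?(?:www\.)?test\.com).+$")

headers = [
    "Referer:http://www.test.online/",
    "Referer:https://www.test.online/",
    "Referer:https://www.test.tv/",
    "Referer:https://www.blah.com/",
    "Referer:https://www.test.com/",
    "Referer:http://www.test.com/",
    "Referer:http://test.com/",
    "Referer:https://test.com/",
]

# Keep only the headers the negated pattern matches.
allowed = [h for h in headers if not_test_com.match(h)]

for h in allowed:
    print(h)
```

With this list, only the four headers that do not point at test.com are kept.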

How to match unspecified number of digits between negative lookbehind and positive lookahead

I am trying to create a regex which matches the digits in the following Japanese strings,
4日
12日
while ignoring the following strings completely.
3月01日
3月1日
3月31日
So far, the closest I have been able to get is by using:
(?<!月)([0-9]{1,2})(?=日)
but this ends up matching the "1" contained in 3月01日 and 3月31日.
Any suggestions?
Add a digit pattern to the lookbehind:
(?<![0-9月])([0-9]{1,2})(?=日)
     ^^^
See the regex demo
The (?<![0-9月]) lookbehind fails the match whenever the current position is preceded by a digit or 月, so backtracking cannot return partial numbers in the unwanted context.
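The difference between the two lookbehinds can be checked with a short Python sketch (Python is an assumption here; the question does not name a regex flavor). The helper digits and the names naive, fixed and samples are hypothetical.

```python
import re

naive = re.compile(r"(?<!月)([0-9]{1,2})(?=日)")
fixed = re.compile(r"(?<![0-9月])([0-9]{1,2})(?=日)")

samples = ["4日", "12日", "3月01日", "3月1日", "3月31日"]

def digits(pattern, s):
    """Return the captured digits, or None when the pattern does not match."""
    m = pattern.search(s)
    return m.group(1) if m else None

naive_results = [digits(naive, s) for s in samples]
fixed_results = [digits(fixed, s) for s in samples]

print(naive_results)  # ['4', '12', '1', None, '1']
print(fixed_results)  # ['4', '12', None, None, None]
```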

Regex Negative Lookbehind Matches Lookbehind text .NET

Say I have the following strings:
PB-GD2185-11652-MTCH
GD2185-11652-MTCH
KD-GD2185-11652-MTCH
KD-GD2185-11652
I want REGEX.IsMatch to return true if the string has MTCH in it and does not start with PB.
I expected the regex to be the following:
^(?<!PB)\S+(?=MTCH)
but that gives me the following matches:
PB-GD2185-11652-
GD2185-11652-
KD-GD2185-11652-
I do not understand why the negative lookbehind not only doesn't exclude the match but includes the PB characters in the match. The positive lookahead works as expected.
EDIT 1
Let me start with a simpler example. The following regex matches all of the strings as I would expect it to:
\S+
The following regex still matches all of the strings even though I would expect it not to:
\S+(?!MTCH)
The following regex matches all but the final H character on the first three strings:
\S+(?<!MTCH)
From the documentation at regex101, a lookahead looks for text to the right of the pattern and a lookbehind looks for text to the left of the pattern, so having a lookahead at the beginning of a string does not jibe with the documentation.
Edit 2
Take another example with the following three strings:
grey
greyhound
hound
the regex:
^(?<!grey)hound
only matches the final hound, whereas the regex:
^(?<!grey)\S+
matches all three.
You need a lookahead: ^(?!PB)\S+(?=MTCH). Using the look-behind means the PB has to come before the first character.
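A lookbehind placed right after ^ can never see PB, because nothing precedes the start of the string. The sketch below shows the difference in Python rather than .NET (an assumption made for brevity; this particular behavior is the same in both). All names are illustrative.

```python
import re

behind = re.compile(r"^(?<!PB)\S+(?=MTCH)")  # lookbehind at start: always passes
ahead = re.compile(r"^(?!PB)\S+(?=MTCH)")    # lookahead at start: inspects the first chars

strings = [
    "PB-GD2185-11652-MTCH",
    "GD2185-11652-MTCH",
    "KD-GD2185-11652-MTCH",
    "KD-GD2185-11652",
]

def matched_text(pattern, s):
    """Return the matched text, or None when there is no match."""
    m = pattern.search(s)
    return m.group() if m else None

behind_results = [matched_text(behind, s) for s in strings]
ahead_results = [matched_text(ahead, s) for s in strings]

print(behind_results)  # ['PB-GD2185-11652-', 'GD2185-11652-', 'KD-GD2185-11652-', None]
print(ahead_results)   # [None, 'GD2185-11652-', 'KD-GD2185-11652-', None]
```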
The problem was caused by the greediness of \S+. When dealing with lookarounds and greedy quantifiers, you can easily match more characters than you expect. One way to deal with this is to put a negative lookaround in a group together with the greedy quantifier so that the unwanted text is excluded from the match, as described in this question:
How to non-greedy multiple lookbehind matches
and on this helpful website about greediness in regular expressions:
http://www.rexegg.com/regex-quantifiers.html
Note that this second link has a few other ways to deal with the greediness in various situations.
A good regular expression for this situation is as follows:
^(?<!PB)((?!PB)\S+)(MTCH)
In situations like this, it is going to be much clearer to do the check logically within the code: first check that the string matches MTCH, and then that it doesn't match ^PB.
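The two-step check suggested above could look like the following sketch. It is written in Python rather than .NET, and is_wanted is a hypothetical helper name, not something from the original answer.

```python
def is_wanted(s: str) -> bool:
    """True when s contains MTCH and does not start with PB."""
    return "MTCH" in s and not s.startswith("PB")

strings = [
    "PB-GD2185-11652-MTCH",
    "GD2185-11652-MTCH",
    "KD-GD2185-11652-MTCH",
    "KD-GD2185-11652",
]

flags = [is_wanted(s) for s in strings]
print(flags)  # [False, True, True, False]
```

Plain string checks avoid the lookaround subtleties entirely, at the cost of not returning the matched substring.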
