Regex to match all word format except one format

Regex to match all word format except one format - regex

Here is string:
$$START$$ should be matching along with $$MIDDLE$$
$$NOTMATCH$$ this should NOT be matching
$$LAST$$ this should be matching
In the above paragraph, I need to build a regex which can match all the Keywords($$[a-zA-Z]$$) except $$NOTMATCH$$
Until now, I have tried (?!\$\$NOTMATCH\$\$)(\$\$([^\$\$]+)\$\$) but It is not properly working and is not considering the $$ symbols in the end of Keyword, demo here.
Any suggestions are welcome.
Thanks in advance

I need to build a regex which can match all the Keywords ($$[a-zA-Z]$$) except $$NOTMATCH$$
You can use negative lookahead in the middle as this:
(?<!\$)\$\$(?!NOTMATCH)[^$\s]+\$\$(?!\$)
RegEx Demo
(?!NOTMATCH) is negative lookahead that will fail the match if we have NOTMATCH between $$ characters.
(?<!\$) is negative lookbehind to ensure we don't have $ before our match.
(?<\$) is negative lookahead to ensure we don't have $ after our match.

Related

PCRE Regex Match /x... but not /y/x

When configuring redirections, it's common to run into multiple pages that include some of the same path strings. We've ran into this instance multiple times where we need to redirect:
https://example.com/x...
But not:
https://example.com/y/x...
To match the /x... we use PCRE regex of:
/x.*
We've been struggling to get the exclude to match correctly; we apologize in advance as our regex is a bit weak, here's our pseudo code:
Match all /x... except /y/x...
Here is what we thought that looked like:
^\/(?!y\/).x.*
In our mind that reads:
Any query starting with /x..., except starting with /y/x...
Thank you in advance, and please feel free to suggest better formatting, we are not stack overflow pros.

Your regex matches from the start of the string a forward slash and then uses a negative lookahead to check what follows is not y/. If that is true, then match any character followed by x and 0+ character. That will match for example //x///
Without taking matching the url part into account, one way could be to use a negative lookahead (?! to check if what is on the right side does not contain /y/x and then match any character:
^(?!.*/y/x).+
Regex demo

You may use a negative lookbehind assertion:
~(?<!/y)/x~
RegEx Demo
(?<!/y) is a negative lookbehind assertnion that will fail the match if /y appears before matching /x.

Negative Lookahead Faults Regex

I have a regular expression:
^\/admin\/(?!(e06772ed-7575-4cd4-8cc6-e99bb49498c5)).*$
My input string:
/admin/e06772ed-7575-4cd4-8cc6-e99bb49498c5
As I understand, negative lookahead should check if a group (e06772ed-7575-4cd4-8cc6-e99bb49498c5) has a match, or am I incorrect?
Since input string has a group match, why does negative lookahead not work? By that I mean that I expect my regex to e06772ed-7575-4cd4-8cc6-e99bb49498c5 to match input string e06772ed-7575-4cd4-8cc6-e99bb49498c5.
Removing negative lookahead makes this regex work correctly.
Tested with regex101.com

The takeway message of this question is: a lookaround matches a position, not a string.
(?!e06772ed-7575-4cd4-8cc6-e99bb49498c5)
will match any position, that is not followed by e06772ed-7575-4cd4-8cc6-e99bb49498c5.
Which means, that:
^\/admin\/(?!(e06772ed-7575-4cd4-8cc6-e99bb49498c5)).*$
will match:
/admin/abc
and even:
/admin/e99bb49498c5
but not:
/admin/e06772ed-7575-4cd4-8cc6-e99bb49498c5/daffdakjf;adjk;af
This is exactly the explanation why there is a match whenever you get rid of the ?!. The string matches exactly.
Next, you can lose the parentheses inside your lookahead, they do not have their usual function of grouping here.

Regex Negative Lookbehind Matches Lookbehind text .NET

Say I have the following strings:
PB-GD2185-11652-MTCH
GD2185-11652-MTCH
KD-GD2185-11652-MTCH
KD-GD2185-11652
I want REGEX.IsMatch to return true if the string has MTCH in it and does not start with PB.
I expected the regex to be the following:
^(?<!PB)\S+(?=MTCH)
but that gives me the following matches:
PB-GD2185-11652-
GD2185-11652-
KD-GD2185-11652-
I do not understand why the negative lookbehind not only doesn't exclude the match but includes the PB characters in the match. The positive lookahead works as expected.
EDIT 1
Let me start with a simpler example. The following regex matches all of the strings as I would expect it to:
\S+
The following regex still matches all of the strings even though I would expect it not to:
\S+(?!MTCH)
The following regex matches all but the final H character on the first three strings:
\S+(?<!MTCH)
From the documentation at regex 101, a lookahead looks for text to the right of the pattern and a lookbehind looks for text to the left of the pattern, so having a lookahead at the beginning of a string does not jive with the documentation.
Edit 2
take another example with the following three strings:
grey
greyhound
hound
the regex:
^(?<!grey)hound
only matches the final hound. whereas the regex:
^(?<!grey)\S+
matches all three.

You need a lookahead: ^(?!PB)\S+(?=MTCH). Using the look-behind means the PB has to come before the first character.

The problem was because of the greediness of \S+. When dealing with lookarounds and greedy quantifiers you can easily match more characters than you expect. One way to deal with this is to insert a negative lookaround in a group with the greedy quantifier to exclude it as a match as stated in this question:
How to non-greedy multiple lookbehind matches
and on this helpful website about greediness in regular expressions:
http://www.rexegg.com/regex-quantifiers.html
Note that this second link has a few other ways to deal with the greediness in various situations.
A good regular expression for this situation is as follows:
^(?<!PB)((?!PB)\S+)(MTCH)

In situations like this it is going to be much clearer to do it logically within the code. So first check if the string matches MTCH and then that it doesn't match ^PB

RegEx lookahead but not immediately following

I am trying to match terms such as the Dutch ge-berg-te. berg is a noun by itself, and ge...te is a circumfix, i.e. geberg does not exist, nor does bergte. gebergte does. What I want is a RegEx that matches berg or gebergte, working with a lookaround. I was thinking this would work
\b(?i)(ge(?=te))?berg(te)?\b
But it doesn't. I am guessing because a lookahead only checks the immediate following characters, and not across characters. Is there any way to match characters with a lookahead withouth the constraint that those characters have to be immediately behind the others?
Valid matches would be:
Berg
berg
Gebergte
gebergte
Invalid matches could be:
Geberg
geberg
Bergte
bergte
ge-/Ge- and -te always have to occur together. Note that I want to try this with a lookahead. I know it can be done simpler, but I want to see if its methodologically possible to do something like this.

Here is one non-lookaround based regex:
\b(berg|gebergte)\b
Use it with i (ignore case) flag. This regex uses alternation and word boundary to search for complete words berg OR gebergte.
RegEx Demo
Lookaround based regex:
(?<=\bge)berg(?=te\b)|\bberg\b
This regex used a lookahead and lookbehind to search for berg preceded by ge and followed by te. Alternatively it matches complete word berg using word boundary asserter \b which is also 0-width asserter like anchors ^ and $.

To generally forbid a sign, you can put the negative lookaround to the beginning of a string and combine it with random number of other signs before the string you want to forbid:
regex: don't match if containing a specific string
^(?!.\*720).*
This will not match, if the string contains 720, but else match everything else.

Regular expression for prefix exclusion

I am trying to extract gmail.com from a passage where I want only those string match that don't start with #.
Example: abc#gmail.com (don't match this); www.gmail.com (match this)
I tried the following: (?!#)gmail\.com but this did not work. This is matching both the cases highlighted in the example above. Any suggestions?

You want a negative lookbehind if your regex supports it, like (?<!#)gmail\.com and add \bs to avoid matching foogmail.comz, like: (?<!#)\bgmail\.com\b

[^#\s]*(?<!#)\bgmail\.com\b
assuming you want to find strings in a longer text body, not validate entire strings.
Explanation:
[^#\s]* # match any number of non-#, non-space characters
(?<!#) # assert that the previous character isn't an #
\b # match a word boundary (so we don't match hogmail.com)
gmail\.com # match gmail.com
\b # match a word boundary
On a first glance, the (?<!#) lookbehind assertion appears unnecessary, but it isn't - otherwise the gmail.com part of abc#gmail.com would match.

Use this regular expression using negative lookbehind:
/^.*?(?<!#)gmail\.com$/

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regex to match all word format except one format - regex

Related

PCRE Regex Match /x... but not /y/x

Negative Lookahead Faults Regex

Regex Negative Lookbehind Matches Lookbehind text .NET

RegEx lookahead but not immediately following

Regular expression for prefix exclusion

Categories

Resources