I need to exclude word from regular expression - regex

I have this regexp:
^[a-z0-9]+([.\-][a-z0-9]+)*$
I need to exclude only one word, "www", from the match.
I tried a negative lookahead, but without success.

Use a negative lookahead like this:
^(?!www$)[a-z0-9]+([.-][a-z0-9]+)*$
 ^^^^^^^^
This will not match a string equal to www.
See the regex demo
If you want to fail a match with strings that contain -www- or .www., use
^(?!.*\bwww\b)[a-z0-9]+([.-][a-z0-9]+)*$
See another regex demo. This pattern contains a (?!.*\bwww\b) lookahead that fails the whole match if there is a www somewhere inside the string with no digits or letters around it, as enforced by the \b word boundaries.
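To see the difference between the two lookaheads, here is a minimal Python sketch. The sample strings and the constant names are made up for illustration; only the two patterns come from the answer.

```python
import re

# Rejects only the exact string "www".
NOT_EXACTLY_WWW = re.compile(r"^(?!www$)[a-z0-9]+([.-][a-z0-9]+)*$")
# Rejects any string in which "www" appears as a whole word (label).
NO_WWW_LABEL = re.compile(r"^(?!.*\bwww\b)[a-z0-9]+([.-][a-z0-9]+)*$")

# Hypothetical sample inputs, not taken from the question.
samples = ["www", "www2", "foo.www.bar", "my-www", "example.com"]

for s in samples:
    exact_ok = NOT_EXACTLY_WWW.match(s) is not None
    label_ok = NO_WWW_LABEL.match(s) is not None
    print(f"{s!r}: first pattern={exact_ok}, second pattern={label_ok}")
```

Both patterns reject "www" and accept "www2" and "example.com". Only the second one also rejects "foo.www.bar" and "my-www".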

Related

Regex filtering and excluding forbidden words

Trying to fulfill these requirements:
Alphanumeric allowed [a-zA-Z0-9] or \w+
Only numbers NOT allowed
At least 8 characters \S{8,}
Forbidden words: Test, pimba, vraw ^(!?.*Test|pimba|vraw).*$ or \b(?:(?!word)\w)+\b
The problem is I can't mix it all together.
Documentation read: Mozilla - Character Classes, Groups and Ranges,
indicative Regex,
I'm using https://regex101.com/ to try the regex validation.
Tries:
\b(?:(?!word)\w)+\b(\S{8,})
^(?=\S*\w+)(\S{8,})\b$
^(?!.pimba|vraw|\d{8}).$
^(?=\S*\w+)(\S{8,})+(!?.*Test)$
You may use this regex:
^(?!\d+$)(?!.*(?:Test|pimba|vraw))\w{8,}$
RegEx Demo
RegEx Details:
^: Start
(?!\d+$): Negative lookahead to fail the match if we have all digits
(?!.*(?:Test|pimba|vraw)): Negative lookahead to fail the match if any of those substrings appear anywhere in input
\w{8,}: Match 8 or more word characters
$: End
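The breakdown above can be tried out with a short Python sketch. The helper name is_valid and the sample inputs are assumptions for illustration; the pattern is the one from the answer, unchanged.

```python
import re

# Pattern from the answer above. Note that \w also accepts "_" and, in
# Python 3, non-ASCII letters and digits unless re.ASCII is passed.
PASSWORD = re.compile(r"^(?!\d+$)(?!.*(?:Test|pimba|vraw))\w{8,}$")


def is_valid(candidate: str) -> bool:
    """Return True if the whole candidate string satisfies the pattern."""
    return PASSWORD.fullmatch(candidate) is not None


# Hypothetical sample inputs.
for candidate in ["abc12345", "12345678", "abcTest12", "short1", "abc 12345"]:
    print(f"{candidate!r}: {is_valid(candidate)}")
```

Only "abc12345" passes here. The forbidden words are matched case-sensitively, so "test12345" would be accepted unless the pattern is compiled with re.IGNORECASE.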

Regular expression to search for specific Referer in HTTP Header

I need to create a regular expression to match everything except a specific URL for a given Referer. I currently have it to match but can't reverse it and create the negative for it.
What I currently have:
Referer:(http(s)?(:\/\/))?(www\.)?test.com(\/.*)?
In the list below:
Referer:http://www.test.online/
Referer:https://www.test.online/
Referer:https://www.test.tv/
Referer:https://www.blah.com/
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
It will match:
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
However, I would like it to match everything except for those.
This is for our WAF, so unfortunately we are restricted in what we can use: the rule can only be expressed as a search on the HTTP header being passed back.
Try this regex:
^(?!.*Referer:(http(s)?(:\/\/))?(www\.)?test\.com(\/.*)?).*$
A good way to negate your regex is to use negative lookahead.
Explanation:
The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point. Inside the lookahead [is any regex pattern].
Working example: https://regex101.com/r/QJfeBB/1
You could use an anchor ^ to assert the start of the string and use a negative lookahead to assert what is on the right is not what you want to match.
Note that you have to escape the dot to match it literally and you could omit the last part (\/.*)?.
If you don't need the capturing groups later, you could also turn them into non-capturing groups (?:...) instead.
^(?!Referer:(https?(:\/\/))?(www\.)?test\.com).+$
regex101 demo
About the pattern
^ Start of the string
(?! Negative lookahead to assert what is on the right does not match
Referer:(https?(:\/\/))?(www\.)?test\.com Match your pattern
) Close negative lookahead
.+ Match any char except a newline 1+ times
$ Assert end of the string
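As a quick check of the pattern explained above, here is a small Python sketch that runs it over the header list from the question. A WAF will use its own regex engine, so this only illustrates the matching logic; the variable names are made up.

```python
import re

# Pattern from the answer above, copied verbatim.
NOT_TEST_COM = re.compile(r"^(?!Referer:(https?(:\/\/))?(www\.)?test\.com).+$")

# Header lines taken from the question.
headers = [
    "Referer:http://www.test.online/",
    "Referer:https://www.test.online/",
    "Referer:https://www.test.tv/",
    "Referer:https://www.blah.com/",
    "Referer:https://www.test.com/",
    "Referer:http://www.test.com/",
    "Referer:http://test.com/",
    "Referer:https://test.com/",
]

# Keep only the lines the negated pattern matches.
allowed = [h for h in headers if NOT_TEST_COM.match(h)]
for h in allowed:
    print(h)
```

This prints the first four lines and skips the four test.com ones. Since nothing is required after test\.com, a host that merely starts with it, such as test.community, is rejected as well.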

Negative lookahead with capturing groups

I'm attempting this challenge:
https://regex.alf.nu/4
I want to match all strings that don't contain an ABBA pattern.
Match:
aesthophysiology
amphimictical
baruria
calomorphic
Don't Match
anallagmatic
bassarisk
chorioallantois
coccomyces
abba
Firstly, I have a regex to determine the ABBA pattern.
(\w)(\w)\2\1
Next I want to match strings that don't contain that pattern:
^((?!(\w)(\w)\2\1).)*$
However this matches everything.
If I simplify this by specifying a literal for the negative lookahead:
^((?!agm).)*$
Then the regex does not match the string "anallagmatic", which is the desired behaviour.
So it looks like the issue is with me using capturing groups and back-references within the negative lookahead.
^(?!.*(.)(.)\2\1).+$
    ^^
You can use a lookahead here. See the demo. The lookahead you created was correct, but you need to add .* so that the pattern cannot appear anywhere in the string.
https://regex101.com/r/vV1wW6/39
Your approach will also work if you make the first group non capturing.
^(?:(?!(\w)(\w)\2\1).)*$
  ^^
See the demo. It was not working because \2 and \1 referred to different groups than you intended: the outer capturing group is group 1, so in your regex they should have been \3 and \2.
https://regex101.com/r/vV1wW6/40
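Both working variants can be checked against the word lists from the question with a short Python sketch (the constant and variable names are made up for illustration):

```python
import re

# Variant 1: a single lookahead at the start that scans the whole string.
NO_ABBA = re.compile(r"^(?!.*(.)(.)\2\1).+$")
# Variant 2: the lookahead is tested before every character; the outer
# group is non-capturing, so \1 and \2 refer to the two (\w) groups.
NO_ABBA_STEPWISE = re.compile(r"^(?:(?!(\w)(\w)\2\1).)*$")

# Word lists taken from the question.
should_match = ["aesthophysiology", "amphimictical", "baruria", "calomorphic"]
should_not_match = ["anallagmatic", "bassarisk", "chorioallantois",
                    "coccomyces", "abba"]

for word in should_match + should_not_match:
    v1 = NO_ABBA.match(word) is not None
    v2 = NO_ABBA_STEPWISE.match(word) is not None
    print(f"{word}: variant 1={v1}, variant 2={v2}")
```

The two variants agree on every word in these lists. They differ on the empty string, which only the second variant matches.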

RegEx lookahead but not immediately following

I am trying to match terms such as the Dutch ge-berg-te. berg is a noun by itself, and ge...te is a circumfix, i.e. geberg does not exist, nor does bergte. gebergte does. What I want is a RegEx that matches berg or gebergte, working with a lookaround. I was thinking this would work
\b(?i)(ge(?=te))?berg(te)?\b
But it doesn't. I am guessing that is because a lookahead only checks the immediately following characters, and does not look across other characters. Is there any way to match characters with a lookahead without the constraint that those characters have to come immediately after the others?
Valid matches would be:
Berg
berg
Gebergte
gebergte
Invalid matches could be:
Geberg
geberg
Bergte
bergte
ge-/Ge- and -te always have to occur together. Note that I want to try this with a lookahead. I know it can be done more simply, but I want to see if it's methodologically possible to do something like this.
Here is one non-lookaround based regex:
\b(berg|gebergte)\b
Use it with i (ignore case) flag. This regex uses alternation and word boundary to search for complete words berg OR gebergte.
RegEx Demo
Lookaround based regex:
(?<=\bge)berg(?=te\b)|\bberg\b
This regex uses a lookbehind and a lookahead to search for berg preceded by ge and followed by te. Alternatively, it matches the complete word berg using the word boundary assertion \b, which is zero-width just like the anchors ^ and $.
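Here is a minimal Python sketch comparing the two regexes on the words from the question. The names are made up; re.IGNORECASE plays the role of the i flag. Python only allows fixed-width lookbehinds, which (?<=\bge) is.

```python
import re

# Alternation-based regex: matches the whole word.
ALTERNATION = re.compile(r"\b(berg|gebergte)\b", re.IGNORECASE)
# Lookaround-based regex: matches only the "berg" part.
LOOKAROUND = re.compile(r"(?<=\bge)berg(?=te\b)|\bberg\b", re.IGNORECASE)

# Word lists taken from the question.
valid = ["Berg", "berg", "Gebergte", "gebergte"]
invalid = ["Geberg", "geberg", "Bergte", "bergte"]

for word in valid + invalid:
    alt = ALTERNATION.search(word)
    look = LOOKAROUND.search(word)
    print(word,
          alt.group() if alt else None,
          look.group() if look else None)
```

Both regexes accept the valid words and reject the invalid ones. For Gebergte the alternation version returns the whole word, while the lookaround version returns only berg, because lookarounds do not consume characters.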
To forbid a string in general, you can put a negative lookahead at the beginning of the pattern and combine it with any number of other characters before the string you want to forbid:
regex: don't match if containing a specific string
^(?!.*720).*
This will not match if the string contains 720, but will match everything else.
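A minimal Python sketch of this idea, using the pattern in the form ^(?!.*720).* (with .* meaning "any characters"). The file names are hypothetical sample inputs.

```python
import re

# Fails as soon as "720" can be found anywhere after the start.
NO_720 = re.compile(r"^(?!.*720).*")

# Hypothetical sample inputs.
for name in ["video_720p.mp4", "720", "video_1080p.mp4", "clip_72_0.mp4"]:
    print(f"{name!r}: {NO_720.match(name) is not None}")
```

The first two strings contain 720 and are rejected; the last two are accepted.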

Regular expression for prefix exclusion

I am trying to extract gmail.com from a passage, but I want to match only those strings that don't start with #.
Example: abc#gmail.com (don't match this); www.gmail.com (match this)
I tried the following: (?!#)gmail\.com but this did not work. This is matching both the cases highlighted in the example above. Any suggestions?
You want a negative lookbehind, if your regex flavor supports it, like (?<!#)gmail\.com. Add \b word boundaries to avoid matching foogmail.comz: (?<!#)\bgmail\.com\b
[^#\s]*(?<!#)\bgmail\.com\b
assuming you want to find strings in a longer text body, not validate entire strings.
Explanation:
[^#\s]* # match any number of non-#, non-space characters
(?<!#) # assert that the previous character isn't an #
\b # match a word boundary (so we don't match hogmail.com)
gmail\.com # match gmail.com
\b # match a word boundary
At first glance, the (?<!#) lookbehind assertion appears unnecessary, but it isn't: without it, the gmail.com part of abc#gmail.com would match.
Use this regular expression using negative lookbehind:
/^.*?(?<!#)gmail\.com$/
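To compare the lookahead attempt from the question with the lookbehind answers, here is a small Python sketch. The sample text and the constant names are assumptions for illustration; Python's re module supports fixed-width lookbehind such as (?<!#).

```python
import re

# Hypothetical sample text combining the cases discussed above.
text = "abc#gmail.com www.gmail.com hogmail.com"

# Attempt from the question: (?!#) looks at the NEXT character, which is
# "g", so it never blocks anything.
LOOKAHEAD_ATTEMPT = re.compile(r"(?!#)gmail\.com")
# Lookbehind plus word boundaries.
LOOKBEHIND = re.compile(r"(?<!#)\bgmail\.com\b")
# Same, but also capturing the non-#, non-space prefix.
WITH_PREFIX = re.compile(r"[^#\s]*(?<!#)\bgmail\.com\b")

print(LOOKAHEAD_ATTEMPT.findall(text))
print(LOOKBEHIND.findall(text))
print(WITH_PREFIX.findall(text))
```

The lookahead attempt finds gmail.com three times, including inside abc#gmail.com and hogmail.com. The lookbehind version finds it once, and the prefixed version returns the whole token www.gmail.com.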