Regular expression for prefix exclusion - regex

I am trying to extract gmail.com from a passage, matching only occurrences that are not preceded by #.
Example: abc#gmail.com (don't match this); www.gmail.com (match this)
I tried the following: (?!#)gmail\.com but it did not work. It matches both cases from the example above. Any suggestions?

Use a negative lookbehind if your regex flavor supports it, like (?<!#)gmail\.com, and add \b word boundaries to avoid matching inside foogmail.comz, like: (?<!#)\bgmail\.com\b
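A quick check of the word-boundary version, assuming Python's re module (which supports fixed-width lookbehind); the candidate strings are made up for illustration:

```python
import re

# (?<!#): the character just before "gmail" must not be "#".
# \b on both sides: don't match inside longer words such as "foogmail.comz".
pattern = re.compile(r"(?<!#)\bgmail\.com\b")

candidates = ["abc#gmail.com", "www.gmail.com", "foogmail.comz"]
results = {c: bool(pattern.search(c)) for c in candidates}
print(results)
# {'abc#gmail.com': False, 'www.gmail.com': True, 'foogmail.comz': False}
```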

[^#\s]*(?<!#)\bgmail\.com\b
assuming you want to find strings in a longer text body, not validate entire strings.
Explanation:
[^#\s]* # match any number of non-#, non-space characters
(?<!#) # assert that the previous character isn't an #
\b # match a word boundary (so we don't match hogmail.com)
gmail\.com # match gmail.com
\b # match a word boundary
At first glance, the (?<!#) lookbehind assertion appears unnecessary, but it isn't; without it, the gmail.com part of abc#gmail.com would match.
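The commented explanation above can be written almost verbatim as a verbose-mode pattern. This sketch assumes Python's re.VERBOSE; note that in verbose mode an unescaped # outside a character class starts a comment, so the # inside the lookbehind has to be written as \# (the # inside [^#\s] is fine). The sample text is made up:

```python
import re

pattern = re.compile(r"""
    [^#\s]*      # any number of non-#, non-space characters
    (?<!\#)      # the previous character isn't a # (escaped: bare # starts a comment here)
    \b           # word boundary (so we don't match hogmail.com)
    gmail\.com   # literal gmail.com
    \b           # word boundary
""", re.VERBOSE)

text = "abc#gmail.com www.gmail.com hogmail.com"
found = pattern.findall(text)
print(found)  # ['www.gmail.com']
```

Because of the leading [^#\s]*, each match is the whole host string, not just gmail.com.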

Use this regular expression, which combines a negative lookbehind with ^ and $ anchors so that it tests the whole string:
/^.*?(?<!#)gmail\.com$/
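A minimal Python check of the anchored version (Python doesn't use the /.../ delimiters, so they are dropped; the sample strings are illustrative). Since this pattern has no \b, it also accepts foogmail.com:

```python
import re

# ^ and $ anchor the pattern to the whole string; the lazy .*? lets the
# lookbehind test the character immediately before "gmail".
pattern = re.compile(r"^.*?(?<!#)gmail\.com$")

samples = ["abc#gmail.com", "www.gmail.com", "foogmail.com"]
results = [bool(pattern.match(s)) for s in samples]
print(results)  # [False, True, True]
```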

Related

Regular expression syntax to match first segment only

I have a number of URLs where I need to match the first segment (without the "/") with a regex.
This segment can be either xx or xx-xx.
I've tried to do it with a lookahead and a lookbehind, but some URLs contain another two-letter segment (/ts/, /ca/), and I don't want those to match.
I only want the first segment in my regex. Any suggestions? Thanks.
https://regex101.com/r/Qy3nyI/1
(?<=\/)\w{2}(-\w{2})?(?=\/)
Test URLs:
/en/home.aspx
/en-gb/ts/tc/home.aspx
/en-gb/home.aspx
/en-de/home.aspx
/de-de/home.aspx
/en/home.aspx
/en-fb/afspfas.aspx
/en-gb/ts/ca/anotherPage.aspx
Try adding a starting ^ anchor to the initial lookbehind in your current regex pattern:
(?<=^/)\w{2}(-\w{2})?(?=/)
The change is the ^ anchor added inside the lookbehind.
Updated demo:
Demo
This pattern says to:
(?<=^/) lookbehind and assert that what precedes is a leading /
\w{2}(-\w{2})? then match the country abbreviation text
(?=/) lookahead and assert that what follows is another /
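A small Python comparison of the original and anchored patterns on one of the test URLs (assuming Python's re, which accepts ^ inside a fixed-width lookbehind; the segments helper is hypothetical, written just for this demo):

```python
import re

# Original: any two-letter segment between slashes qualifies.
original = re.compile(r"(?<=/)\w{2}(-\w{2})?(?=/)")
# Anchored: the lookbehind only succeeds right after the leading "/".
anchored = re.compile(r"(?<=^/)\w{2}(-\w{2})?(?=/)")

def segments(pattern, url):
    # Hypothetical helper. finditer + group(0) returns the whole match;
    # findall would return only the contents of the (-\w{2}) group.
    return [m.group(0) for m in pattern.finditer(url)]

url = "/en-gb/ts/ca/anotherPage.aspx"
print(segments(original, url))  # ['en-gb', 'ts', 'ca']
print(segments(anchored, url))  # ['en-gb']
```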

I need to exclude word from regular expression

I have this regexp:
^[a-z0-9]+([.\-][a-z0-9]+)*$
I need to exclude only one word, "www", from the match.
I tried a negative lookahead, but without success.
Use a negative lookahead like this:
^(?!www$)[a-z0-9]+([.-][a-z0-9]+)*$
The added part is the (?!www$) lookahead.
This will not match a string equal to www.
See the regex demo
If you want to fail a match with strings that contain -www- or .www., use
^(?!.*\bwww\b)[a-z0-9]+([.-][a-z0-9]+)*$
See another regex demo. This pattern contains a (?!.*\bwww\b) lookahead that fails the whole match if there is a www somewhere inside the string with no letters or digits around it, thanks to the \b word boundaries.
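Both variants side by side in Python (a sketch assuming Python's re; the test strings are made up to show where the two patterns differ):

```python
import re

# Fails only when the whole string is exactly "www".
exact = re.compile(r"^(?!www$)[a-z0-9]+([.-][a-z0-9]+)*$")
# Fails when "www" appears anywhere as a whole word (bounded by \b).
anywhere = re.compile(r"^(?!.*\bwww\b)[a-z0-9]+([.-][a-z0-9]+)*$")

tests = ["www", "www.example", "a-www-b", "wwwx", "example"]
for s in tests:
    print(s, bool(exact.match(s)), bool(anywhere.match(s)))
```

Note that wwwx passes both patterns: there is no word boundary between the last w and the x.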

RegEx lookahead but not immediately following

I am trying to match terms such as the Dutch ge-berg-te. berg is a noun by itself, and ge...te is a circumfix, i.e. geberg does not exist, nor does bergte. gebergte does. What I want is a RegEx that matches berg or gebergte, working with a lookaround. I was thinking this would work
\b(?i)(ge(?=te))?berg(te)?\b
But it doesn't. I am guessing this is because a lookahead only checks the immediately following characters, not characters further along. Is there any way to match characters with a lookahead without the constraint that those characters have to come immediately after the others?
Valid matches would be:
Berg
berg
Gebergte
gebergte
Invalid matches could be:
Geberg
geberg
Bergte
bergte
ge-/Ge- and -te always have to occur together. Note that I want to try this with a lookahead. I know it can be done more simply, but I want to see if it's methodologically possible to do something like this.
Here is one non-lookaround based regex:
\b(berg|gebergte)\b
Use it with the i (ignore case) flag. This regex uses alternation and word boundaries to search for the complete words berg OR gebergte.
RegEx Demo
Lookaround based regex:
(?<=\bge)berg(?=te\b)|\bberg\b
This regex uses a lookbehind and a lookahead to search for berg preceded by ge and followed by te. Alternatively, it matches the complete word berg using the word boundary assertion \b, which is also a zero-width assertion, like the anchors ^ and $.
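Both answers checked against the valid and invalid lists from the question, assuming Python's re with the re.IGNORECASE flag standing in for the i flag:

```python
import re

alternation = re.compile(r"\b(berg|gebergte)\b", re.IGNORECASE)
# Lookbehind for "ge" and lookahead for "te", or the bare word "berg".
lookaround = re.compile(r"(?<=\bge)berg(?=te\b)|\bberg\b", re.IGNORECASE)

valid = ["Berg", "berg", "Gebergte", "gebergte"]
invalid = ["Geberg", "geberg", "Bergte", "bergte"]

for word in valid + invalid:
    print(word, bool(alternation.search(word)), bool(lookaround.search(word)))

# The lookarounds are zero-width, so only "berg" itself is consumed.
print(lookaround.search("gebergte").group(0))   # berg
print(alternation.search("gebergte").group(0))  # gebergte
```

One practical difference: the lookaround version matches only the berg inside gebergte, while the alternation returns the whole word.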
To forbid a string anywhere in the input, you can put a negative lookahead at the beginning and combine it with any number of other characters before the string you want to forbid:
regex: don't match if containing a specific string
^(?!.*720).*
This will not match if the string contains 720, but will match anything else.
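A minimal Python check (assuming Python's re; the sample strings are illustrative). Since . does not match a newline by default, this checks a single line:

```python
import re

# Negative lookahead at the start: fail if "720" occurs anywhere after position 0.
no_720 = re.compile(r"^(?!.*720).*")

samples = ["abc", "x720y", "72 0"]
results = [bool(no_720.match(s)) for s in samples]
print(results)  # [True, False, True]
```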

What regex matches string within wordbounds, but not next to '#'

I want to count the occurrences of a certain string in a text. The string can appear at the beginning of a sentence or at the end, just before the '.'. So I thought of:
\bMY_STRING\b
However, I do not want to match parts of an email address. That is, the string should not be next to the # (at-sign, at-symbol, ampersat, apetail, arroba, atmark, at symbol, commercial at, monkey tail or whatever term makes it easier to find this using a search engine).
So, 'example' should not be counted in 'test#example.com'.
What should replace the \b in my expression to match word boundaries, except at #?
If your regex flavor supports lookbehind assertions (most do; JavaScript before ES2018 and Ruby 1.8 only support lookahead), you can replace all \bs with this:
(?<!#)\b(?!#)
This matches a word boundary only if it's not before or after a #.
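A sketch of the counting use case in Python, substituting the guarded boundary for both \b anchors. count_word is a hypothetical helper and the text is made up; a plain \b version is included for comparison:

```python
import re

# A word boundary that is not directly before or after a "#".
NOT_AT_HASH = r"(?<!#)\b(?!#)"

def count_word(word, text):
    # Hypothetical helper: guarded boundary on both sides; re.escape keeps
    # any regex metacharacters in `word` literal.
    pattern = re.compile(NOT_AT_HASH + re.escape(word) + NOT_AT_HASH)
    return len(pattern.findall(text))

text = "example at the start. Mail test#example.com or example#host. Ends with example."
plain = len(re.findall(r"\bexample\b", text))
guarded = count_word("example", text)
print(plain, guarded)  # 4 2
```

The two occurrences touching # (test#example and example#host) are skipped; the ones at the start of the text and before the final '.' still count.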
I think you can use lookbehind and lookahead assertions:
\b(?<![#])YOUR_TEXT(?![#])\b
Example
You can use:
[^\#]
which means: match any character except #

Regular expression not matching specific string

My use case is as follows: I would like to find all occurrences of something similar to /name.action, but where the last part is not .action, e.g.:
name.actoin - should match
name.action - should not match
nameaction - should not match
I have this:
/\w+.\w*
to match two words separated by a dot, but I don't know how to add 'and do not match .action'.
First, you need to escape your . character, because an unescaped . matches any character in a regex.
Second, you need to add a "match only if this suffix is not present" group, written with the (?!...) syntax (a negative lookahead).
You may also want to put a circumflex ^ at the start to anchor the match to the beginning of a line, and change your * (zero or more repetitions) to a + (one or more repetitions).
^/\w+\.(?!action)\w+ is the finished Regex.
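A quick Python check of the finished regex against the examples from the question (with the leading / from /name.action; assuming Python's re). The strict variant with \b inside the lookahead is an extra comparison, not part of the answer: plain (?!action) also rejects suffixes that merely start with action, such as actions.

```python
import re

# "\." is a literal dot; (?!action) rejects a suffix that starts with "action".
pattern = re.compile(r"^/\w+\.(?!action)\w+")
# Comparison: (?!action\b) rejects only the exact suffix "action".
strict = re.compile(r"^/\w+\.(?!action\b)\w+")

samples = ["/name.actoin", "/name.action", "/nameaction"]
results = [bool(pattern.match(s)) for s in samples]
print(results)  # [True, False, False]

print(bool(pattern.match("/name.actions")), bool(strict.match("/name.actions")))  # False True
```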
^\w+\.(?!action)\w*
You need to escape the dot character.
\w+\.(?!action).*
Note the trailing .*; I'm not sure what you want to match after the action text.
See also Regular expression to match string not containing a word?
You'll need to use a zero-width negative lookahead assertion. This will let you look ahead in the string, and match based on the negation of a word.
So the regex you'd need (including the escaped . character) would look something like:
/name\.(?!action)/
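Because the lookahead is zero-width, it only checks what follows name. without consuming it. A small Python illustration (assuming Python's re, without the /.../ delimiters; the text is made up):

```python
import re

pattern = re.compile(r"name\.(?!action)")

text = "name.action name.actoin name.view"
starts = [m.start() for m in pattern.finditer(text)]
found = pattern.findall(text)
print(starts)  # [12, 24]
print(found)   # ['name.', 'name.']
```

Each match is just name., so append something like \w+ if you need the suffix as part of the match.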