How to match all strings other than a particular one - regex

To match all characters except vowels, we can use [^aeiou].
I wonder
how to match all strings other than a particular one? For example, I want to match a string which is not dog. So cat, sky, and mike will all be matches.
how to match all strings other than a few strings, or other than a regular expression?
For example, I want to match a string which is not c.t. So sky and mike will all be matches, but cat and cut will not be matches.
Thanks.

1. How to match all strings other than a particular one
^(?!your_string$).*$
2. How to match all strings other than a few strings
^(?!(?:string1|string2|string3)$).*$
How does that work?
The idea is to use a negative lookahead, (?!...), to check that the string does not consist solely of the string(s) to avoid. If the negative lookahead (which is an assertion) succeeds, the .*$ matches everything to the end of the string.
Note the use of the ^ anchor at the beginning to ensure we are positioned at the beginning of the string.
Note the $ anchor inside the negative lookahead to ensure that we are excluding your_string if it is indeed the whole string, but that we do not exclude your_string followed by more characters.
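As a rough illustration, here is a minimal Python sketch of both patterns using the standard re module. The helper name is_allowed and the sample words are assumptions made for this example; dog and cat simply stand in for your_string and string1|string2.

```python
import re

# Pattern 1: anything except the exact string "dog".
NOT_DOG = re.compile(r"^(?!dog$).*$")
# Pattern 2: anything except the exact strings "dog" or "cat".
NOT_DOG_OR_CAT = re.compile(r"^(?!(?:dog|cat)$).*$")


def is_allowed(pattern, text):
    """Return True when the string passes the negative lookahead."""
    return pattern.match(text) is not None


for word in ["dog", "cat", "sky", "dogs", "hotdog"]:
    # "dogs" and "hotdog" are allowed: the $ inside the lookahead
    # only rejects the string when it is exactly "dog".
    print(word, is_allowed(NOT_DOG, word), is_allowed(NOT_DOG_OR_CAT, word))
```

The same shape should also cover the "other than a regular expression" case from the question: ^(?!c.t$).*$ rejects cat and cut but accepts sky.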
Reference
Mastering Lookahead and Lookbehind
Negative Lookaheads

Related

RegEx negative lookahead on pattern

I want to find all expressions that don't end with ":"
I tried to do it like that:
[a-z]{2,}(?!:)
On this text:
foobar foobaz:
foobaz
foobaz:
The problem is that it just drops the last character before the ":" instead of rejecting the whole match.
Here is the example: https://regex101.com/r/jtLRvz/1
How can I get the negative lookahead work for the whole regular expression?
When [a-z]{2,}(?!:) is tried against baz:, [a-z]{2,} first grabs as many lowercase ASCII letters as it can (baz), and the negative lookahead (?!:) checks the char immediately to the right. It is :, so the lookahead fails and the engine looks for another way to match. Since {2,} is also satisfied by two chars instead of the three currently matched, it backtracks by one char and finds a valid match, ba, which is followed by z rather than :.
Add a-z to the lookahead pattern to make sure the char right after 2 or more lowercase ASCII letters is not a letter and not a colon:
[a-z]{2,}(?![a-z:])
^^^
See the regex demo
If your regex engine supports possessive modifiers, or atomic groups, you may use them to prevent backtracking into the [a-z]{2,} subpattern:
[a-z]{2,}+(?!:)
(?>[a-z]{2,})(?!:)
See another regex demo.
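To see the backtracking in action, here is a small Python sketch run against the sample text from the question. The variable names are made up for the illustration, and only the two lookahead patterns are shown, since they work with the standard re module.

```python
import re

text = "foobar foobaz:\nfoobaz\nfoobaz:"

# Original attempt: the engine gives back one letter so that the
# char after the match is "z" instead of ":".
backtracking = re.findall(r"[a-z]{2,}(?!:)", text)

# Fix: also forbid a letter after the match, so giving back
# letters never produces a valid match.
fixed = re.findall(r"[a-z]{2,}(?![a-z:])", text)

print(backtracking)  # ['foobar', 'fooba', 'foobaz', 'fooba']
print(fixed)         # ['foobar', 'foobaz']
```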

Regex in middle of text doesn't match

I have a regex to find url's in text:
^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$
However it fails when it is surrounded by text:
https://regex101.com/r/0vZy6h/1
I can't seem to grasp why it's not working.
Possible reasons why the pattern does not work:
^ and $ make it match the entire string
(?!:\/\/) is a negative lookahead that fails the match if, immediately to the right of the current location, there is :// substring. But [a-zA-Z0-9-_]+ means there can't be any ://, so, you most probably wanted to fail the match if :// is present to the left of the current location, i.e. you want a negative lookbehind, (?<!:\/\/).
[a-zA-Z]{2,11}? - once $ is removed, this matches only 2 chars, since {2,11}? is a lazy quantifier, and a lazy pattern at the very end of a regex always matches the minimum number of chars, here, 2.
Use
(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}
See the regex demo. Add \b word boundaries if you need to match the substrings as whole words.
Note in Python regex there is no need to escape /, you may replace (?<!:\/\/) with (?<!://).
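Two of the points above can be checked in isolation with a short Python sketch. The toy patterns and sample strings below are assumptions chosen to keep the demo small; they are not the full URL regex.

```python
import re

# A lookahead inspects what FOLLOWS the current position, so it
# never sees the "://" that sits before "foo".
ahead = re.search(r"(?!://)foo", "://foo")

# A lookbehind inspects what PRECEDES the current position.
behind = re.search(r"(?<!://)foo", "://foo")

# A lazy quantifier at the very end of a pattern stops at its minimum.
lazy = re.search(r"[a-z]+\.[a-z]{2,11}?", "see example.com here")
greedy = re.search(r"[a-z]+\.[a-z]{2,11}", "see example.com here")

print(ahead.group(0))   # foo
print(behind)           # None
print(lazy.group(0))    # example.co
print(greedy.group(0))  # example.com
```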
The spaces are not being matched. Try adding space to the character sets checking for leading or trailing text.

How to only match a single instance of a character?

Not quite sure how to go about this, but basically what I want to do is match a character, say a for example. In this case all of the following would not contain matches (i.e. I don't want to match them):
aa
aaa
fooaaxyz
Whereas the following would:
a (obviously)
fooaxyz (this would only match the letter a part)
My knowledge of RegEx is not great, so I am not even sure if this is possible. Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
^[^\sa]*\Ka(?=[^\sa]*$)
\K discards the previously matched characters, and the lookahead asserts whether a match is possible or not. So the above matches only the letter a that satisfies the conditions.
OR
a{2,}(*SKIP)(*F)|a
You may use a combination of a lookbehind and a lookahead:
(?<!a)a(?!a)
See the regex demo and the regex graph:
Details
(?<!a) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is an a char
a - an a char
(?!a) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an a char.
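A quick Python sketch of the lookbehind/lookahead combination, checked against the strings from the question. The helper name lone_a_positions is hypothetical and only exists for this example.

```python
import re

# An "a" with no "a" directly before it and no "a" directly after it.
LONE_A = re.compile(r"(?<!a)a(?!a)")


def lone_a_positions(text):
    """Return the index of every standalone "a" in text."""
    return [m.start() for m in LONE_A.finditer(text)]


for sample in ["aa", "aaa", "fooaaxyz", "a", "fooaxyz"]:
    print(sample, lone_a_positions(sample))
```

Note that this finds every isolated a, so a string such as axa yields two matches.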
You need two things:
a negated character class: [^a] (all except "a")
anchors (^ and $) to ensure that the limits of the string are reached (in other words, that the pattern matches the whole string and not only a substring):
Result:
^[^a]*a[^a]*$
Once you know there is only one "a", you can extract, replace, or remove it in whatever way suits the language you use.
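For comparison, a minimal Python sketch of the anchored, negated-class approach. The helper name has_exactly_one_a is an assumption for this example. Unlike a lookaround that tests each a on its own, this pattern validates the whole string, so it only accepts strings containing exactly one a.

```python
import re

# Whole string: non-"a" chars, one "a", then non-"a" chars again.
EXACTLY_ONE_A = re.compile(r"^[^a]*a[^a]*$")


def has_exactly_one_a(text):
    """Return True when text contains exactly one "a"."""
    return EXACTLY_ONE_A.match(text) is not None


for sample in ["a", "fooaxyz", "aa", "fooaaxyz", "axa", "foo"]:
    print(sample, has_exactly_one_a(sample))
```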

RegEx lookahead but not immediately following

I am trying to match terms such as the Dutch ge-berg-te. berg is a noun by itself, and ge...te is a circumfix, i.e. geberg does not exist, nor does bergte. gebergte does. What I want is a RegEx that matches berg or gebergte, working with a lookaround. I was thinking this would work
\b(?i)(ge(?=te))?berg(te)?\b
But it doesn't. I am guessing that is because a lookahead only checks the immediately following characters, and not across characters. Is there any way to match characters with a lookahead without the constraint that those characters have to be immediately behind the others?
Valid matches would be:
Berg
berg
Gebergte
gebergte
Invalid matches could be:
Geberg
geberg
Bergte
bergte
ge-/Ge- and -te always have to occur together. Note that I want to try this with a lookahead. I know it can be done more simply, but I want to see if it's methodologically possible to do something like this.
Here is one non-lookaround based regex:
\b(berg|gebergte)\b
Use it with i (ignore case) flag. This regex uses alternation and word boundary to search for complete words berg OR gebergte.
Lookaround based regex:
(?<=\bge)berg(?=te\b)|\bberg\b
This regex uses a lookbehind and a lookahead to search for berg preceded by ge and followed by te. Alternatively, it matches the complete word berg using the word boundary \b, which is also a zero-width assertion, like the anchors ^ and $.
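Here is a small Python sketch that runs the lookaround-based regex over the valid and invalid words from the question. The helper name find_berg is hypothetical, and re.IGNORECASE stands in for the i flag.

```python
import re

# berg inside ge...te, or berg as a complete word on its own.
CIRCUMFIX = re.compile(r"(?<=\bge)berg(?=te\b)|\bberg\b", re.IGNORECASE)


def find_berg(word):
    """Return the matched text, or None when the word is rejected."""
    m = CIRCUMFIX.search(word)
    return m.group(0) if m else None


for word in ["Berg", "berg", "Gebergte", "gebergte",
             "Geberg", "geberg", "Bergte", "bergte"]:
    print(word, find_berg(word))
```

Because ge and te sit inside lookarounds, the match itself is only the berg part, even for gebergte.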
To forbid a substring anywhere in the input, you can put a negative lookahead at the beginning of the pattern and have it allow any number of other characters before the string you want to forbid:
regex: don't match if containing a specific string
^(?!.*720).*
This does not match if the string contains 720; any other string is matched.
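A minimal Python check of this idea, using the form of the pattern with an unescaped .* inside the lookahead so that 720 is rejected wherever it appears. The sample strings and the helper name lacks_720 are made up for the illustration.

```python
import re

# Reject any string that contains "720" anywhere; match everything else.
NO_720 = re.compile(r"^(?!.*720).*")


def lacks_720(text):
    """Return True when text does not contain "720"."""
    return NO_720.match(text) is not None


for sample in ["clip_1080p", "clip_720p", "720", "no digits here"]:
    print(sample, lacks_720(sample))
```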

How to match, but not capture, part of a regex?

I have a list of strings. Some of them are of the form 123-...456. The variable portion "..." may be:
the string "apple" followed by a hyphen, e.g. 123-apple-456
the string "banana" followed by a hyphen, e.g. 123-banana-456
a blank string, e.g. 123-456 (note there's only one hyphen)
Any word other than "apple" or "banana" is invalid.
For these three cases, I would like to match "apple", "banana", and "", respectively. Note that I never want to capture the hyphen, but I always want to match it. If the string is not of the form 123-...456 as described above, then there is no match at all.
How do I write a regular expression to do this? Assume I have a flavor that allows lookahead, lookbehind, lookaround, and non-capturing groups.
The key observation here is that when you have either "apple" or "banana", you must also have the trailing hyphen, but you don't want to match it. And when you're matching the blank string, you must not have the trailing hyphen. A regex that encapsulates this assertion will be the right one, I think.
The only way not to capture something is using look-around assertions:
(?<=123-)((apple|banana)(?=-456)|(?=456))
Even with non-capturing groups (?:…), the overall match still includes their matched contents. This regular expression, however, matches only apple or banana if it’s preceded by 123- and followed by -456, or it matches the empty string if it’s preceded by 123- and followed by 456.
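A short Python sketch of this lookaround-only approach, where the overall match (group 0) is the wanted text. The helper name middle and the invalid sample strings are assumptions for this example.

```python
import re

# Only the fruit name (or the empty string) is consumed; the
# surrounding digits and hyphens are checked by lookarounds.
LOOKAROUND = re.compile(r"(?<=123-)((apple|banana)(?=-456)|(?=456))")


def middle(text):
    """Return the matched middle part, or None when there is no match."""
    m = LOOKAROUND.search(text)
    return m.group(0) if m else None


for sample in ["123-apple-456", "123-banana-456", "123-456",
               "123-orange-456", "123-apple456"]:
    print(sample, repr(middle(sample)))
```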
| Lookaround | Name | What it Does |
|---|---|---|
| (?=foo) | Lookahead | Asserts that what immediately FOLLOWS the current position in the string is foo |
| (?<=foo) | Lookbehind | Asserts that what immediately PRECEDES the current position in the string is foo |
| (?!foo) | Negative Lookahead | Asserts that what immediately FOLLOWS the current position in the string is NOT foo |
| (?<!foo) | Negative Lookbehind | Asserts that what immediately PRECEDES the current position in the string is NOT foo |
In JavaScript try: /123-(apple(?=-)|banana(?=-)|(?!-))-?456/
Remember that the result is in group 1
Based on the input provided by Germán Rodríguez Herrera
Try:
123-(?:(apple|banana|)-|)456
That will match apple, banana, or a blank string, followed by 0 or 1 hyphens. I was wrong about not having a need for a capturing group. Silly me.
I have modified one of the answers (by #op1ekun):
123-(apple(?=-)|banana(?=-)|(?!-))-?456
The reason is that the answer from #op1ekun also matches "123-apple456", without the hyphen after apple.
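The capturing-group variant can be tried in Python as well. This is only a sketch: the helper name captured is hypothetical, and using fullmatch to require the whole string to fit the 123-...456 form is an assumption on my part, since the pattern itself is unanchored.

```python
import re

# Group 1 holds "apple", "banana", or "" and never the hyphen.
CAPTURING = re.compile(r"123-(apple(?=-)|banana(?=-)|(?!-))-?456")


def captured(text):
    """Return group 1, or None when the string has the wrong form."""
    m = CAPTURING.fullmatch(text)
    return m.group(1) if m else None


for sample in ["123-apple-456", "123-banana-456", "123-456",
               "123-apple456", "123-orange-456", "123--456"]:
    print(sample, repr(captured(sample)))
```

The (?=-) after each word is what rejects 123-apple456, and the (?!-) in the empty alternative is what rejects 123--456.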
Try this:
/\d{3}-(?:(apple|banana)-)?\d{3}/
A variation of the expression by #Gumbo that makes use of \K for resetting match positions to prevent the inclusion of number blocks in the match. Usable in PCRE regex flavours.
123-\K(?:(?:apple|banana)(?=-456)|456\K)
Matches:
Match 1 apple
Match 2 banana
Match 3
By far the simplest (works for Python) is '123-(apple|banana)-?456'. Note that the group is mandatory as written, so this does not match the blank case 123-456.