Regex negative lookahead not matching last space

I have text:
test: [ABCD]
test: foobar
test:  [ABCD]
And I've written this regex:
test:\s+(?!\[ABCD)
So basically I want to find any occurrence where test: is NOT followed by [ABCD, with any amount of whitespace between the two.
For the first two examples it works as intended, but I have a problem with the third one: it looks like, because of the (?!\[ABCD) part, \s+ does not match the last space when there is more than one. Why is that, and how do I solve it? I want the third example to behave just like the first one. Screenshot from regex101 to illustrate the issue:

You get a match on the last line because \s+ can backtrack one step so that the negative lookahead succeeds: after giving back one space, the next character is that space, not [.
No language is listed, but if possessive quantifiers are supported, you can also use
test:\s++(?!\[ABCD)
See a regex demo.
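The backtracking explanation can be checked with a small sketch. Python is used here only for illustration, since the question names no language. Python's re accepts \s++ only from 3.11 on, so this sketch emulates the possessive quantifier with the well-known (?=(\s+))\1 trick, which also runs on older versions. The third sample line is assumed to contain two spaces.

```python
import re

lines = ["test: [ABCD]", "test: foobar", "test:  [ABCD]"]  # third line: two spaces

# Original pattern: \s+ can give back one space, after which the lookahead
# sees " [ABCD" (starting with a space) and therefore succeeds.
backtracking = re.compile(r"test:\s+(?!\[ABCD)")

# Emulated possessive \s++: the lookahead captures all the whitespace into
# group 1 and \1 re-matches exactly that text. Lookarounds are atomic in
# Python's re, so the engine cannot retry the lookahead with fewer spaces.
possessive_like = re.compile(r"test:(?=(\s+))\1(?!\[ABCD)")

for line in lines:
    print(repr(line),
          bool(backtracking.search(line)),
          bool(possessive_like.search(line)))
```

The original pattern reports a match on the third line (the matched text is "test: " with a single space), while the possessive-style pattern rejects it just like the first line.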

You need the lookahead before the match, anchored to the start of the line:
/^(?!test:\s+\[ABCD\]).*/
Demo
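A minimal Python sketch of the anchored approach, assuming the three sample lines are in one string and processed in multiline mode so that ^ matches at every line start:

```python
import re

text = "test: [ABCD]\ntest: foobar\ntest:  [ABCD]"

# ^ with re.MULTILINE anchors at each line start. The negative lookahead
# rejects the whole line if "test:", whitespace and "[ABCD]" follow; any
# backtracking inside the lookahead only helps it find "[ABCD]", so extra
# spaces cannot sneak a line through.
anchored = re.compile(r"^(?!test:\s+\[ABCD\]).*", re.MULTILINE)

matches = anchored.findall(text)
print(matches)  # ['test: foobar']
```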

You have one good answer to match the entire line if it follows the criteria, but based on this:
I want to find any occurrence where "test:" is NOT followed by "[ABCD" and there can be any amount of whitespace between the two.
If you only want to match the "test:" part, you can move the whitespace into the negative lookahead of the pattern you already have:
test:(?!\s+\[ABCD)
Screenshot from regex101 using your example phrases:

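Here is the same idea as a small Python sketch (Python chosen only for illustration; the third line is assumed to contain two spaces). Only "test:" is consumed, so there is no quantifier outside the lookahead left to backtrack:

```python
import re

lines = ["test: [ABCD]", "test: foobar", "test:  [ABCD]"]

# The whitespace lives inside the negative lookahead. \s+ may still
# backtrack in there, but that can only help it find "[ABCD]".
pattern = re.compile(r"test:(?!\s+\[ABCD)")

results = []
for line in lines:
    m = pattern.search(line)
    results.append(m.group() if m else None)

print(results)  # [None, 'test:', None]
```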
Related

RegEx for enforcing and finding the last match

I am trying to extract a part of a name from a string. I almost have it, but something isn't right where I am using a positive lookahead.
Here is my regex: (?=s\s(.*?)$)
I have marked all the results I want with bold text.
Trittbergets Ronja
Minitiger's Samanta Junior
Björntorpets Cita
Sors Kelly's Majsskalle
The problem is that Kelly's Majsskalle gets returned, when it should only select Majsskalle.
Here is a link to regex101 for debugging:
https://regex101.com/r/PZWxr7/1
How do I get the lookahead to disregard the first match?
No need for a lookahead. Just try this:
.*s\s(.*?)$
You need to force the regular expression engine to find the last match by using a leading dot-star:
^.*s\s(.*)$
The .* immediately consumes everything up to the line break; the engine then backtracks until s\s can match, which first happens at the last "s " on the line.
See live demo here
or use a tempered dot:
s(?= ((?:(?!s ).)+)$)
The tempered dot (?:(?!s ).)+ matches a character only if the current position is not the start of an "s " (an s followed by a space).
See live demo here
Note: the former is the better solution.
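Both patterns from this answer can be compared on the four sample names with a short Python sketch (Python used only for illustration; the value of interest is capture group 1 in each case):

```python
import re

names = ["Trittbergets Ronja", "Minitiger's Samanta Junior",
         "Björntorpets Cita", "Sors Kelly's Majsskalle"]

# Greedy .* runs to the end of the line, then backtracks until "s " matches,
# so group 1 starts after the LAST "s " on the line.
dot_star = re.compile(r"^.*s\s(.*)$")

# Tempered dot: each character of group 1 is consumed only if it does not
# start an "s ", so the group cannot cross another "s ".
tempered = re.compile(r"s(?= ((?:(?!s ).)+)$)")

for name in names:
    print(dot_star.search(name).group(1), "|", tempered.search(name).group(1))
```

Both should yield Ronja, Samanta Junior, Cita and Majsskalle.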
A lookahead checks the text after the current position. To start the match after a given piece of text, you need a lookbehind instead: it ensures the text BEFORE the match is that search pattern.
Update your pattern on regex101 to this and you'll see the difference:
(?<=s\s).*?$
Edit - my bad, I didn't spot that last line.
You can also add a negative lookahead to ensure that no other word ending in s (followed by whitespace) appears later in the match:
(?<=s\s)(?!.+?s\s).*?$
This solves the issue with the last line.
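A Python sketch of the difference (illustration only; it assumes each name is a separate single-line string, since \s could otherwise match a newline):

```python
import re

names = ["Trittbergets Ronja", "Minitiger's Samanta Junior",
         "Björntorpets Cita", "Sors Kelly's Majsskalle"]

# Fixed-width lookbehind: the match must be preceded by "s" plus whitespace.
plain = re.compile(r"(?<=s\s).*?$")

# The extra negative lookahead rejects any start position that still has
# another "s" plus whitespace somewhere ahead of it.
last_only = re.compile(r"(?<=s\s)(?!.+?s\s).*?$")

print(plain.search(names[3]).group())             # Kelly's Majsskalle
print([last_only.search(n).group() for n in names])
```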

Negative Lookahead: trying to match one word and negate following words

I have a regex like
^.*\bfrost.*(?!flakes|snowman).*$
I am testing it against the following lines:
frosted flakes
frosty snowman
frost, jack
See this regex101 demo.
I only want the third expression to match, but all three are matching.
You should move the second .* into the lookahead, e.g.
^.*\bfrost(?!.*(?:flakes|snowman)).*$
Or
^.*\bfrost(?!.*flakes|.*snowman).*$
See the regex demo
In the original regex, the lookahead comes after a .*, so whenever it fails, the regex engine can backtrack and still match the string another way, at a position that is not immediately followed by snowman or flakes (the end of the line always qualifies). When you put .* into the lookahead, the check fails if either word appears anywhere to the right of frost, not just immediately to the right of the current position.
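The effect of moving .* into the lookahead can be checked with a short Python sketch (Python used only for illustration):

```python
import re

lines = ["frosted flakes", "frosty snowman", "frost, jack"]

# Lookahead after .*: the engine can always find a spot (e.g. end of line)
# not immediately followed by flakes/snowman, so every line matches.
original = re.compile(r"^.*\bfrost.*(?!flakes|snowman).*$")

# .* inside the lookahead: fails if either word appears anywhere after frost.
fixed = re.compile(r"^.*\bfrost(?!.*(?:flakes|snowman)).*$")

print([bool(original.search(s)) for s in lines])  # [True, True, True]
print([bool(fixed.search(s)) for s in lines])     # [False, False, True]
```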

Full match only if the capturing group encountered once

The pattern:
(test):(thestring)
What I want is a full match only if there is just one test: before thestring:
test:thestring
But in this case there wouldn't be full match:
test:test:thestring
I've tried quantifiers, but it didn't work.
Need help
Try this pattern:
^(?!.*((?(?<=^)|(?<=:))test(?=(:|$))).*(?1)).+$
The main part is ((?(?<=^)|(?<=:))test(?=(:|$))), which matches test if it is preceded by a colon : or is at the beginning of a line, and is followed by a colon : or the end of the line.
(?(?<=^)|(?<=:)) is a workaround for (?<=(:|^)), because lookbehinds must have a fixed length.
Then (?1) re-applies the pattern of the first capturing group (a subroutine call, not a backreference) to check whether there is any other test.
The whole pattern is placed in a negative lookahead (?!...), so everything is matched only if the pattern explained above does not match (that is, unless test occurs more than once).
Demo
For this very specific case:
(?<!.)(test:thestring)
Regex101
All it does is search for the string test:thestring and ensure that there are no characters before it. (Use Michał Turczyn's regex for an all-purpose search!)
^((?!test:).)*(test:thestring)
See in action
If you want a full match and test: should occur only once before thestring, you might assert the start of the string ^, then use (?:(?!test:).)*, a negative lookahead inside a repeated group, to match any character as long as what is on the right side is not test:.
Then match test:thestring, followed by (?:(?!test:thestring).)*, which matches any character as long as what is on the right side is not test:thestring, and assert the end of the string $.
^(?:(?!test:).)*test:thestring(?:(?!test:thestring).)*$
Regex demo
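The three portable patterns above can be checked against both inputs with a Python sketch. The first answer's pattern is left out because it relies on lookaround conditionals and (?1) subroutine calls, which Python's built-in re module does not support.

```python
import re

# Whether each input should produce a match.
cases = {"test:thestring": True, "test:test:thestring": False}

patterns = [
    r"(?<!.)(test:thestring)",          # nothing may precede test:thestring
    r"^((?!test:).)*(test:thestring)",  # no test: before test:thestring
    r"^(?:(?!test:).)*test:thestring(?:(?!test:thestring).)*$",
]

for p in patterns:
    for text, expected in cases.items():
        found = re.search(p, text) is not None
        print(f"{p!r:60} {text!r:24} matched={found} expected={expected}")
```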

Regex Negative Lookbehind Matches Lookbehind text .NET

Say I have the following strings:
PB-GD2185-11652-MTCH
GD2185-11652-MTCH
KD-GD2185-11652-MTCH
KD-GD2185-11652
I want Regex.IsMatch to return true if the string contains MTCH and does not start with PB.
I expected the regex to be the following:
^(?<!PB)\S+(?=MTCH)
but that gives me the following matches:
PB-GD2185-11652-
GD2185-11652-
KD-GD2185-11652-
I do not understand why the negative lookbehind not only doesn't exclude the match but includes the PB characters in the match. The positive lookahead works as expected.
EDIT 1
Let me start with a simpler example. The following regex matches all of the strings as I would expect it to:
\S+
The following regex still matches all of the strings even though I would expect it not to:
\S+(?!MTCH)
The following regex matches all but the final H character on the first three strings:
\S+(?<!MTCH)
From the documentation at regex101, a lookahead looks for text to the right of the pattern and a lookbehind looks for text to the left of the pattern, so having a lookahead at the beginning of a string does not jibe with the documentation.
Edit 2
Take another example with the following three strings:
grey
greyhound
hound
The regex:
^(?<!grey)hound
only matches the final hound, whereas the regex:
^(?<!grey)\S+
matches all three.
You need a lookahead: ^(?!PB)\S+(?=MTCH). Using the lookbehind means PB would have to come before the first character, and right after ^ there is nothing there, so (?<!PB) always succeeds.
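The difference can be sketched in Python, with re.search(...) standing in for .NET's Regex.IsMatch. This assumes these basic constructs behave the same in both engines, which is the case for plain lookarounds and \S+:

```python
import re

ids = ["PB-GD2185-11652-MTCH", "GD2185-11652-MTCH",
       "KD-GD2185-11652-MTCH", "KD-GD2185-11652"]

# At ^ there is nothing to the left, so (?<!PB) can never fail there.
lookbehind = re.compile(r"^(?<!PB)\S+(?=MTCH)")

# (?!PB) looks to the right of ^, i.e. at the first two characters.
lookahead = re.compile(r"^(?!PB)\S+(?=MTCH)")

print([bool(lookbehind.search(s)) for s in ids])  # [True, True, True, False]
print([bool(lookahead.search(s)) for s in ids])   # [False, True, True, False]
print(lookbehind.search(ids[0]).group())          # PB-GD2185-11652-
```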
The problem was caused by the greediness of \S+. When combining lookarounds with greedy quantifiers, you can easily match more characters than you expect. One way to deal with this is to put a negative lookaround inside the group with the greedy quantifier to exclude it as a match, as described in this question:
How to non-greedy multiple lookbehind matches
and on this helpful website about greediness in regular expressions:
http://www.rexegg.com/regex-quantifiers.html
Note that this second link has a few other ways to deal with the greediness in various situations.
A good regular expression for this situation is as follows:
^(?<!PB)((?!PB)\S+)(MTCH)
In situations like this it is much clearer to do it logically within the code: first check that the string contains MTCH, then check that it doesn't match ^PB.
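A minimal sketch of the plain-code approach, written in Python for illustration; in C# the same checks would use string.Contains and string.StartsWith. The function name is hypothetical:

```python
def is_wanted(s: str) -> bool:
    # Hypothetical helper: plain string checks instead of a regex.
    # Must contain MTCH and must not start with PB.
    return "MTCH" in s and not s.startswith("PB")


ids = ["PB-GD2185-11652-MTCH", "GD2185-11652-MTCH",
       "KD-GD2185-11652-MTCH", "KD-GD2185-11652"]
print([is_wanted(s) for s in ids])  # [False, True, True, False]
```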

regex optional lookahead

I want a regular expression to match all of these:
startabcend
startdef
blahstartghiend
blahstartjklendsomething
and to return abc, def, ghi and jkl respectively.
I have the following, which works for cases 1 and 3, but I am having trouble making the lookahead optional.
(?<=start).*(?=end.*)
Edit:
Hmm. Bad example. In reality, the bit in the middle is not numeric, but is preceded by a certain set of characters and optionally followed by another. I have updated the inputs and outputs as requested and added a 4th example in response to someone's question.
If you're able to use lookahead,
(?<=start).*?(?=(?:end|$))
as suggested by stema below is probably the simplest way to get the entire pattern to match what you want.
Alternatively, if you're able to use capturing groups, you should just do that instead:
start(.*?)(?:end)?$
and then just get the value from the first capture group.
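Both suggestions in this answer can be tried on the four sample inputs with a Python sketch (Python used only for illustration):

```python
import re

samples = ["startabcend", "startdef", "blahstartghiend",
           "blahstartjklendsomething"]

# Lazy .*? stops at the first "end" or, failing that, at the end of the string.
lookaround = re.compile(r"(?<=start).*?(?=(?:end|$))")

# Capturing-group variant: the value of interest is group 1.
capturing = re.compile(r"start(.*?)(?:end)?$")

print([lookaround.search(s).group() for s in samples])
print([capturing.search(s).group(1) for s in samples])
```

The lookaround version yields abc, def, ghi and jkl. In the capturing version, the trailing $ makes group 1 run to the end of the string for the 4th example (jklendsomething); start(.*?)(?:end|$) would stop at the first end instead.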
Maybe like this:
(?<=start).*?(?=(?:end|$))
This matches from after "start" up to "end" or, if there is none, up to the end of the line; additionally, the quantifier has to be non-greedy (.*?).
See it here on Regexr
Extended the example on Regexr to not only work with digits.
An optional lookahead doesn't make sense:
If it's optional then it's ok if it matches, but it's also ok if it doesn't match. And since a lookahead does not extend the match it has absolutely no effect.
So the syntax for an optional lookahead is the empty string.
Lookahead alone won't do the job. Try this:
(?<=start)(?:(?!end).)*
The lookbehind positions you after the word "start", then the rest of it consumes everything until (but not including) the next occurrence of "end".
Here's a demo on Ideone.com
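The tempered-dot pattern behaves the same way on all four inputs, as this Python sketch shows (illustration only):

```python
import re

samples = ["startabcend", "startdef", "blahstartghiend",
           "blahstartjklendsomething"]

# After the lookbehind, each character is consumed only if "end" does not
# start at that position, so the match stops just before the first "end".
tempered = re.compile(r"(?<=start)(?:(?!end).)*")

print([tempered.search(s).group() for s in samples])  # ['abc', 'def', 'ghi', 'jkl']
```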
If "end" is always going to be present, then use:
(?<=start)(.*?)(?=end) as you put in the OP. Since you want to "make the lookahead optional", just run up until there's "end" or the end of the line: (?<=start)(.*?)(?=end|$) (in multiline mode, $ matches at every line end). If you don't care about capturing the "end" group, you can skip the lookarounds and use start(.*?)(?:end|$), whose group 1 starts after "start" and stops before "end" if it's there, otherwise at the end of the line. You can also use more of those piped "or" patterns: (?:start|^) and (?:end|$).
Why do you need lookahead?
start(\d+)\w*
See it on rubular