Regex negative lookahead not working as expected

Regex negative lookahead not working as expected - regex

I have the following regex:
[a-zA-Z0-9. ]*(?!cs)
and the string
Hotfix H5.12.1.00.cs02_ADV_LCR
I want to match only untill
Hotfix H5.12.1.00
But the regex matches untill "cs02"
Shouldn't the negative lookahead have done the job?

You may consider using a tempered greedy token:
(?:(?!\.cs)[a-zA-Z0-9. ])*
See the regex demo.
This will work regardless of whether .cs is present in the string or not because the tempered greedy token matches any 0+ characters from the [a-zA-Z0-9. ] character class that is not .cs.

You need to use positive lookahead instead of negative lookahead.
[a-zA-Z0-9. ]*(?=\.cs)
or
[a-zA-Z0-9. ]+(?=\.cs)
Note that your regex [a-zA-Z0-9. ]*(?!cs) is greedy and matches all the characters until it reaches a boundary which isn't followed by cs. See here.
At first pattern [a-zA-Z0-9. ]+ matches Hotfix H5.12.1.00.cs02 greedily because this pattern greedily matches alphabets , dots and spaces. Once it see the underscore char, it stops matching where the two conditions is satisfied,
_ won't get matched by [a-zA-Z0-9. ]+
_ is not cs
It works same for the further two matches also.

Related

Find match within a first match

I have the following string
abc123+InterestingValue+def456
I want to get the InterestingValue only, I am using this regex
\+.*\+
but the output it still includes the + characters
Is there a way to search for a string between the + characters, then search again for anything that is not a + character?

Use lookarounds.
(?<=\+)[^+]*(?=\+)
DEMO

You can use a positive lookahead and a positive lookbehind (more info about these here). Basically, a positive lookbehind tells the engine "this match has to come before the next match", and a positive lookahead tells the engine "this has to come after the previous match". Neither of them actually match the pattern they're looking for though.
A positive lookbehind is a group beginning with ?<= and a positive lookahead is a group beginning with ?=. Adding these to your existing expression would look like this:
(?<=\+).*(?=\+)
regex101

If it should be the first match, you can use a capture group with an anchor:
^[^+]*\+([^+]+)\+
^ Start of string
[^+]* Optionally match any char except + using a negated character class
\+ Match literally
([^+]+) Capture group 1, match 1+ chars other than +
\+ Match literally
Regex demo

Regex: Match pattern unless preceded by pattern containing element from the matching character class

I am having a hard time coming up with a regex to match a specific case:
This can be matched:
any-dashed-strings
this-can-be-matched-even-though-its-big
This cannot be matched:
strings starting with elem- or asdf- or a single -
elem-this-cannot-be-matched
asdf-this-cannot-be-matched
-
So far what I came up with is:
/\b(?!elem-|asdf-)([\w\-]+)\b/
But I keep matching a single - and the whole -this-cannot-be-matched suffix. I cannot figure it out how to not only ignore a character present inside the matching character class conditionally, and not matching anything else if a suffix is found
I am currently working with the Oniguruma engine (Ruby 1.9+/PHP multi-byte string module).
If possible, please elaborate on the solution. Thanks a lot!

If a lookbehind is supported, you can assert a whitespace boundary to the left, and make the alternation for both words without the hyphen optional.
(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b
Explanation
(?<!\S) Assert a whitespace boundary to the left
(?! Negative lookahead, assert the directly to the right is not
(?:elem|asdf)?- Optionally match elem or asdf followed by -
) Close the lookahead
[\w-]+ Match 1+ word chars or -
\b A word boundary
See a regex demo.
Or a version with a capture group and without a lookbehind:
(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b
See another regex demo.

RegEx: don't capture match, but capture after match

There are a thousand regular expression questions on SO, so I apologize if this is already covered. I did look first.
I have string:
Name Subname 11X22 88X620 AB33(20) YA5619 77,66
I need to capture this string: YA5619
What I am doing is just finding AB33(20) and after this I am capturing until first white space. But AB33(20) can be AB-33(20) or AB33(-20) or AB33(-1).
My preg_match regex is: (?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)
Why I am getting error when I change from \d{2} to \d+?
For final result I was thinking this regix will work but no:
(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)
Any ideas what I am doing wrong?

With most regex flavors, lookbehind needs to evaluate to a fixed-length sequence, so you can't use variable quantifiers like * or + or even {1,2}.
Instead of using lookaround, you can simply match your marker pattern and then forget it with \K.
AB-?\d+(?:\(-?\d+\))? \K[^ ]+
demo: https://regex101.com/r/8XXngH/1

It depends on the language. If it is in .NET for example, it matches due to the various length in the lookbehind.
Another solution might be to use a character class and add the character you would allow to match. Then match a whitespace character and capture in a group matching \S+ which matches 1+ times not a whitespace character.
\bAB[()\d-]+\s\K\S+
Explanation
\bAB Match literally prepended with word boundary to prevent AB being part of a larger match.
[()\d-]+ Match 1+ times any of the listed character in the character class
\s Match a whitespace char (or \s+ to match 1 or more)
\K Reset the starting point of the reported match( Forget what was matched)
\S+ Match in a group 1+ times not a whitespace character
Regex demo | Php demo

Regex for spoof

I would like to ask for help regarding my problem when it comes to spoofing let say usernames and I want to catch them using regex.
for example the correct username is :
rolf
and here are the spoofed versions that I could think of:
roooolf
r123olf
123rolf123
rolf5623
123rolf
rollllf
rrrrrrolf
rolffff
So basically I have this regex expression ( that I know is not sufficient because I've tried it on regex101 website )
.+(?![rolf]).+
I'm using this as a baseline because it doesnt catch the correct username which is :
rolf
but it doesn't catch all the other "spoofed" versions of the username.
Any Ideas how can I make my regex more efficient?
Thanks in advance!

You may try this too
(?m)^(?![^\n]*?rolf[^\n]*$).*$
Demo

To match not exactly rolf You can use a negative lookahead (?! to assert that what follows from the beginning of the string is not 'rolf' until the end of the string.
^(?!rolf$).+$
That would match
^ Assert position at the begin of the string
(?! Negative lookahead that asserts that what follows is not
rolf Match literally
) Close negative lookahead
.+ Match any character one or more times
$Assert position at the end of the string
From your example regex you match .+ where #Ωmega has a fair point, matches spaces.
Instead of .+ you could specify what characters you might accept like \w+ for example to match one or more word characters or specify more using a character class.

You can use a regex pattern
\b(?!rolf\b)\S+\b
\b Word boundary - Matches a word boundary position between a
word character and non-word character or position (start / end of
string).
(?! Negative lookahead - Specifies a group that can not match
after the main expression (if it matches, the result is discarded).
\S Not whitespace - Matches any character that is not a
whitespace character (spaces, tabs, line breaks).
+ Quantifier - Match 1 or more of the preceding token.
Test your inputs with this pattern here.

Match everything to the first unescaped (with \) character

I have following input:
!foo\[bar[bB]uz\[xx/
I want to match everything from start to [, including escaped bracket \[ and ommiting first characters if in [!#\s] group
Expected output:
foo\[bar
I've tried with:
(?![!#\s])[^/\s]+\[
But it returns:
foo\[bar[bB]uz\[

Java: Use Lookbehind
(?<=!)(?:\\\[|[a-z])+
See the regex demo
Explanation
The lookbehind (?<=!) asserts that what precedes the current position is the character !
The non-capture group (?:\\\[|[a-z]) matches \[ OR | a letter between a and z
The + causes the group to be matched one or more times
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind

You can use this regex:
!((?:[^[\\]*\\\[)*[^[]*)
Online Regex Demo

Add a ? after [^/\s]+ to catch the shortest group possible
Add \w+ to the end to catch the first group of alphanumeric characters after \[
Result :
(?![!#\s])[^\/\s]+?\[\w+
Try it

You can try this pattern:
(?<=^[!#\s]{0,1000})(?:[^!#\s\\\[]|\\.)(?>[^\[\\]+|\\.)*(?=\[)
pattern details:
The begining is a lookbehind and means preceded by zero or several forbidden characters at the start of the string
(?:[^!#\s\\\[]|\\.) ensures that the first character is an allowed character or an escaped character.
(?>[^\[\\]+|\\.)* describes the content: all that is not a [ or a \, or an escaped character. (note that this subpattern can be written like that too: (?:[^\[\\]|\\.)*)
(?=\[) checks that the next character is a literal opening square bracket. (since all escaped characters are matched by the precedent group, you can be sure that this one is not escaped)
link to fiddle (push the Java button)

Use a negated character class first the start (ie the match must not start with a special char), then a reluctant quantifier (which stops at the first hit), with a negative look behind to skip over escaped brackets:
[^!#\s].*?(?<!\\)\[
See live demo

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regex negative lookahead not working as expected - regex

I have the following regex: [a-zA-Z0-9. ]*(?!cs) and the string Hotfix H5.12.1.00.cs02_ADV_LCR I want to match only untill Hotfix H5.12.1.00 But the regex matches untill "cs02" Shouldn't the negative lookahead have done the job?

You may consider using a tempered greedy token: (?:(?!\.cs)[a-zA-Z0-9. ])* See the regex demo. This will work regardless of whether .cs is present in the string or not because the tempered greedy token matches any 0+ characters from the [a-zA-Z0-9. ] character class that is not .cs.

Related

Find match within a first match

Regex: Match pattern unless preceded by pattern containing element from the matching character class

RegEx: don't capture match, but capture after match

Regex for spoof

Match everything to the first unescaped (with \) character

Categories

Resources