I'm trying to learn more about regex today.
I'm simply trying to match an order number not surrounded by brackets (#1234 but not [#1234]) but my question is more in general about using lookahead assertions on an arbitrary pattern.
On my first attempts I noticed that my negative lookahead pattern \d+(?!\]) let \d+ backtrack and give up digits until what remained was no longer followed by a ]. I need the digits to match only if the entire run isn't followed by a ].
My current solution kills the match at the first digit by looking ahead to see if there's a ] in the digit chain.
Is this a standard way to go about this? I'm just repeating the match pattern in the lookahead. If this were a more complex regex, would I approach it the same? Repeat the valid match followed by the invalid match and have the regex engine repeat itself for every letter?
For valid matches, it would have to match itself as many times as the characters in the match.
(?<!\[) # not preceded by [
#\d+
(?!\d*\]) # not followed by zero or more digits and then ]
# or (?!\d|\]) # not followed by digit or ]
I'd appreciate any feedback!
You can achieve what you want by using a possessive quantifier along with lookarounds like this
(?<!\[)#\d++(?!\])
The problem in your case is that \d+ allows backtracking, so the regex ends up with a partial match such as #123. Once you change it to a possessive quantifier, it will not backtrack and will only match if the sequence of digits is neither preceded nor followed by a bracket.
Edit
If possessive quantifiers are not supported then you can use this one
#\d(?<!\[#\d)(?!\d*\])\d*
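To see the backtracking concretely, here is a minimal Python sketch. It uses the non-possessive fallback pattern, since possessive quantifiers are not available in every engine (as far as I know, Python's re module only gained them in 3.11). The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text: one bare order number, one in brackets,
# and one that is only followed by a closing bracket.
text = "#1234 [#5678] #99]"

# Plain greedy \d+ gives digits back until the lookahead passes.
backtracking = re.compile(r"#\d+(?!\])")

# Fallback from the answer: both bracket checks run right after the first digit.
strict = re.compile(r"#\d(?<!\[#\d)(?!\d*\])\d*")

partial_matches = backtracking.findall(text)
whole_matches = strict.findall(text)

print(partial_matches)  # ['#1234', '#567', '#9']
print(whole_matches)    # ['#1234']
```

The first list shows the partial matches caused by backtracking; the second keeps only the number whose whole digit run is free of brackets.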
I appreciate regex but I'm still a beginner.
I tried many workarounds but can't figure out how to solve my problem.
String A : 4 x 120glgt
String B : 120glgt
I'd like the proper regex to return 120 as the number after "x".
But sometimes there won't be an "x". So, whether it is [A] or [B], I'm looking for a single approach.
I tried :
to start the search from the END
Start right after the "x"
I clearly have some syntax issues and didn't quite get the logic of (?=)
(?=[^x])(?=[0-9]+)
Looking forward to learning with your help.
As you tagged pcre, you could optionally match the leading digits followed by x and use \K to clear the match buffer to only match the digits after it.
^(?:\d+\h*x\h*)?\K\d+
The pattern matches:
^ Start of string
(?:\d+\h*x\h*)? Optionally match 1+ digits followed by x between optional spaces
\K Forget what is matched so far
\d+ Match 1+ digits
See a regex demo.
If you want to use a lookahead variant, you might use
\d+(?=[^\r\n\dx]*$)
This pattern matches:
\d+ Match 1+ digits
(?= Positive lookahead, assert what is to the right is
[^\r\n\dx]*$ Match optional repetitions of any char except a newline, a digit or x, up to the end of the string
) Close the lookahead
See another regex demo.
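Both ideas can be tried outside PCRE as well. The sketch below is Python, whose re module supports neither \K nor \h, so the first pattern is an adapted stand-in (a capture group instead of \K, [ \t]* instead of \h*), not the pattern from the answer. The lookahead variant is used unchanged. Variable names are illustrative.

```python
import re

samples = ["4 x 120glgt", "120glgt"]

# Stand-in for the \K pattern: the wanted digits land in group 1.
# [ \t]* replaces \h*, which Python's re does not support.
with_group = re.compile(r"^(?:\d+[ \t]*x[ \t]*)?(\d+)")

# The lookahead variant from the answer, unchanged.
with_lookahead = re.compile(r"\d+(?=[^\r\n\dx]*$)")

group_results = [with_group.search(s).group(1) for s in samples]
lookahead_results = [with_lookahead.search(s).group(0) for s in samples]

print(group_results)      # ['120', '120']
print(lookahead_results)  # ['120', '120']
```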
I want to match
abc_def_ghi,
abc_abc_ghi,
abc_a2a_ghi,
abc_999_ghi
but not abc_xxx_ghi (with xxx in center).
I came up with a lookahead followed by manually consuming the characters (abc_(?!xxx)..._ghi), but I wonder whether there is another way that doesn't require manually specifying the number of characters to skip.
The original question was about numbers; I updated it for the string case.
If you don't want to specify exactly how many characters to skip, perhaps you could use a quantifier like + in the negative lookahead and use a negated character class to match not an underscore.
\babc_(?!x+_)[^_]+_ghi\b
Explanation
\babc_ Word boundary, match abc_
(?! Negative lookahead, assert what is directly on the right is not
x+_ Match 1+ times x followed by an underscore
) Close lookahead
[^_]+_ Negated character class, match 1+ times any char except _, followed by an underscore
ghi\b Match ghi and word boundary
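As a quick check, here is a small Python sketch comparing the fixed-width pattern from the question with the quantified one from this answer. The candidate list comes from the question; the extra strings at the end are my own assumptions, added to show where the two patterns differ.

```python
import re

candidates = ["abc_def_ghi", "abc_abc_ghi", "abc_a2a_ghi", "abc_999_ghi", "abc_xxx_ghi"]

# Fixed-width version from the question: exactly three characters in the middle.
fixed = re.compile(r"abc_(?!xxx)..._ghi")

# Quantified version from the answer: any run of non-underscore characters.
flexible = re.compile(r"\babc_(?!x+_)[^_]+_ghi\b")

fixed_hits = [s for s in candidates if fixed.fullmatch(s)]
flexible_hits = [s for s in candidates if flexible.fullmatch(s)]

print(fixed_hits)     # the first four candidates
print(flexible_hits)  # the first four candidates

# Hypothetical extra inputs: the middle part may now have any length,
# but a middle part made only of x characters is rejected, whatever its length.
longer_middle_ok = flexible.fullmatch("abc_defg_ghi") is not None
short_x_run_ok = flexible.fullmatch("abc_xx_ghi") is not None
print(longer_middle_ok, short_x_run_ok)  # True False
```

Note the side effect of x+ in the lookahead: it excludes abc_x_ghi and abc_xx_ghi too, which may or may not be what you want.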
You can use this (written for the original, numeric version of the question):
123_(?:(?!000)\d){3}_789
If you don't wish to use look-arounds, this expression might be an option:
(?:abc_xxx_ghi)|(abc_.{3}_ghi)
Other than that I can't think of anything else.
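With this alternation, the unwanted string is still matched, but only the wanted ones end up in capture group 1, so the caller has to filter on that group. A minimal Python sketch, with a made-up input line:

```python
import re

pattern = re.compile(r"(?:abc_xxx_ghi)|(abc_.{3}_ghi)")

# Hypothetical input containing one string that should be skipped.
text = "abc_def_ghi, abc_xxx_ghi, abc_999_ghi"

# Group 1 is None whenever the unwanted first alternative matched.
wanted = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]

print(wanted)  # ['abc_def_ghi', 'abc_999_ghi']
```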
I'm having an issue creating a regular expression that will give me what I want. I need your help! So the text we are using is:
S 1SS 1S
"S" and "1S" are matches. "1SS" is not a match. I would like it to be a little more specific than just excluding anything with three characters but that may be a solution.
Any other ideas on how to exclude "1SS"? I can't figure it out!
Thank you,
Mark S.
You can use a negative lookahead pattern to avoid matching a consecutive letter S:
\b\d*S(?!S)
Demo: https://regex101.com/r/sv467b/2
Explanations: \b matches a word boundary to ensure that this won't match the second S in two consecutive Ses. \d* matches zero or more digits to allow optional preceding numbers. S is followed by (?!S), a negative lookahead pattern to ensure that what follows S is not another S.
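A short Python sketch of this pattern on the text from the question. The second pattern drops \b, purely to illustrate why the word boundary is needed.

```python
import re

text = "S 1SS 1S"

with_boundary = re.compile(r"\b\d*S(?!S)")
without_boundary = re.compile(r"\d*S(?!S)")  # for comparison only

matches = with_boundary.findall(text)
loose_matches = without_boundary.findall(text)

print(matches)        # ['S', '1S']
print(loose_matches)  # ['S', 'S', '1S'] (the extra S is the tail of 1SS)
```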
A regexp with more general applications is something like:
\b(?:(.)(?!\1))+\b
\b is for word boundaries.
(?:) is a non-capturing group.
(?!) is a negative lookahead group.
\1 is a backreference to capture group 1.
I have the following regex:
[a-zA-Z0-9. ]*(?!cs)
and the string
Hotfix H5.12.1.00.cs02_ADV_LCR
I want to match only until
Hotfix H5.12.1.00
But the regex matches until "cs02"
Shouldn't the negative lookahead have done the job?
You may consider using a tempered greedy token:
(?:(?!\.cs)[a-zA-Z0-9. ])*
See the regex demo.
This will work regardless of whether .cs is present in the string or not, because the tempered greedy token matches zero or more characters from the [a-zA-Z0-9. ] character class, stopping before any position where .cs begins.
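A minimal Python sketch of the tempered greedy token. The second input is a hypothetical variant of the string without .cs, added to show the "regardless" part.

```python
import re

# Each character is consumed only if .cs does not start at that position.
tempered = re.compile(r"(?:(?!\.cs)[a-zA-Z0-9. ])*")

with_cs = "Hotfix H5.12.1.00.cs02_ADV_LCR"
without_cs = "Hotfix H5.12.1.00_ADV_LCR"  # hypothetical variant

result_with_cs = tempered.match(with_cs).group(0)
result_without_cs = tempered.match(without_cs).group(0)

print(result_with_cs)     # Hotfix H5.12.1.00
print(result_without_cs)  # Hotfix H5.12.1.00
```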
You need to use positive lookahead instead of negative lookahead.
[a-zA-Z0-9. ]*(?=\.cs)
or
[a-zA-Z0-9. ]+(?=\.cs)
Note that your regex [a-zA-Z0-9. ]*(?!cs) is greedy and matches all the characters until it reaches a boundary which isn't followed by cs. See here.
At first, the pattern [a-zA-Z0-9. ]+ matches Hotfix H5.12.1.00.cs02 greedily, because the character class matches letters, digits, dots and spaces. Once it sees the underscore character, it stops matching, at a position where two conditions are satisfied:
_ won't get matched by [a-zA-Z0-9. ]+
_ is not cs
The same applies to the two further matches.
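The difference between the two lookaheads can be checked with a short Python sketch. It uses the + form of both patterns so that no empty matches show up in the output; the string without .cs is a hypothetical variant.

```python
import re

text = "Hotfix H5.12.1.00.cs02_ADV_LCR"

negative = re.compile(r"[a-zA-Z0-9. ]+(?!cs)")
positive = re.compile(r"[a-zA-Z0-9. ]+(?=\.cs)")

# The negative lookahead only fires after the greedy run has ended.
negative_matches = negative.findall(text)

# The positive lookahead forces the run to back off to just before .cs.
positive_match = positive.search(text).group(0)

# Without .cs in the string, the positive lookahead can never succeed.
no_cs_match = positive.search("Hotfix H5.12.1.00_ADV_LCR")

print(negative_matches)  # ['Hotfix H5.12.1.00.cs02', 'ADV', 'LCR']
print(positive_match)    # Hotfix H5.12.1.00
print(no_cs_match)       # None
```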
I'd like to find a string in a URL with a Notepad++ regular expression. Unfortunately I can't.
http://www.example.com/profile/mera-handelsgesellschaft-mbh-182055?category_id=154331
What I want to get is 182055
I only want to find it, not change it.
My last try was ([^\-|^\=])(\d+)([^\?])
How can I find it?
Try this regex, please:
\d+(?=\?)
\d matches a digit
\d+ matches one or more digits
(?=\?) is a positive lookahead. It means: select the digits only if they are immediately followed by a ? character.
from regex101:
\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
(?=\?) Positive Lookahead - Assert that the regex below can be matched
\? matches the character ? literally
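The question is about Notepad++, but the pattern itself can be checked anywhere. Here is a small Python sketch on the URL from the question; the variable names are illustrative.

```python
import re

url = "http://www.example.com/profile/mera-handelsgesellschaft-mbh-182055?category_id=154331"

# Digits that are immediately followed by a literal question mark.
match = re.search(r"\d+(?=\?)", url)
profile_id = match.group(0) if match else None

print(profile_id)  # 182055
```

The trailing 154331 is not matched, because it is followed by the end of the string rather than a ?.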