I would like to match a specific pattern with regex but I am running into catastrophic backtracking. I wonder if there's a way it would be possible to match what I would like and not get an error.
I start with a simple assumption; I want my string to contain only one specific number e.g. 7 and only that specific number:
^\D*7\D*$
Only if I find this pattern do I want to look for another word in the same text such as "Coffee"; I put my condition into a group (^\D*7\D*$) and reference the group in my conditional and the then part will contain "Coffee":
(?(1)Coffee|)
Is there another phrasing that would avoid the the catastrophic backtracking?
You can use a negative lookahead to assert that the word Coffee is at the right.
^(?=.*\bCoffee\b)\D*7\D*$
The pattern matches:
^ Start of string
(?= Positive lookahead, assert that on the right is
.*\bCoffee\b Match Coffee between word boundaries \b to prevent a partial match
) Close lookahead
\D*7\D* Match number 7 between optional non digit characters.
$ End of string
Regex demo
Note that \D also matches a newline. If you don't want to cross newline boundaries, you can use [^\r\n\d] instead.
Left to right checking is more traditional:
^(?=.*Coffee)[^\d7]*7\D*$
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
Coffee 'Coffee'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[^\d7]* any character except: digits (0-9), '7' (0
or more times (matching the most amount possible))
--------------------------------------------------------------------------------
7 '7'
--------------------------------------------------------------------------------
\D* non-digits (all but 0-9) (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the string
Right to left checking is only possible with engines like latest JavaScript, .NET or PyPi regex in Python:
^[^\d7]*7\D*$(?<=Coffee.*)
See proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[^\d7]* any character except: digits (0-9), '7' (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
7 '7'
--------------------------------------------------------------------------------
\D* non-digits (all but 0-9) (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
Coffee 'Coffee'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of look-behind
Related
I need to find 5 integers before the last underscore in a given filename.
Example string:
X130874_W907025343_Txt.pdf
I need to find 25353
The closest I came was (?<=_)[^_]+(?=[^_](.{5})_)
Use a lookahead after 5 digits that matches an underscore followed by no undercores until the end.
\d{5}(?=_[^_]*$)
Use
[0-9]{5}(?=_(?!.*_))
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
[0-9]{5} any character of: '0' to '9' (5 times)
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
_ '_'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
_ '_'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
) end of look-ahead
I have some strings in Oracle where there is a minus sign (not at the beginning but inside the string), followed by a number (int or decimal with dot or comma).
I would like to find these in PLSQL. I have this already, and it's almost perfect:
REGEXP_LIKE(string, '-\d+(,|\.)*\d*')
I was hoping that it's finding strictly strings like somestring-11,1 but the problem is, it finds also strings like somestring-11a1,1 so where there is eventually a non numeric (or word) character between the minus and the numbers. I was trying to use negative lookahead, but unfortunately it's not working:
REGEXP_LIKE(string, '-\d+!(\w)(,|\.)*\d*')
because somestring-1s won't be found either anymore. Could you please point me to the right direction? Thank you.
Could you please try following, written and tested based on your shown samples. Simple explanation would be: using lazy match to match till - then match digits(1 or more occurrences) followed by , and followed by 1 or more occurrences of digits.
.*?-\d+,\d+
Online regex demo for above regex
Use
(^|\D)-(\d+([,.]*\d+)?)($|\W)
See proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\D non-digits (all but 0-9)
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
( group and capture to \3 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[,.]* any character of: ',', '.' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)? end of \3 (NOTE: because you are using a
quantifier on this capture, only the
LAST repetition of the captured pattern
will be stored in \3)
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\W non-word characters (all but a-z, A-Z, 0-
9, _)
--------------------------------------------------------------------------------
) end of \4
I'd like to match positively strings like "10a3b4c", "10a", "5b4c", "3a6c", but not match "2c1b" (because the letters aren't in alphabetical order) or the empty string.
Attempt: (\d+a)?(\d+b)?(\d+c)?
Problem: Matches the empty string. It falsely matches "".
Attempt: (\d+[abc]){1,3}
Problem: Doesn't enforce the a, b, c order. It falsely matches "2c1b"
How can this restriction be expressed as a regex?
Use
^(?!$)(\d+a)?(\d+b)?(\d+c)?$
See proof
Explanation
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
( group and capture to \1 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
a 'a'
--------------------------------------------------------------------------------
)? end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
( group and capture to \2 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
b 'b'
--------------------------------------------------------------------------------
)? end of \2 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \2)
--------------------------------------------------------------------------------
( group and capture to \3 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
c 'c'
--------------------------------------------------------------------------------
)? end of \3 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \3)
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
This seems to work (see it on regex101)
\b(?=[\dabc]+\b)(\d+a)?(\d+b)?(\d+c)?\b
I want to use regex validation formula on Text Field. Here is pure regex:
^(?!(?:\D*\d){7})\d+(\.\d{1,2})?$
When I test this expression in regex online tools (eg: https://regex101.com/) everything works fine.
But when I try to use this as validator in Orbeon like this:
matches(string(.), '^(?!(?:\D*\d){7})\d+(\.\d{1,2})?$') or xxf:is-blank(string(.))
I get error 'Incorrect XPath expression'.
When I removed from regex lookahead part, I was able to use it.
matches(string(.), '^\d+(\.\d{1,2})?$') or xxf:is-blank(string(.))
Is Orbeon Forms supports regex lookahead?
Regex lookahead:
https://www.regular-expressions.info/lookaround.html
Re-write the expression without lookahead. It matches strings with no more than 6 digits.
Use
^(\d{1,4}(\.\d{1,2})?|\d{5}(\.\d)?|\d{6})$
See proof
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d{1,4} digits (0-9) (between 1 and 4 times
(matching the most amount possible))
--------------------------------------------------------------------------------
( group and capture to \2 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\d{1,2} digits (0-9) (between 1 and 2 times
(matching the most amount possible))
--------------------------------------------------------------------------------
)? end of \2 (NOTE: because you are using a
quantifier on this capture, only the
LAST repetition of the captured pattern
will be stored in \2)
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\d{5} digits (0-9) (5 times)
--------------------------------------------------------------------------------
( group and capture to \3 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
)? end of \3 (NOTE: because you are using a
quantifier on this capture, only the
LAST repetition of the captured pattern
will be stored in \3)
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\d{6} digits (0-9) (6 times)
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
What is the REGEX to accept a string like this
Starts with EDO
has many characters(words,numbers,hypehns) in between
does not contain 24 or |(pipe)
Example:
Should match
edo-<<characters>>-<<characeters>>-<<numbers>>
BUT NOT
edo-<<characters>>-<<characeters>>-<<numbers>> | <<characeters>>- <<characeters>>- <<numbers>>
The string does not have a constant length
The negative look ahead will help you to decide if the string doesnt contain 24 or |
The regex can be written as
/^edo(?!.*(24|\|))[-a-zA-Z0-9]+$/i
Regex Demo
How it matches
^ Anchors the regex at the start of the string
edo The anchor ensures that the string starts with edo
(?!.*(24|\|)) look ahead assertion. It checks if the string doesnt contain 24 or |. If it doesnt contain, then proceeds with the remaining pattern. If it contains, discards the match
[-a-zA-Z0-9]+ Matches numbers alphabets or -
$ anchors the regex at the end of the string.
^EDO(?!.*(?:(?<!\d)24(?!\d)|\|))[a-zA-Z0-9 -]+$
Try this.This should work.Use flag gmi.
See demo.
https://regex101.com/r/fA6wE2/37
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
EDO 'EDO'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
24 '24'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of look-ahead
| OR
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9-]+ any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '-' (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string