How to match branches with negative and positive lookahead regex?

I'm working with some builds and have to write a regex to include some branches, but exclude others (in order to create the builds).
The best I could come up with is this regex, which uses positive and negative lookahead to match the branch names:
(?=.*12\.3)^((?!(version-12\.3)).)*$
Here are the branch names and how they should be matched:
bugfix-ISSUE-123-some-details-version-12.3
ISSUE-1234-some-other-details-version-12.3
bugfix-12.3
bugfix2-12.3
12.3stuff
stu12.3ff
// match everything above, but don't match anything from below
master
version-12.3
version-3.21
some-other-branch
bugfix-3.21
test
Please use this online tool (it's the only one I found that supports negative and positive lookahead regexes).
Right now the regex I came up with works fine, EXCEPT for the following 2 branches:
bugfix-ISSUE-123-some-details-version-12.3
ISSUE-1234-some-other-details-version-12.3
They are not included because I used this negative lookahead, which excludes version-12.3 (what I want) but also excludes anything else that contains this string (like ISSUE-123-version-12.3, which I want included):
((?!(version-12\.3)).)*$
Can you help a bit, please?

If you need to fail the match whenever a string containing 12.3 starts with version- followed by some digits/dots, you may use
^(?!version-\d+\.\d).*12\.3.*$
See the regex demo.
Details:
^ - start of string
(?!version-\d+\.\d) - a negative lookahead that fails the match if there is version-, 1+ digits, a dot and a digit right at the start of the string
.* - any 0+ chars (other than line break chars)
12\.3 - a 12.3 substring
.* - any 0+ chars (other than line break chars)
$ - end of string.
If a string should be disallowed only when it consists entirely of version- plus digits/dots, use
^(?!version-[\d.]+$).*12\.3.*$
See another regex demo.
Here, ^ matches the start of the string, and then (?!version-[\d.]+$) performs the check: if version- is followed by 1+ digits/dots up to the end of the string ($), the match fails.
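A quick way to check both patterns against the branch list is a short Python script. This is a sketch using Python's built-in re module, which supports the lookaheads used here. The names PREFIX_RULE, WHOLE_STRING_RULE, ORIGINAL_RULE and check are illustrative only. It also runs the asker's original pattern, to show which two branches that pattern wrongly rejects.

```python
import re

# The asker's pattern and the two patterns from the answer.
ORIGINAL_RULE = re.compile(r"(?=.*12\.3)^((?!(version-12\.3)).)*$")
PREFIX_RULE = re.compile(r"^(?!version-\d+\.\d).*12\.3.*$")
WHOLE_STRING_RULE = re.compile(r"^(?!version-[\d.]+$).*12\.3.*$")

# Branch names copied from the question.
SHOULD_MATCH = [
    "bugfix-ISSUE-123-some-details-version-12.3",
    "ISSUE-1234-some-other-details-version-12.3",
    "bugfix-12.3",
    "bugfix2-12.3",
    "12.3stuff",
    "stu12.3ff",
]
SHOULD_NOT_MATCH = [
    "master",
    "version-12.3",
    "version-3.21",
    "some-other-branch",
    "bugfix-3.21",
    "test",
]


def check(rule):
    """Return the branch names that the rule classifies incorrectly."""
    wrong = [b for b in SHOULD_MATCH if not rule.search(b)]
    wrong += [b for b in SHOULD_NOT_MATCH if rule.search(b)]
    return wrong


for name, rule in [("original", ORIGINAL_RULE),
                   ("prefix", PREFIX_RULE),
                   ("whole string", WHOLE_STRING_RULE)]:
    print(name, check(rule))
```

The two answer patterns only differ on names such as version-12.3-hotfix (a hypothetical branch). The first pattern rejects it because it starts with version-12.3. The second accepts it because version-12.3 is not the whole name.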

Related

Regex positive lookbehind after digits

I'm trying to get all the characters up to the newline, starting at one or more digits, from the text below, using this positive lookbehind:
(?<=below customer.\s.*\n.* )(.*)
I order standardinstalation to below customer.
Paul Rilley
Abbeyroad 55
It works (gives 55) if the road name does not contain a space, but it does not work with a name like High Tory road. There could also be letters after the digits (55b), and I need to get those as well.
I need to look behind the words (below customer) since the first line is the only part that is always the same.
You can use
(?m)(?<=below customer\.\r?\n(?:.+\n)*?.+ )(\d+[A-Za-z]*)\r?$
See the .NET regex demo.
Details:
(?m) - turns on multiline mode, so $ matches at the end of any line
(?<=below customer\.\r?\n(?:.+\n)*?.+ ) - the lookbehind requires the following immediately to the left: below customer., then a line ending sequence, then zero or more non-empty lines each followed by a newline (as few as possible), and then one or more chars other than a newline up to a space. That space must be directly followed by
(\d+[A-Za-z]*) - Group 1: one or more digits and then zero or more letters
\r?$ - an optional CR char and the end of line.
It will also match 55b.
In most regex flavors, a lookbehind must be fixed width. In .NET, variable width is supported.
You can use the following in both PCRE and .NET:
/(?<=below customer\.)\r?\n.*\r?\n.* (\w+)$/gm
Demo for PCRE
Demo for .NET
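Python's re module only accepts fixed-width lookbehinds, so it can run the second pattern but not the .NET one. In the sketch below, house_number and SAMPLES are illustrative names, and the second sample is a made-up address with a multi-word road name. It applies the fixed-width pattern with re.MULTILINE, which is Python's counterpart of the m flag. It also shows that re refuses to compile the variable-width version.

```python
import re

SAMPLES = [
    "I order standardinstalation to below customer.\nPaul Rilley\nAbbeyroad 55",
    # Hypothetical variant: road name with spaces, number with a letter suffix.
    "I order standardinstalation to below customer.\nPaul Rilley\nHigh Tory road 55b",
]

# Fixed-width lookbehind, so Python's re accepts it; re.MULTILINE lets $ match
# at the end of each line, like the /m flag.
HOUSE_NUMBER = re.compile(r"(?<=below customer\.)\r?\n.*\r?\n.* (\w+)$", re.MULTILINE)


def house_number(text):
    """Return the word after the last space on the 2nd line after the marker."""
    m = HOUSE_NUMBER.search(text)
    return m.group(1) if m else None


for sample in SAMPLES:
    print(house_number(sample))

# The .NET pattern has a variable-width lookbehind, which re rejects.
try:
    re.compile(r"(?m)(?<=below customer\.\r?\n(?:.+\n)*?.+ )(\d+[A-Za-z]*)\r?$")
except re.error as exc:
    print("re cannot compile the .NET pattern:", exc)
```

Python has no g flag. To get every match instead of the first one, use HOUSE_NUMBER.findall or finditer.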

Regex (PCRE): Match all digits in a line following a line which includes a certain string

Using PCRE, I want to capture only and all digits in a line which follows a line in which a certain string appears. Say the string is "STRING99". Example:
car string99 house 45b
22 dog 1 cat
women 6 man
In this case, the desired result is:
221
I asked a similar question some time ago, but back then I was trying to capture the numbers on the SAME line where the string appears (Regex (PCRE): Match all digits conditional upon presence of a string). While the questions are similar, I don't think the answer, if there is one at all, will be similar. The approach using the start-of-line anchor ^ does not work in this case.
I am looking for a single regular expression without any other programming code. It would be easy to accomplish with two consecutive regex operations, but that is not what I'm looking for.
Maybe you could try:
(?:\bstring99\b.*?\n|\G(?!^))[^\d\n]*\K\d
See the online demo
(?: - Open non-capture group:
\bstring99\b - Literally match "string99" between word-boundaries.
.*?\n - Lazy match up to (including) nearest newline character.
| - Or:
\G(?!^) - Asserts the position at the end of the previous match. The negative lookahead (?!^) prevents it from matching at the start of the string, where \G also succeeds before the first match.
) - Close non-capture group.
[^\d\n]* - Match 0+ non-digit/newline characters.
\K - Resets the starting point of the reported match.
\d - Match a digit.
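Python's built-in re module supports neither \G nor \K, so the single-pattern solution above cannot run there as written. The two-step Python sketch below can be used to double-check the expected output (221). It is exactly the two-operation approach the asker wanted to avoid, so treat it only as a reference result. The function name digits_on_next_line is hypothetical.

```python
import re

TEXT = "car string99 house 45b\n22 dog 1 cat\nwomen 6 man"


def digits_on_next_line(text, marker="string99"):
    # Step 1: capture the line that follows the first line containing the
    # marker as a whole word; .* skips the rest of the marker line (e.g. 45b).
    m = re.search(r"\b" + re.escape(marker) + r"\b.*\n(.*)", text)
    if m is None:
        return ""
    # Step 2: collect every digit on that following line.
    return "".join(re.findall(r"\d", m.group(1)))


print(digits_on_next_line(TEXT))
```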

Regex Exclude Number Within Two Characters of Number

I have some manually entered data (it's an email subject), and I am trying to extract the correct ID to perform a series of actions with RPA on.
RE:'HC=312-822-281' abc2-1234567 7354612
I have a regex query:
(?<!\d)\d{7}(?!\d)
I want to extract 7354612 but not 1234567.
I want to avoid matching any 7-digit number that is preceded with a hyphen, or a hyphen and a space.
My initial query works 80% of the time, but this hyphen issue is interfering with the other 20%.
You can modify the existing (?<!\d) lookbehind to also exclude the position after a hyphen, i.e. (?<![\d-]), and add another lookbehind to exclude the hyphen + space context ((?<!- ) or (?<!-\s)):
(?<![\d-])(?<!- )\d{7}(?!\d)
(?<![\d-])(?<!-\s)\d{7}(?!\d)
Note \s matches any whitespace. See the regex demo.
Details
(?<![\d-]) - a negative lookbehind that fails the match if there is a digit or a hyphen immediately to the left of the current location
(?<!-\s) - a negative lookbehind that fails the match if there is a hyphen followed by a whitespace char immediately to the left of the current location
\d{7} - any seven digits
(?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location.
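Both lookbehind variants are fixed width, so Python's re module can run them directly. The short sketch below shows the difference between "- " and "-\s", and also the case that neither covers. ID_SPACE and ID_WS are illustrative names, and the extra subject strings are made up.

```python
import re

SUBJECT = "RE:'HC=312-822-281' abc2-1234567 7354612"

# Rejects a 7-digit run preceded by a digit, a hyphen, or "- " (literal space).
ID_SPACE = re.compile(r"(?<![\d-])(?<!- )\d{7}(?!\d)")
# Same, but "-\s" also rejects a hyphen followed by a tab or other whitespace.
ID_WS = re.compile(r"(?<![\d-])(?<!-\s)\d{7}(?!\d)")

print(ID_SPACE.findall(SUBJECT))          # ['7354612']
print(ID_SPACE.findall("abc-\t1234567"))  # ['1234567']: a tab is not a space
print(ID_WS.findall("abc-\t1234567"))     # []
print(ID_WS.findall("abc-  1234567"))     # ['1234567']: two spaces slip through
```

The last line shows the limit of fixed-width lookbehinds. With two spaces after the hyphen, the number is no longer excluded. The -\s* in the (*SKIP)(*F) and variable-width variants below handles that case.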
Variations
With PCRE regex, you may also use
-\s*\d{7}(?!\d)(*SKIP)(*F)|(?<!\d)\d{7}(?!\d)
See the regex demo, where -\s*\d{7}(?!\d)(*SKIP)(*F)| matches -, 0+ spaces, seven digits after which there are no more digits and skips that match, only returning matches for the (?<!\d)\d{7}(?!\d) pattern.
In .NET, modern JavaScript and the PyPI regex module in Python, you may use
(?<!\d|-\s*)\d{7}(?!\d)
See this regex demo. Here, (?<!\d|-\s*) negative lookbehind fails the match if there is a digit or - + 0 or more whitespace chars immediately to the left of the current position.

What expression should I use to get desired results?

For strings like Cisco 3750 i7706-cm021 10.123.12.34 -> 10.123.34.12, I would like to get the result Cisco 3750 i7706-cm021 10.123.12.34 -> using the expression ^.*(?![\d\.]{12}$). Instead, the whole string is matched. What would the correct expression be?
You may use a regex like
^.*?(?=\b(?:\d{1,3}\.){3}\d{1,3}$)
See the regex demo and the Regulex graph:
Details
^ - start of string
.*? - any 0+ chars other than line break chars, as few as possible
(?=\b(?:\d{1,3}\.){3}\d{1,3}$) - a positive lookahead that requires (immediately to the right of the current location):
\b - word boundary
(?:\d{1,3}\.){3} - three repetitions of 1 to 3 digits and a dot
\d{1,3} - one to three digits
$ - end of string.
For a more precise IP regex, see How to Find or Validate an IP Address:
^.*?(?=\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$)
See the regex demo
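The difference between the asker's pattern and the answer is easy to see in Python, whose re module supports these lookaheads. In this sketch the constant names are illustrative. OCTET simply factors out the octet alternation from the stricter pattern, so STRICT is the same regex as the one above.

```python
import re

LINE = "Cisco 3750 i7706-cm021 10.123.12.34 -> 10.123.34.12"

# Lazily consume characters until the rest of the string is a dotted quad.
SIMPLE = re.compile(r"^.*?(?=\b(?:\d{1,3}\.){3}\d{1,3}$)")

# Stricter variant: each octet is limited to 0-255.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
STRICT = re.compile(r"^.*?(?=\b(?:" + OCTET + r"\.){3}" + OCTET + r"$)")

# The pattern from the question: greedy .* runs to the end of the string,
# where fewer than 12 chars remain, so the negative lookahead always passes.
ASKED = re.compile(r"^.*(?![\d\.]{12}$)")

for name, rx in [("simple", SIMPLE), ("strict", STRICT), ("asked", ASKED)]:
    print(name, repr(rx.match(LINE).group(0)))
```

Only the stricter pattern rejects an out-of-range octet. For example, with a made-up line ending in 10.123.34.999, SIMPLE still matches, but STRICT finds no match.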

RegExp: How do I include 'avoid non-numeric characters' from a pattern search?

I want to filter out all .+[0-9]. (correct way?) patterns to avoid duplicate decimal points within a numeral: (e.g., .12345.); but allow non-numerals to include duplicate decimal points: (e.g. .12345*.) where * is any NON-NUMERAL.
How do I include a non-numeral negation value into the regexp pattern? Again,
.12345. <-- error: erroneous numeral.
.12345(. or .12345*. <-- Good.
I think you are looking for
^\d*(?:\.\d+)?(?:(?<=\d)[^.\d\n]+\.)?$
Here is a demo
Remember to escape the regex properly in Swift:
let rx = "^\\d*(?:\\.\\d+)?(?:(?<=\\d)[^.\\d\\n]+\\.)?$"
REGEX EXPLANATION:
^ - Start of string
\d* - Match zero or more digits
(?:\.\d+)? - Match decimal part, 0 or 1 time (due to ?)
(?:(?<=\d)[^.\d\n]+\.)? - Optionally (due to the ? at the end) matches 1 or more chars other than a digit, a full stop or a line break ([^.\d\n]; the \n is there mostly for demo purposes), provided they are preceded by a digit (the (?<=\d) lookbehind), and then a full stop (\.).
$ - End of string
I am using non-capturing groups (?:...) for better performance and usability.
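The same pattern can be tried quickly in Python. The pattern only uses constructs that Python's re also supports, and a raw string avoids the double escaping needed in Swift. The sample strings come from the question, plus 12.5 as an extra valid number. VALID is an illustrative name.

```python
import re

# Validation pattern from the answer: optional digits, an optional decimal
# part, then optionally some non-digit, non-dot chars (only right after a
# digit) followed by a final dot.
VALID = re.compile(r"^\d*(?:\.\d+)?(?:(?<=\d)[^.\d\n]+\.)?$")

for s in [".12345.", ".12345*.", ".12345(.", "12.5"]:
    print(s, bool(VALID.match(s)))
```

Every part of the pattern is optional, so it also accepts an empty string. If that matters, the empty string has to be handled separately.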
UPDATE:
If you prefer an opposite approach, that is, matching the invalid strings, you can use a much simpler regex:
\.[0-9]+\.
In Swift, let rx = "\\.[0-9]+\\.". It matches any substrings starting with a dot, then 1 or more digits from 0 to 9 range, and then again a dot.
See another regex demo
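This inverse pattern looks for an offending substring rather than validating the whole string, so it should be used with a search operation. In Python that is re.search. The third sample string is made up, to show that the offending part can appear anywhere in the string. INVALID is an illustrative name.

```python
import re

# Flags a dot, one or more digits, and another dot anywhere in the string.
INVALID = re.compile(r"\.[0-9]+\.")

for s in [".12345.", ".12345*.", "x.1.y"]:
    print(s, bool(INVALID.search(s)))
```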
The regex shorthand for a non-digit character is \D. Conversely, if you're looking only for digits, \d would work.
Without further context of what you're trying to achieve it's hard to suggest how to build a regex for it, though based on your example, (I think) this should work: .+\d+\D+