Regex combining optional group with conditions - regex

The following combinations should be covered by this regex:
test-ds-s**
test-s**
test-d**
(** stands for two digits, each 0-9)
My regex looks like this: ^test-(ds-)?[ds]\\d{2,2}$
But now test-ds-d** is also matched, which I don't want. Is there any way to allow the d only when the optional ds- part is not used?

You can use
^test-(ds-(?!d))?[ds]\d{2}$
See the regex demo.
Details
^ - start of string
test- - a fixed string
(ds-(?!d))? - an optional capturing group matching ds- only if it is not immediately followed by d
[ds] - d or s
\d{2} - two digits
$ - end of string.
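As a quick check of the breakdown above, here is a minimal Python sketch. The sample strings are hypothetical, with 12 standing in for the two digits, and Python's re module is only assumed to behave like the regex flavor used in the question for this pattern.

```python
import re

# Pattern from the answer above; (?!d) blocks "ds-" when a "d" follows it.
PATTERN = re.compile(r"^test-(ds-(?!d))?[ds]\d{2}$")

# Hypothetical inputs: the three wanted forms plus the unwanted one.
samples = ["test-ds-s12", "test-s12", "test-d12", "test-ds-d12"]
results = {s: bool(PATTERN.search(s)) for s in samples}

for s, ok in results.items():
    print(f"{s}: {'match' if ok else 'no match'}")
```

Only test-ds-d12 should be rejected, because the lookahead fails right after ds-.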

Related

How to make sure optional parts of a pattern occur at least once?

How can I make sure that part of the pattern (a keyword in this case) is present in the match, even though it can appear in different places? I want a match only when it occurs at least once.
Regex:
\b(([0-9])(xyz)?([-]([0-9])(xyz)?)?)\b
We only want the value if there is a keyword: xyz
Examples:
1. 1xyz-2xyz - it's OK
2. 1-2xyz - it's OK
3. 1xyz - it's OK
4. 1-2 - there should be no match, at least one xyz missing
I tried a positive lookahead and lookbehind but this is not working in this case.
You can make use of a conditional construct:
\b([0-9])(xyz)?(?:-([0-9])(xyz)?)?\b(?(2)|(?(4)|(?!)))
See the regex demo. Details:
\b - word boundary
([0-9]) - Group 1: a digit
(xyz)? - Group 2: an optional xyz string
(?:-([0-9])(xyz)?)? - an optional sequence of a -, a digit (Group 3), xyz optional char sequence
\b - word boundary
(?(2)|(?(4)|(?!))) - a conditional: if Group 2 (first (xyz)?) matched, it is fine, return the match, if not, check if Group 4 (second (xyz)?) matched, and return the match if yes, else, fail the match.
See the Python demo:
import re
text = "1. 1xyz-2xyz - it's OK\n2. 1-2xyz - it's OK\n3. 1xyz - it's OK\n4. 1-2 - there should be no match"
pattern = r"\b([0-9])(xyz)?(?:-([0-9])(xyz)?)?\b(?(2)|(?(4)|(?!)))"
print( [x.group() for x in re.finditer(pattern, text)] )
Output:
['1xyz-2xyz', '1-2xyz', '1xyz']
Indeed you could use a lookahead in the following way:
\b\d(?:xyz|(?=-\dxyz))(?:-\d(?:xyz)?)?\b
See this demo at regex101 (or using ^ start and $ end)
The first part matches either an xyz OR (if there is none) the lookahead ensures that the xyz occurs in the second, optional part. The second part depends on the previous condition.
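The lookahead variant can be tried in Python as well. This is a small sketch with a shortened, hypothetical sample string rather than the full text from the question.

```python
import re

# Lookahead pattern from the answer above: the first digit must be followed
# either by "xyz" or by a "-<digit>xyz" tail (checked without consuming it).
pattern = re.compile(r"\b\d(?:xyz|(?=-\dxyz))(?:-\d(?:xyz)?)?\b")

# Hypothetical sample: three valid values and one without any "xyz".
text = "1xyz-2xyz 1-2xyz 1xyz 1-2"
matches = [m.group() for m in pattern.finditer(text)]
print(matches)
```

Unlike the conditional construct, this form needs no numbered groups, so it should also work in engines that lack conditionals but support lookaheads.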
Try this: \b(([0-9])?(xyz)+([-]([0-9])+(xyz)+)?)\b
Replace ? with +.
Basically, ? means zero or one, and in your case you want to match one or more, which is +.
How about something as basic as this
(\dxyz-\dxyz|\dxyz-\d|\d-\dxyz|\dxyz)
You can add word boundary if needed
\b(\dxyz-\dxyz|\dxyz-\d|\d-\dxyz|\dxyz)\b
Just an OR
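A short Python sketch of the plain alternation, using a hypothetical sample string. It uses finditer with group() because findall would return the contents of the capturing group instead of the whole match.

```python
import re

# Alternation from the answer above. Longer alternatives come first so that
# "1xyz-2xyz" is not cut short by the bare "\dxyz" branch.
pattern = re.compile(r"\b(\dxyz-\dxyz|\dxyz-\d|\d-\dxyz|\dxyz)\b")

# Hypothetical sample covering each branch plus one value without "xyz".
text = "1xyz-2xyz 1-2xyz 1xyz 1xyz-2 1-2"
matches = [m.group() for m in pattern.finditer(text)]
print(matches)
```

The explicit list is easy to read, but it grows quickly if more optional parts are added, which is where the conditional or lookahead versions may scale better.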

Regular Expression Tester for Google RE2

I'm looking for a regular expression tester for Google Big Data (RE2) regular expressions. There are a few testers out there, but none of them seems to understand my statement. These are the ones I've tried; they work for simple expressions but not with mine:
https://regex101.com/
https://www.regextester.com
https://www.analyticsmarket.com/freetools/regex-tester/
This is my regex:
^(?:1-)?((?:R|RO|Ro)?[:|.]?\\s?\\d{3}[-|.]?\\d{4}[-|/]F\\d{2}-\\d{2})$
where I would process strings like these:
Ro 708-2859/F07-01
RO708-2859-F06-04
RO703-3877-F01
1-RO520-0628-F08
RO6868847-000-010
Does anyone have an idea of how I might enter the statement differently or where I could test it?
You can use
^(?:1-)?((?:R[Oo]?)?[:.]?\s?\d{3}[-.]?\d{4}[-/](?:F\d{2}(?:-\d{2})?|\d{3}[-/]\d{3}))$
See the regex demo. Details:
^ - start of string
(?:1-)? - an optional 1- string
((?:R[Oo]?)?[:.]?\s?\d{3}[-.]?\d{4}[-/](?:F\d{2}(?:-\d{2})?|\d{3}[-/]\d{3})) - Group 1:
(?:R[Oo]?)? - an optional sequence of R and then an optional O or o
[:.]? - an optional : or .
\s? - an optional whitespace
\d{3} - three digits
[-.]? - an optional - or .
\d{4} - four digits
[-/] - - or /
(?:F\d{2}(?:-\d{2})?|\d{3}[-/]\d{3}) - either F, two digits and then an optional sequence of - and two digits, or three digits, - or / and three digits
$ - end of string.
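The pattern can be sanity-checked against the sample strings from the question. Note the assumption here: Python's re module is a different engine from RE2, so this only shows the pattern logic. The pattern uses no lookarounds or backreferences, which are the features RE2 does not support. The helper name extract is made up for this sketch.

```python
import re

# Pattern from the answer above, split over two lines for readability.
PATTERN = re.compile(
    r"^(?:1-)?((?:R[Oo]?)?[:.]?\s?\d{3}[-.]?\d{4}[-/]"
    r"(?:F\d{2}(?:-\d{2})?|\d{3}[-/]\d{3}))$"
)

def extract(value):
    """Return Group 1 (the part after an optional '1-') or None."""
    m = PATTERN.match(value)
    return m.group(1) if m else None

# Sample strings taken from the question.
samples = [
    "Ro 708-2859/F07-01",
    "RO708-2859-F06-04",
    "RO703-3877-F01",
    "1-RO520-0628-F08",
    "RO6868847-000-010",
]
for s in samples:
    print(s, "->", extract(s))
```

All five samples should match, and the leading 1- is dropped from the captured value because it sits outside Group 1.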
See the Google Sheets demo.
You may try to use https://www.regexplanet.com/advanced/golang/index.html
I tried your regex there, and the site also points to the RE2 documentation.

What expression should I use to get desired results?

For strings like Cisco 3750 i7706-cm021 10.123.12.34 -> 10.123.34.12 I would like to get the result Cisco 3750 i7706-cm021 10.123.12.34 -> using the expression ^.*(?![\d\.]{12}$). But the whole string is matched instead. What would the correct expression be?
You may use a regex like
^.*?(?=\b(?:\d{1,3}\.){3}\d{1,3}$)
See the regex demo.
Details
^ - start of string
.*? - any 0+ chars other than line break chars, as few as possible
(?=\b(?:\d{1,3}\.){3}\d{1,3}$) - a positive lookahead that requires (immediately to the right of the current location):
\b - word boundary
(?:\d{1,3}\.){3} - three repetitions of 1 to 3 digits and a dot
\d{1,3} - one to three digits
$ - end of string.
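To illustrate the lazy match plus lookahead described above, here is a minimal Python sketch using the sample line from the question. The variable names are illustrative only.

```python
import re

# Lazy prefix followed by a lookahead for a trailing dotted quad.
PATTERN = re.compile(r"^.*?(?=\b(?:\d{1,3}\.){3}\d{1,3}$)")

line = "Cisco 3750 i7706-cm021 10.123.12.34 -> 10.123.34.12"
m = PATTERN.search(line)

# The lookahead consumes nothing, so the match stops right before the last IP.
prefix = m.group() if m else None
print(repr(prefix))
```

The matched prefix keeps the space after ->, so a call to rstrip() may be wanted if trailing whitespace matters.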
To get more precise IP regex, see How to Find or Validate an IP Address:
^.*?(?=\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$)
See the regex demo

REGEXP_REPLACE for exact regex pattern, not working

I'm trying to match an exact pattern to do some data cleanup for ISSNs using the code below:
select case when REGEXP_REPLACE('1234-5678 ÿþT(zlsd?k+j''fh{l}x[a]j).,~!##$%^&*()_+{}|:<>?`"\;''/-', '([0-9]{4}[\-]?[Xx0-9]{4})(.*)', '$1') not similar to '[0-9]{4}[\-]?[Xx0-9]{4}' then 'NOT' else 'YES' end
The pattern I want should match any group of 8 digits with an optional dash in the middle and a possible X at the end.
The code above works for most cases, but if capture group 1 is the following example: 123456789 then it also returns positive because it matches the first 8 digits, and I don't want it to.
I tried surrounding capture group 1 with ^...$ but that doesn't work either.
So I would like to match exactly these examples and similar ones:
1234-5678
1234-567X
12345678
1234567X
BUT NOT THESE (and similar):
1234567899
1234567899x
What am I missing?
You may use
^([0-9]{4}-?[Xx0-9]{4})([^0-9].*)?$
See the regex demo
Details
^ - start of string
([0-9]{4}-?[Xx0-9]{4}) - Capturing group 1 ($1): four digits, an optional -, and then four x / X or digits
([^0-9].*)? - an optional Capturing group 2: any char other than a digit and then any 0+ chars as many as possible
$ - end of string.
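The behavior of this pattern can be checked outside the database. The sketch below uses Python's re module as a stand-in, on the assumption that the SQL dialect's regex engine treats these constructs the same way. The helper name clean_issn is hypothetical.

```python
import re

# Pattern from the answer above: after the 8-character ISSN, the next
# character (if any) must be a non-digit, which rejects longer digit runs.
PATTERN = re.compile(r"^([0-9]{4}-?[Xx0-9]{4})([^0-9].*)?$")

def clean_issn(raw):
    """Return the leading ISSN (Group 1) or None if the value is rejected."""
    m = PATTERN.match(raw)
    return m.group(1) if m else None

# Values from the question, plus one with trailing junk.
for value in ["1234-5678", "1234-567X", "12345678", "1234567X",
              "1234567899", "1234567899x", "1234-5678 junk"]:
    print(value, "->", clean_issn(value))
```

In SQL, the same effect would presumably come from keeping '$1' as the replacement and then comparing the result against the strict ISSN pattern, as the question already does.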

Regex for Chilean RUT/RUN with PCRE

I'm having issues validating the Chilean RUT/RUN with a regular expression in PCRE. I have the following regular expression but sadly can't make it work:
\b[0-9|.]{1,10}\-[K|k|0-9]
I need help to see what is wrong with the code. The application I need to use only uses PCRE.
Thank you.
You may use
^(\d{1,3}(?:\.\d{1,3}){2}-[\dkK])$
to match and capture (that is not usually necessary, but your app requires a capturing group to extract its contents) a whole string that matches the pattern. See the regex demo.
To match shorter strings that match this pattern inside a larger string, you may remove ^ and $ (see demo) or use \b word boundaries instead (see this demo).
Details:
^ - start of string
\d{1,3} - 1 to 3 digits
(?:\.\d{1,3}){2} - 2 sequences of a literal . and 1 to 3 digits
- - a hyphen
[\dkK] - a digit, k or K.
$ - end of string.
As they sometimes omit the dots, I used this one:
^(\d{1,2}(?:[\.]?\d{3}){2}-[\dkK])$
Details:
^ - start of string
\d{1,2} - 1 or 2 digits
(?:[.]?\d{3}){2} - 2 sequences of an optional '.' and 3 digits
- - a hyphen
[\dkK] - a digit, k or K
$ - end of string
1234567-k OK
12345678-k OK
1.234.567-k OK
12.345.678-k OK
known issue:
12.345678-k and 12345.678-k are still accepted, and I do not like this :(
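The known issue can be reproduced with a few lines of Python. This is a sketch that assumes Python's re handles this pattern the same way PCRE does. The name LENIENT is made up here.

```python
import re

# Pattern with optional dots from the answer above.
LENIENT = re.compile(r"^(\d{1,2}(?:[\.]?\d{3}){2}-[\dkK])$")

# The four intended formats followed by the two mixed ones.
samples = [
    "1234567-k",
    "12345678-k",
    "1.234.567-k",
    "12.345.678-k",
    "12.345678-k",
    "12345.678-k",
]
accepted = [s for s in samples if LENIENT.match(s)]
print(accepted)
```

Each dot is optional on its own, so nothing ties the two separators together, and the mixed forms pass as well.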
You need to change it to ^(\d{1,3}(?:\.\d{3}){2}-[\dkK])$ so that it captures exactly 2 sequences of 3 digits after the first sequence of 1-3 digits.
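The effect of tightening \d{1,3} to \d{3} can be seen by comparing both patterns in Python. This is a sketch with illustrative names, and 17.87.335-2 serves as an example of a malformed value.

```python
import re

# First pattern from this thread: every dotted block may have 1 to 3 digits.
ORIGINAL = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){2}-[\dkK])$")
# Tightened pattern: the two dotted blocks must have exactly 3 digits.
STRICT = re.compile(r"^(\d{1,3}(?:\.\d{3}){2}-[\dkK])$")

samples = ["12.345.678-k", "1.234.567-K", "17.87.335-2"]
original_ok = [s for s in samples if ORIGINAL.match(s)]
strict_ok = [s for s in samples if STRICT.match(s)]

print("original:", original_ok)
print("strict:  ", strict_ok)
```

The strict version still requires the dots and the hyphen, so values written without separators are rejected by it.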
Please consider making the regex more specific, since it matched wrong numbers such as 17.87.335-2. Also, the included one didn't match formats without the dots or the hyphen.
Please consider using the following format: \b(\d{1,3}(?:(\.?)\d{3}){2}(-?)[\dkK])\b
I modified the prior version to try the other formats: https://regex101.com/r/2Us0j6/9