RegEx - Match a number that has different beginnings

I'm new to RegEx and I'm trying to match a specific number that has 8 digits, and has 3 start options:
00
15620450000
VS
For Example:
1562045000012345678
VS12345678
0012345678
12345678
I don't want to match the 4th option.
Right now I have managed to match the first and third options, but I'm having problems with the second one. I wrote this expression, trying to capture the 8 digits under 'Project':
156204500|VS|00(?<Project>\d{8})
What should I do?
Thanks

Based on the samples you have shown, try the following regex:
^(?:00|15620450000|VS)(\d{8})$
Or, to capture the digits in a group named Project, try:
^(?:00|15620450000|VS)(?<Project>\d{8})$
Online demo for above regex
Explanation:
^(?:00|15620450000|VS) ## From the start of the value, a non-capturing group matches 00, 15620450000 or VS, as per the question.
(?<Project>\d{8}) ## A group named Project captures exactly 8 digits.
$ ## End of the value, so nothing may follow the 8 digits.
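As a quick check, here is a minimal Python sketch that runs the anchored pattern against the four samples from the question. It assumes Python's re module, which spells named groups (?P<name>...) rather than (?<name>...); the helper name extract_project is made up for illustration.

```python
import re

# Same pattern as above, with the named group in Python's (?P<name>...) spelling.
PATTERN = re.compile(r"^(?:00|15620450000|VS)(?P<Project>\d{8})$")

def extract_project(value):
    """Return the 8-digit project number, or None if the value is not accepted."""
    m = PATTERN.match(value)
    return m.group("Project") if m else None

samples = ["1562045000012345678", "VS12345678", "0012345678", "12345678"]
for s in samples:
    # The fourth sample has no allowed prefix, so it yields None.
    print(s, "->", extract_project(s))
```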

Let's understand why your solution failed; that will help you get around this kind of problem in the future. Your regex, 156204500|VS|00(\d{8}), is processed as follows:
156204500 OR VS OR 00(\d{8})
In arithmetic,
1 + 2 + 3 (4 + 5) <--- (4 + 5) is multiplied with only 3
is different from
(1 + 2 + 3) (4 + 5) <--- (4 + 5) is multiplied with (1 + 2 + 3)
This rule is applicable to RegEx as well. Obviously, you intended to use the second form.
By now, you must have already figured out the following solution:
(15620450000|VS|00)(\d{8})
Note that a capturing group only makes sense when you want to capture what it matches. For pure grouping, regex has another concept called a non-capturing group, which you obtain by putting ?: as the first thing inside the parentheses. With a non-capturing group, the final solution becomes:
(?:15620450000|VS|00)\d{8}
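The precedence difference can be seen directly in a small Python sketch. It assumes Python's re module (named groups written as (?P<name>...)) and a hypothetical helper called project; the ungrouped pattern is the one from the question, including its shortened 156204500 prefix.

```python
import re

# Alternation binds loosest: only the "00" branch contains the capture.
ungrouped = re.compile(r"156204500|VS|00(?P<Project>\d{8})")
# Grouping the prefixes makes the 8 digits required after every prefix.
grouped = re.compile(r"(?:15620450000|VS|00)(?P<Project>\d{8})")

def project(pattern, value):
    """Return the Project group of the first match, or None."""
    m = pattern.search(value)
    return m.group("Project") if m else None

for value in ["1562045000012345678", "VS12345678", "0012345678"]:
    print(value, project(ungrouped, value), project(grouped, value))
```

With the ungrouped pattern, the VS and 156204500 branches match on their own and leave Project unset, which is why only the 00 case appeared to work.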

Related

Finding nth occurrence of a pattern within a string in SQL (Presto)

I am writing a query in Presto SQL using the function regexp_extract
I have a string that may look like the following examples:
'1A2B2C3D3E'
'1A1B2C2D3E'
'1A2B1C2D2E'
What I'm trying to do is find for example the second occurrence of 1[A-E].
If I try
regexp_extract(col, '(1[A-E])(1[A-E])', 2)
This works for the second example (and for the first, which returns nothing because there is no second occurrence). However, it fails for the third example and returns nothing. I know that is because my regex searches for a 1[A-E] followed directly by another 1[A-E].
So then I tried
regexp_extract(col, '(1[A-E])(.*)(1[A-E])', 3)
But this does not work either. I am not sure how I can account for the fact that I may have 1A1B2C or 1A2B1C to find that second 1. Any help?
Your second pattern does work in the latest version of Trino (formerly known as Presto SQL):
WITH t(col) AS (
VALUES
'1A2B2C3D3E',
'1A1B2C2D3E',
'1A2B1C2D2E')
SELECT regexp_extract(col, '(1[A-E])(.*)(1[A-E])', 3)
FROM t
_col0
-------
NULL
1B
1C
(3 rows)
As others have commented, you don't need the capture groups for the first match or for the .*, and you should use the lazy quantifier to avoid .* eagerly matching all characters between the first and last occurrence:
WITH t(col) AS (
VALUES
'1A2B2C3D3E',
'1A1B2C2D3E',
'1A2B1C2D2E',
'1A2B1C2D1E')
SELECT regexp_extract(col, '1[A-E].*?(1[A-E])', 1)
FROM t
_col0
-------
NULL
1B
1C
1C
(4 rows)
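The difference between the lazy and the greedy quantifier can also be reproduced outside the database. The following is a rough Python sketch, assuming Python's re module behaves like the engine's regex for these simple patterns; second_match is a hypothetical helper, not a Presto or Trino function.

```python
import re

rows = ["1A2B2C3D3E", "1A1B2C2D3E", "1A2B1C2D2E", "1A2B1C2D1E"]

lazy = r"1[A-E].*?(1[A-E])"    # stops at the nearest following 1[A-E]
greedy = r"1[A-E].*(1[A-E])"   # runs on to the last 1[A-E] in the string

def second_match(pattern, s):
    """Return capture group 1 of the first match, or None (like SQL NULL)."""
    m = re.search(pattern, s)
    return m.group(1) if m else None

for row in rows:
    print(row, second_match(lazy, row), second_match(greedy, row))
```

Only the last row distinguishes the two: the lazy pattern returns the second occurrence, the greedy one returns the last.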
You don't need the capture group around .* to get the second occurrence; you can keep the two capture groups around 1[A-E] and match only the allowed characters in between.
From what I read in the documentation, you might also consider using regexp_extract_all to get all the matches, as regexp_extract returns only the first match.
As the example data consists of a digit followed by a char A-E, you could exclude matching the 1 from the character class to prevent overmatching and backtracking.
(1[A-E])[02-9A-E]*(1[A-E])
Regex demo
If using a single capture group to get the second value is also ok, you can use
1[A-E][02-9A-E]*(1[A-E])
Regex demo
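The "collect all matches, then pick the nth" idea can be sketched in Python, where re.findall plays the role that regexp_extract_all is suggested for above. The helper nth_occurrence is hypothetical and only illustrates the approach.

```python
import re

def nth_occurrence(s, n, pattern=r"1[A-E]"):
    """Return the n-th (1-based) match of pattern in s, or None if there are fewer."""
    matches = re.findall(pattern, s)
    return matches[n - 1] if 0 < n <= len(matches) else None

for row in ["1A2B2C3D3E", "1A1B2C2D3E", "1A2B1C2D2E", "1A2B1C2D1E"]:
    print(row, nth_occurrence(row, 2))
```

This avoids encoding the occurrence count in the regex itself, so asking for the third occurrence is just a different argument.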

Regular Expression Match Groups [duplicate]

I have the following regular expression:
(.*(\d*)(-)(\d*).*)
It correctly matches the following string:
Court 19-24
However, the second group is empty - Group 1: Court 19-24, Group 2: [empty], Group 3: -, Group 4: 24
What is wrong with my regular expression that the second group doesn't contain 19?
Maybe you are looking for something like this?
const re = /.* (\d+)(-)(\d+).*/;
const str = 'Court 19-24';
const match = re.exec(str);
console.log('group 0:', match[0]); // whole match: Court 19-24
console.log('group 1:', match[1]); // 19
console.log('group 2:', match[2]); // -
console.log('group 3:', match[3]); // 24
Group indexes annotated
(.(\d)(-)(\d*).*)
1 2   3  4
Not sure what you expected here?
To match 19 and 24 into two separate groups, use something like this:
(\d+)-(\d+)
To also match the court name into a group, a non-greedy star works.
(.*?) (\d+)-(\d+)
I bet your initial regex was more like this:
(.*(\d*)(-)(\d*).*)
This will behave closer to what you describe, because
. also matches digits, i.e. the greedy .* will consume everything up to the -, including 19, and
* is not required to match anything at all, i.e. \d* will happily match zero digits, causing group 2 to be empty.
Lessons:
Don't use * when you expect at least something to be there. Prefer + in this case.
Be wary of the greedy star, especially when used with the unspecific . it can match things you did not intend.
You don't have to put parentheses around everything in regular expressions. Only add groups around things you want to extract (i.e. "capture"), or around things you want to match/fail/repeat as one (i.e. "make atomic").
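These lessons can be checked with a short Python sketch (Python's re module is an assumption here; the question does not name an engine). It runs the pattern from the question and the lazy, + based variant against the same input.

```python
import re

text = "Court 19-24"

# Pattern from the question: the greedy .* swallows "19",
# and \d* is satisfied with zero digits, so group 2 is empty.
original = re.match(r"(.*(\d*)(-)(\d*).*)", text)
print(original.groups())

# Lazy name, + instead of *, and groups only around what is extracted.
fixed = re.match(r"(.*?) (\d+)-(\d+)", text)
print(fixed.groups())
```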

How to create a matching regex pattern for "greater than 10-000-000 and lower than 150-000-000"?

I'm trying to make
09-546-943
fail in the below regex pattern.
^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$
Passing criteria is
greater than 10-000-000 or 010-000-000 and
less than 150-000-000
The tried example "09-546-943" passes. This should be a fail.
Any idea how to create a regex that makes this example a fail instead of a pass?
You may use
^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$
See the regex demo.
The pattern is partially generated with this online number range regex generator. I set the min number to 10 and the max to 150, merged the branches that match 1-8 and 9 (the tool does a bad job here), added 0? to the two-digit numbers to allow an optional leading 0, and appended -[0-9]{3}-[0-9]{3} to the 10-149 part and -000-000 to 150.
Details
^ - start of string
(?: - start of a container non-capturing group making the anchors apply to both alternatives:
(?:0?[1-9][0-9]|1[0-4][0-9]) - an optional 0 and then a number from 10 to 99 or 1 followed with a digit from 0 to 4 and then any digit (100 to 149)
-[0-9]{3}-[0-9]{3} - a hyphen and three digits repeated twice (=(?:-[0-9]{3}){2})
| - or
150-000-000 - a 150-000-000 value
) - end of the non-capturing group
$ - end of string.
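A few boundary values make the behaviour of this pattern concrete. The sketch below assumes Python's re module and a hypothetical helper in_range; note that this pattern treats both 10-000-000 and 150-000-000 as inside the range.

```python
import re

RANGE = re.compile(
    r"^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$"
)

def in_range(value):
    """True if value is accepted by the range pattern above."""
    return RANGE.match(value) is not None

for value in ["09-546-943", "10-000-000", "010-000-000",
              "99-999-999", "149-999-999", "150-000-000", "150-000-001"]:
    print(value, in_range(value))
```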
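This expression, or a slightly modified version of it, might work:
^[1][0-4][0-9]-[0-9]{3}-[0-9]{3}$|^[1][0]-[0-9]{3}-[0-9]{2}[1-9]$
It fails both 10-000-000 and 150-000-000. Note that, as written, it also rejects prefixes 11 through 99 and any 10-xxx-xxx value whose last digit is 0 (for example 10-000-010), so the second branch would need widening to cover the full range.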
In this demo, the expression is explained, if you might be interested.
This pattern:
((0?[1-9])|(1[0-4]))[0-9]-[0-9]{3}-[0-9]{3}
matches the range from (0)10-000-000 to 149-999-999 inclusive. To keep the regex simple, you may need to handle the extremes ((0)10-000-000 and 150-000-000) separately - depending on your need of them to be included or excluded.
Test here.
This regex:
((0?[1-9])|(1[0-4]))[0-9][- ]?[0-9]{3}[- ]?[0-9]{3}
also accepts a space or nothing in place of each -.
Test here.
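One caveat worth checking: the last two patterns carry no ^ and $ anchors, so a substring search can accept part of an out-of-range value. A minimal Python sketch, assuming the re module and a hypothetical helper is_valid, shows the difference between searching and matching the whole string.

```python
import re

LOOSE = r"((0?[1-9])|(1[0-4]))[0-9][- ]?[0-9]{3}[- ]?[0-9]{3}"

# Unanchored search finds a valid-looking tail inside an out-of-range value.
found = re.search(LOOSE, "150-000-001")
print(found.group(0))

def is_valid(value):
    """Require the whole string to match, as ^...$ would."""
    return re.fullmatch(LOOSE, value) is not None

for value in ["150-000-001", "10 000 000", "149999999", "09-546-943"]:
    print(value, is_valid(value))
```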

Replace nested double brace pair with single [duplicate]

Any ideas how to replace:
..((....))..
With:
..(...)..
Be aware, it is not a straight replace of "((" with "(". The expression must determine that the child brace pair being removed is contained directly within the parent pair, with no other content.
Bonus points if anyone can figure out how to function recursively, e.g. "(((...)))" to "(...)"
You can use this:
([(]*)(?:\([^)]*\))([)]*)
Compare the lengths of the two captured groups. If they are equal, replace both groups with an empty string; otherwise remove only as many parentheses from each side as the shorter group contains.
Test:
(ABC)
((ABC))
(((ABC)))
((ABC)a)
Match Information:
Match 1
Full match 0-5 `(ABC)`
Group 1. 0-0 ``
Group 2. 5-5 ``
--> Hence, no update required
Match 2
Full match 6-13 `((ABC))`
Group 1. 6-7 `(`
Group 2. 12-13 `)`
--> As Group 1 and Group 2 are the same size, replace both with '', resulting in '(ABC)'
Match 3
Full match 14-23 `(((ABC)))`
Group 1. 14-16 `((`
Group 2. 21-23 `))`
--> Same in this case as well
Match 4
Full match 24-30 `((ABC)`
Group 1. 24-25 `(`
Group 2. 30-30 ``
--> As group 1 and group 2 are not the same size, reduce by the smaller one, which is group 2 (size 0); hence no update is required, leaving it as '((ABC)a)'
Demo
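The "remove the smaller of the two group sizes" rule can be sketched in Python with a replacement function. This is only an illustration under assumptions: it uses Python's re module, turns the middle non-capturing group into a capturing one so the replacement can keep it, and the names collapse and simplify are made up. It has been checked only against the four test strings and the example from the question.

```python
import re

# Same shape as the pattern above, but the inner pair is captured too.
PATTERN = re.compile(r"([(]*)(\([^)]*\))([)]*)")

def collapse(match):
    """Drop as many redundant wrapper parentheses as both sides share."""
    opening, core, closing = match.groups()
    redundant = min(len(opening), len(closing))
    return opening[redundant:] + core + closing[redundant:]

def simplify(text):
    return PATTERN.sub(collapse, text)

for sample in ["(ABC)", "((ABC))", "(((ABC)))", "((ABC)a)", "..((....)).."]:
    print(sample, "->", simplify(sample))
```

Because the count is taken from the match itself, the same call handles the "bonus" case of several nested wrapper pairs in one pass.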

Regex for Phone Numbers allowing for only 6 to 20 characters

Regex beginner here. I've been trying to tackle this rule for phone numbers to no avail and would appreciate some advice:
Minimum 6 characters
Maximum 20 characters
Must contain numbers
Can contain these symbols ()+-.
Do not match if all the numbers included are the same (e.g. 111111)
I managed to build the following two pieces but I'm unable to put them together.
Here's what I've got:
(^(\d)(?!\1+$)\d)
([0-9()-+.,]{6,20})
Many thanks in advance!
I'd go about it by first getting a list of all possible phone numbers (thanks @CAustin for the suggested improvements):
import re
lst_phone_numbers = re.findall(r'[0-9+()-]{6,20}', your_text)
And then filtering out the ones that do not comply with rule 5 using whatever programming language you're most comfortable with.
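The two-step idea (match the allowed characters, then filter on the digits) might look like this in Python. It is a sketch under assumptions: the function name is_valid_phone is made up, the period from the question's symbol list is included in the character class, and a candidate with only one distinct digit is treated as "all the same".

```python
import re

# Step 1: 6 to 20 characters drawn from digits and the symbols ( ) + - .
ALLOWED = re.compile(r"[0-9()+.\-]{6,20}")

def is_valid_phone(candidate):
    """Apply the length/character rule, then the 'not all digits equal' rule."""
    if not ALLOWED.fullmatch(candidate):
        return False
    digits = [ch for ch in candidate if ch.isdigit()]
    # Step 2: need at least two distinct digits (this also rejects "no digits").
    return len(set(digits)) > 1

for candidate in ["123456", "111111", "111-111", "+1-555-0100", "((((((", "12345"]:
    print(candidate, is_valid_phone(candidate))
```

Keeping rule 5 in ordinary code avoids backreference tricks and makes the rule easy to adjust later.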
Try this RegEx:
(?:([\d()+-])(?!\1+$)){6,20}
Explained:
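1. (?: creates a non-capturing group
2. ([\d()+-]) creates a group to match a digit, parenthesis, +, or -
3. (?!\1+$) this will not return a match if it matches the value found from #2 one or more times until the end of the string
4. {6,20} requires 6-20 matches from the non-capturing group in #1
Note that the pattern is unanchored, and the lookahead in #3 also blocks any character that is followed only by repeats of itself, so a number ending in a doubled digit (e.g. 1234566) is rejected or cut short.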
Try this :
((?:([0-9()+\-])(?!\2{5})){6,20})
Here, the (?!\2{5}) part limits how many times a character may be repeated in a row: it rejects any character that is immediately followed by five more copies of itself, such as 222222. I used 5 as an example; you can change it as you want.
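If a single pattern is preferred, one possible variant uses two lookaheads: one requiring at least one digit and one rejecting strings whose digits are all identical. This is a hedged sketch rather than any of the answers' patterns; it assumes Python's re module, includes the period from the question's symbol list, and the name valid is hypothetical.

```python
import re

PHONE = re.compile(
    r"(?=.*\d)"                      # must contain at least one digit
    r"(?!\D*(\d)(?:\D*\1)*\D*$)"     # fail if every digit equals the first digit
    r"[\d()+.\-]{6,20}",             # 6-20 allowed characters
    re.ASCII,
)

def valid(candidate):
    """True if the whole candidate satisfies all five rules from the question."""
    return PHONE.fullmatch(candidate) is not None

for candidate in ["123456", "111111", "111-111", "1234566", "((((((", "+1-555-0100"]:
    print(candidate, valid(candidate))
```

Unlike a run-length check, this compares every digit against the first one, so separators between repeated digits do not hide the repetition.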