Matching inner pattern an unlimited amount of times within outer pattern

Say I have the following pattern:
INDICATOR\s+([a-z0-9]+)
which would match for example:
INDICATOR AA or INDICATOR B3
I need to edit this pattern so that it matches any string which starts with INDICATOR, followed by whitespace and then one or more matches of the inner pattern, e.g.
INDICATOR AA A3 66 B8 34 CD
INDICATOR BG 4D CS
INDICATOR HG
Is it possible to do this?
Solution
With thanks to Gumbo I came up with the following regex which suits my requirements:
INDICATOR((\s+)?([,-])?(\s+)?([a-z0-9]+))+

Try this:
INDICATOR(\s+([a-z0-9]+))+
Here the repeating pattern is wrapped in a group and quantified with + to allow one or more repetitions of the expression inside the group. Note that this will not give you every match of the inner group, only the last one (to be more specific: it depends on the implementation you're using).
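As a minimal sketch of that caveat in Python (one implementation among many; the sample string is taken from the question, and the re.IGNORECASE flag is an assumption added here because [a-z0-9] alone would not match the uppercase samples): a quantified group keeps only its last repetition, so one possible workaround is to capture the whole repeated tail once and split it afterwards.

```python
import re

text = "INDICATOR AA A3 66 B8 34 CD"

# The quantified inner group only remembers its final repetition.
m = re.search(r"INDICATOR(\s+([a-z0-9]+))+", text, re.IGNORECASE)
last_only = m.group(2)

# Workaround: capture the whole repeated tail once, then split on whitespace.
m2 = re.search(r"INDICATOR((?:\s+[a-z0-9]+)+)", text, re.IGNORECASE)
all_tokens = m2.group(1).split()

print(last_only)   # CD
print(all_tokens)  # ['AA', 'A3', '66', 'B8', '34', 'CD']
```

Other engines behave differently; some expose every repetition of a group, which is why the answer hedges on the implementation.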

Related

PCRE2 - Match every word whose suffix matches a backreference

Given the string below,
ay bee ceefooh deefoo38 ee 37 ef gee38 aitch 38 eye19 jay38 kay 99 el88 em38 en 29 ou38 38 pee 12 q38 arr 999 esss 555
the goal is to match every word such that the suffix is a number that matches the number that appears after foo (which happens to be 38 in this case).
There is only one substring that begins with foo and ends with a number. The expected matches all exist after said substring.
Expected matches:
gee38
jay38
em38
ou38
q38
I've tried foo(\d+).*?(\w+\1)\b and foo(\d+).*(\w+\1)\b, but they fail to match all, because they either match the first one (gee38) or the last one (q38).
Is it possible to match all with just a single regex and, importantly, in just a single run?
The PCRE2 engine that I use behaves in the same way as https://regex101.com/r/uFEDOE/1. So, if the regex can match multiple substrings on regex101, then the engine that I use can too.

(?:foo|\G(?!^))(\d+).*?(?=(\w+))\w+(?=\1\b)
There may be room for some size or performance optimization.
@Niko Gambt, say if any such optimization is important for you.
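For comparison, here is a rough two-pass sketch in Python. It deliberately does not meet the "single regex, single run" requirement and is not the PCRE2 solution above: Python's built-in re module has no \G anchor, so the number after foo is extracted first and the words are collected in a second step. The variable names are illustrative only.

```python
import re

text = ("ay bee ceefooh deefoo38 ee 37 ef gee38 aitch 38 eye19 jay38 kay 99 "
        "el88 em38 en 29 ou38 38 pee 12 q38 arr 999 esss 555")

# Pass 1: find the number that directly follows "foo".
anchor = re.search(r"foo(\d+)", text)
number = anchor.group(1)

# Pass 2: after that point, collect words that end with the same number.
# \w+ requires at least one character before the number, so a bare "38"
# is not reported.
rest = text[anchor.end():]
suffix_words = re.findall(r"\b\w+" + re.escape(number) + r"\b", rest)

print(suffix_words)  # ['gee38', 'jay38', 'em38', 'ou38', 'q38']
```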

Finding nth occurrence of a pattern within a string in SQL (Presto)

I am writing a query in Presto SQL using the function regexp_extract.
I have a string that may look like the following examples:
'1A2B2C3D3E'
'1A1B2C2D3E'
'1A2B1C2D2E'
What I'm trying to do is find for example the second occurrence of 1[A-E].
If I try
regexp_extract(col, '(1[A-E])(1[A-E])', 2)
This works for the second example (and for the first, where it returns nothing because there is no second occurrence). However, it fails for the third example and returns nothing. I know that is because my regex is searching for a 1[A-E] followed directly by another 1[A-E].
So then I tried
regexp_extract(col, '(1[A-E])(.*)(1[A-E])', 3)
But this does not work either. I am not sure how I can account for the fact that I may have 1A1B2C or 1A2B1C to find that second 1. Any help?

Your second pattern does work in the latest version of Trino (formerly known as Presto SQL):
WITH t(col) AS (
VALUES
'1A2B2C3D3E',
'1A1B2C2D3E',
'1A2B1C2D2E')
SELECT regexp_extract(col, '(1[A-E])(.*)(1[A-E])', 3)
FROM t
_col0
-------
NULL
1B
1C
(3 rows)
As others have commented, you don't need the capture groups for the first match or for the .*, and you should use a lazy quantifier so that .* does not greedily match all characters between the first and last occurrence:
WITH t(col) AS (
VALUES
'1A2B2C3D3E',
'1A1B2C2D3E',
'1A2B1C2D2E',
'1A2B1C2D1E')
SELECT regexp_extract(col, '1[A-E].*?(1[A-E])', 1)
FROM t
_col0
-------
NULL
1B
1C
1C
(4 rows)
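The difference between the greedy and lazy forms can also be checked outside of Trino. The following is a small Python sketch (Python's re module, not Trino's regex engine; the helper name second_occurrence is made up for illustration) run against the same four sample values:

```python
import re

rows = ["1A2B2C3D3E", "1A1B2C2D3E", "1A2B1C2D2E", "1A2B1C2D1E"]

def second_occurrence(s, pattern):
    """Return capture group 1 of the first match, or None if there is none."""
    m = re.search(pattern, s)
    return m.group(1) if m else None

# Greedy .* runs to the LAST 1[A-E]; lazy .*? stops at the next one.
greedy = [second_occurrence(s, r"1[A-E].*(1[A-E])") for s in rows]
lazy = [second_occurrence(s, r"1[A-E].*?(1[A-E])") for s in rows]

print(greedy)  # [None, '1B', '1C', '1E']
print(lazy)    # [None, '1B', '1C', '1C']
```

Only the fourth value, which contains three occurrences, distinguishes the two forms.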

You don't need the capture group around .* if you only want to keep the 2 capture groups for the occurrences in the result, and you can instead match only the characters that are allowed in between.
From what I read in the documentation, you might also consider using regexp_extract_all to get all the matches, as regexp_extract returns only the first match.
As the example data consists of a digit followed by a char A-E, you could exclude the 1 from the character class used in between, to prevent overmatching and backtracking.
(1[A-E])[02-9A-E]*(1[A-E])
If using a single capture group to get the second value is also ok, you can use
1[A-E][02-9A-E]*(1[A-E])
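Both suggestions can be sketched in Python as follows. Here re.findall stands in for the idea behind regexp_extract_all (it is not the Trino function itself), and nth_match is a hypothetical helper written for this illustration:

```python
import re

def nth_match(s, n, pattern=r"1[A-E]"):
    """Return the n-th (1-based) match of pattern in s, or None."""
    matches = re.findall(pattern, s)
    return matches[n - 1] if len(matches) >= n else None

# Collect all matches, then pick the one you want.
second = nth_match("1A2B1C2D1E", 2)
third = nth_match("1A2B1C2D1E", 3)
missing = nth_match("1A2B2C3D3E", 2)

# Single-regex variant: the class in between excludes the digit 1.
m = re.search(r"1[A-E][02-9A-E]*(1[A-E])", "1A2B1C2D1E")
via_class = m.group(1)

print(second, third, missing, via_class)  # 1C 1E None 1C
```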

Regex pattern to match "AA BB CC DD"

I have a hexadecimal string with space separator for each byte.
e.g., A1 B2 C3 D4 E5 FF 00 11 22 33 44 ...
I would like to use a regex validator to verify whether the user input is correct.
How could I write the regular expression to achieve this goal?

Something like this:
^[A-F0-9]{2}( [A-F0-9]{2})*$
Explanation:
^ - anchor: string start
[A-F0-9]{2} - two symbols in either 0..9 or A..F range
( [A-F0-9]{2})* - followed by space and two 0..9 or A..F symbols zero or more times
$ - anchor: string end
If you also allow a..f as valid hexadecimal symbols:
^[A-Fa-f0-9]{2}( [A-Fa-f0-9]{2})*$
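A minimal validator built on this pattern might look like the sketch below in Python (the question does not name a language, so this choice and the function name is_valid_hex_string are assumptions). It uses fullmatch instead of ^ and $, because in Python $ also matches just before a trailing newline.

```python
import re

# Two hex digits, then zero or more groups of "space + two hex digits".
HEX_BYTES = re.compile(r"[A-Fa-f0-9]{2}( [A-Fa-f0-9]{2})*")

def is_valid_hex_string(s):
    """True if s consists only of space-separated two-digit hex bytes."""
    return HEX_BYTES.fullmatch(s) is not None

print(is_valid_hex_string("A1 B2 C3 D4 E5 FF 00 11 22 33 44"))  # True
print(is_valid_hex_string("A1  B2"))  # False (double space)
print(is_valid_hex_string("A1 B"))    # False (incomplete byte)
```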

I would like to propose a solution based on the DRY principle (Don't Repeat Yourself).
Instead of writing the same pattern twice (as Dmitry proposed), you can:
Write the pattern for 2 hex digits as a capturing group - ([A-F0-9]{2}).
"Call" it again using (?1).
So the whole pattern can be ^([A-F0-9]{2})( (?1))*$.
There are also other variants of "calling" a capturing group, e.g.
(?-1) - call the preceding group or
(?&name) - call a named group.
For details see https://www.regular-expressions.info/subroutine.html
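Not every engine supports subroutine calls; Python's built-in re module, for example, does not accept (?1). In such engines the same DRY idea can be approximated by composing the pattern from a named fragment. This is a different technique from the one described above, shown here only as a sketch; HEX_BYTE is a name chosen for illustration.

```python
import re

# Define the two-hex-digit fragment once and reuse it by string composition.
HEX_BYTE = r"[A-F0-9]{2}"
pattern = re.compile(rf"^{HEX_BYTE}( {HEX_BYTE})*$")

print(pattern.pattern)                         # ^[A-F0-9]{2}( [A-F0-9]{2})*$
print(bool(pattern.match("A1 B2 C3 D4")))      # True
print(bool(pattern.match("A1 B2 C3 D")))       # False
```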

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break the name down into groups which are separated by an underscore, which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so good.
Now I need to extract characters from one of the groups; for example, in group 2 I need the first 3 and 8 digits (keep in mind they could be letters too).
So I tried something like this:
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It pulls the PH into a group, but not the 38. So I'm lost at this point.
Any help would be great.

Try the regex below to match any three letters/digits followed by one digit:
(.*?)_([A-Z0-9]{3}[0-9]{1})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
Try the regex below to match any three letters/digits followed by one letter/digit:
(.*?)_([A-Z0-9]{3}[A-Z0-9]{1})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letters are a constant like "PH", then try the one below:
(.*?)_([PH]+[0-9A-Z]{2})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)

I am assuming that you are trying to match a group 2 that starts with digits. If that is the case, then you have to change the source string, for example to
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1

Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't help to show how in your language, but most languages employ a contains or indexOf method that can be used to determine whether or not a substring exists in a string.
Using regex alone, however, this can be accomplished using the following regular expression.
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
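Since the question does not specify a language, the split-based approach mentioned above is sketched here in Python purely as an assumption. The regex at the end is a shortened form of the capturing pattern that stops after the second part; it is not the full expression from the answer.

```python
import re

name = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS."

# Split on "_" and inspect the second element directly.
parts = name.split("_")
second = parts[1]
has_38 = "38" in second
position = second.find("38")  # index of "38" inside the second part, -1 if absent

# Shortened regex variant: capture what precedes and follows "38" in part two.
m = re.search(r"([^_]*)_([^_]*)(38)([^_]*)_", name)

print(second, has_38, position)  # PH3843C5 True 2
print(m.groups())                # ('0296005', 'PH', '38', '43C5')
```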

Visual Basic - RegEx - Overall Length Check regardless the number of matches

I have the following problem:
This is my RegEx pattern:
\d*[a-z A-Z][a-zA-Z0-9 _?!()\/\\]*
It allows anything except numbers that stand alone, like 1, 11, 111 and so on.
My question: How can I limit the overall length of the input regardless of the matches?
I tried several options, like putting {1,30} after each part, and wrapping the regex in a group with ( ) followed by {1,30}, but it still doesn't work.
If anyone could help me, I would appreciate it :).
Allowed string:
Group1
Group 1
1Group
Group!?()\/
Group !()\?!
a1 a1 a1 a1
Not Allowed:
1
11
And so on. Putting {1,30} after a part only restricts how many times that part can be repeated. What I want to know is: how can I set the maximum length for the whole RegEx above, so that the input is limited to 30 characters regardless of the matches?

To disallow input that consists only of digits, you can use a negative look-ahead (?!\d+$), and to limit the length of the input, use a limiting quantifier {1,30}:
(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}
Note that if you plan to match whole strings, you'd need anchors: ^ at the beginning will anchor the regex to the beginning of string, and $ will anchor at the end.
^(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}$
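The anchored pattern can be checked against the samples from the question. The sketch below uses Python rather than Visual Basic (an assumption made only for illustration; the pattern itself uses no engine-specific syntax), and is_allowed is a hypothetical helper name.

```python
import re

# Negative look-ahead rejects digit-only input; {1,30} caps the total length.
ALLOWED = re.compile(r"^(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}$")

def is_allowed(s):
    return ALLOWED.match(s) is not None

allowed = ["Group1", "Group 1", "1Group", "Group!?()\\/", "Group !()\\?!", "a1 a1 a1 a1"]
rejected = ["1", "11", "a" * 31]

print([is_allowed(s) for s in allowed])   # [True, True, True, True, True, True]
print([is_allowed(s) for s in rejected])  # [False, False, False]
```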
