There is a text like this (many lines)
1. sdfsdf werwe werwemax45 rwrwerwr
2. 34348878 max max44444445666 sdf
3. 4353424 23423eedf max55 dfdg dfgdf
4. max45
5. 4324234234sdfsdf maxx34534
Using regular expressions I need to find all lines and include a word max<digits> (containing digits instead of literally <digits>) into a matching group.
So I've tried this regular expression:
^.*?\b(max\d+)\b.*?$
But it finds only lines containing max... and ignores others.
Then I’ve tried
^.*?\b(max\d+)?\b.*?$
It finds all lines but without matching group containing max....
The issue can be "debugged" with a slightly modified pattern, ^(.*?)\b(max\d+)?\b(.*?)$, with the rest of the pattern wrapped into separate capturing groups. You can see that the lines are all matched by the Group 3 pattern, the last .*?. It happens because the first .*? is skipped (since it is a lazy pattern), then (max\d+)? matches an empty string at the start of the line (none begins with max + digits; if any line started with that pattern, you would get it captured), and the last .*? captures the whole line.
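To see this concretely, here is a minimal Python sketch of the "debug" pattern. The helper name debug_groups is made up for illustration, and Python's re module is assumed (the question does not say which engine is used):

```python
import re

# The "debug" pattern: each part of the original regex gets its own group.
DEBUG = re.compile(r"^(.*?)\b(max\d+)?\b(.*?)$")

def debug_groups(line):
    """Return (group 1, group 2, group 3) for a single line (hypothetical helper)."""
    return DEBUG.match(line).groups()

# Group 2 stays empty (None) and Group 3 swallows the whole line:
print(debug_groups("3. 4353424 23423eedf max55 dfdg dfgdf"))
# Only when the line itself starts with max<digits> is Group 2 filled:
print(debug_groups("max45 at the very start"))
```

The first call shows the problem described above: the optional group is satisfied by an empty match at the start of the line, so the engine never has a reason to go looking for max55.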
You can fix it by wrapping the first part in an optional non-capturing group, with max\d+ captured by an obligatory capturing group inside it:
^(?:.*?\b(max\d+)\b)?.*?$
Or even without ?$ at the end since .* will match greedily up to the end of the line:
^(?:.*?\b(max\d+)\b)?.*
See the regex demo
Details
^ - start of string (with m option, start of a line)
(?:.*?\b(max\d+)\b)? - an optional non-capturing group:
.*? - any 0+ chars, other than line break chars as few as possible
\b - a word boundary
(max\d+) - Group 1 (obligatory, will be tried once): max and 1+ digits
\b - a word boundary
.* - rest of the line
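A short Python sketch of the fixed pattern applied to the sample lines from the question. It assumes Python's re module with the MULTILINE flag standing in for the m option; find_max_words is a hypothetical helper name:

```python
import re

TEXT = "\n".join([
    "1. sdfsdf werwe werwemax45 rwrwerwr",
    "2. 34348878 max max44444445666 sdf",
    "3. 4353424 23423eedf max55 dfdg dfgdf",
    "4. max45",
    "5. 4324234234sdfsdf maxx34534",
])

# Optional non-capturing group around an obligatory capturing group.
PATTERN = re.compile(r"^(?:.*?\b(max\d+)\b)?.*", re.MULTILINE)

def find_max_words(text):
    """Return (whole line, max<digits> word or None) for every line."""
    return [(m.group(0), m.group(1)) for m in PATTERN.finditer(text)]

for line, word in find_max_words(TEXT):
    print(repr(word), "<-", line)
```

Every line is matched; Group 1 is None for lines 1 and 5, because werwemax45 has no word boundary before max and maxx34534 has no digit right after max.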
Related
I am trying to capture the second match from the given text, i.e.,
hash=e1467eb30743fb0a180ed141a26c58f7&token=a62ef9cf-2b4e-4a99-9335-267b6224b991:IO:OPCA:117804471:OPI:false:en:opsdr:117804471&providerId=paytm
In the above text, I want to capture the second number with the length of 9 (117804471).
I tried the following, but it didn't work, so please help me resolve this.
https://regex101.com/r/vBJceR/1
You can use
^(?:.*?\K\b[0-9]{9}\b){2}
See the regex demo.
Details:
^ - start of string
(?: - start of a non-capturing group:
.*? - any zero or more chars other than line break chars (as few as possible) followed with
\K - match reset operator discarding text matched so far
\b[0-9]{9}\b - a 9-digit number as a whole word
){2} - two occurrences of the pattern sequence defined above.
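Note that \K is not available in every engine. Python's built-in re module, for example, does not support it. A rough equivalent, sketched here under that assumption, is to capture the number in a group instead: a repeated capturing group keeps the value of its last iteration, which is the second occurrence. The function name is hypothetical:

```python
import re

TEXT = ("hash=e1467eb30743fb0a180ed141a26c58f7&token=a62ef9cf-2b4e-4a99-9335-"
        "267b6224b991:IO:OPCA:117804471:OPI:false:en:opsdr:117804471&providerId=paytm")

# Same idea as ^(?:.*?\K\b[0-9]{9}\b){2}, but with a capturing group
# instead of \K; Group 1 holds the value from the second iteration.
SECOND_NINE_DIGITS = re.compile(r"^(?:.*?\b([0-9]{9})\b){2}")

def second_nine_digit_number(text):
    """Return (value, start offset) of the second 9-digit whole word, or None."""
    m = SECOND_NINE_DIGITS.search(text)
    return (m.group(1), m.start(1)) if m else None

print(second_nine_digit_number(TEXT))
```

Both 9-digit numbers in the sample happen to be identical, so the start offset is the only way to tell that the second one was picked.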
I made the following regex :
(\w{2,3})(,\s*\w{2,3})*
It means the input should start with a 2- or 3-letter item, followed by any number of comma-separated 2- or 3-letter items.
Now I also need to allow the words blue and yellow.
(\w{2,3}|blue|yellow)(,\s*\w{2,3})*
It works only if blue or yellow is at the beginning.
Is there a way to allow these exception words after a comma without repeating them in the pattern?
I'd say go with the answer given by @Toto, but if your language doesn't support recursive patterns, you could try:
^(?![, ])(?:,?\s*\b(?:\w{2,3}|blue|yellow))+$
See the online demo
^ - Start string anchor.
(?![, ]) - Negative lookahead to prevent starting with a comma or space.
(?: - Open 1st non-capture group.
,?\s*\b - Match an optional comma, zero or more whitespace characters and a word-boundary.
(?: - A nested 2nd non-capture group.
\w{2,3}|blue|yellow - Lay out your options just once.
) - Close 2nd non-capture group.
)+ - Close 1st non capture group and match at least once.
$ - End string anchor.
Just be aware that \w{2,3} allows for things like __ and _1_ to be valid inputs.
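A quick way to check the behaviour is a small Python sketch like the one below (Python's re module is an assumption here, and is_valid is a hypothetical helper name):

```python
import re

# The non-recursive pattern from above, with the options listed once.
LIST_PATTERN = re.compile(r"^(?![, ])(?:,?\s*\b(?:\w{2,3}|blue|yellow))+$")

def is_valid(s):
    """True if s is a list of 2-3 char words, blue or yellow."""
    return LIST_PATTERN.search(s) is not None

for sample in ["ab, cd, blue", "yellow,abc", "ab, green", ", ab", "abcd", "__"]:
    print(repr(sample), is_valid(sample))
```

The last sample illustrates the caveat about \w: __ is accepted because an underscore is a word character.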
If the language you are using supports recursive patterns, you can use:
^(blue|yellow|\w{2,3})(?:,\s*(?1))*$
Demo & explanation
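If the engine has no recursion support (Python's built-in re module does not understand (?1), for instance), one possible workaround is to write the alternatives once in the host language and build the pattern from them. This is only a sketch of that idea, not the recursive regex itself; ITEM and is_valid are names made up for illustration:

```python
import re

# The alternatives are written once and reused when building the pattern.
ITEM = r"blue|yellow|\w{2,3}"
LIST_PATTERN = re.compile(rf"^(?:{ITEM})(?:,\s*(?:{ITEM}))*$")

def is_valid(s):
    """True if s is a comma-separated list of allowed items."""
    return LIST_PATTERN.search(s) is not None

for sample in ["blue, ab, yellow", "ab,cd", "ab, green", "abcd"]:
    print(repr(sample), is_valid(sample))
```

The compiled pattern still contains the alternatives twice, but the source code lists them in one place only, which is what the question asks for.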
If either blue or yellow can occur only once:
^(?:\w{2,3}\s*,\s*)*(?:blue|yellow)(?:\s*,\s*\w{2,3})*$
The pattern matches
^ Start of string
(?:\w{2,3}\s*,\s*)* Optionally repeat 2-3 word chars followed by a comma
(?:blue|yellow) Match either blue or yellow
(?:\s*,\s*\w{2,3})* Optionally match a comma and 2-3 word chars
$ End of string
Regex demo
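For a quick check in Python (an assumed environment; the helper name is hypothetical), note that this pattern as written requires blue or yellow to be present exactly once:

```python
import re

# Exactly one of blue/yellow, surrounded by optional 2-3 char items.
ONCE_PATTERN = re.compile(
    r"^(?:\w{2,3}\s*,\s*)*(?:blue|yellow)(?:\s*,\s*\w{2,3})*$"
)

def has_single_colour(s):
    """True if the list contains blue or yellow exactly once."""
    return ONCE_PATTERN.search(s) is not None

for sample in ["ab, blue, cd", "blue", "ab, cd", "blue, yellow"]:
    print(repr(sample), has_single_colour(sample))
```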
I have following text:
:3:Start!##$%^&*():31:Start!##$%^&*():31:End!##$%^&*():3:End
and with following regex:
(:3:Start)(.*)(:31:Start.*:31:End)?(.*)(:3:End)
Why is group 3 not found even though it exists? The same happens even if I make group 2 non-greedy:
(:3:Start)(.*?)(:31:Start.*:31:End)?(.*)(:3:End)
How can I capture a group with an optional subgroup if it occurs in the middle of the text?
You may achieve what you need if you enclose the (.*?) and (:31:Start.*:31:End) groups in an optional non-capturing group (quantified with a greedy ? quantifier), making the (:31:Start.*:31:End) group itself obligatory:
(:3:Start)(?:(.*?)(:31:Start.*:31:End))?(.*)(:3:End)
|____________________________|
See the regex demo. It will work like this:
(:3:Start) - will capture into Group 1 the :3:Start string
(?:(.*?)(:31:Start.*:31:End))? - will attempt to match once a sequence of patterns:
(.*?) - Group 2: any 0 or more chars other than line break chars as few as possible
(:31:Start.*:31:End) - Group 3: :31:Start, then any 0 or more chars other than line break chars as many as possible, then :31:End
(.*) - Group 4: any 0 or more chars other than line break chars as many as possible
(:3:End) - captures into Group 5 :3:End string
Why doesn't your pattern work?
See your pattern demo: the !##$%^&*():31:Start!##$%^&*():31:End!##$%^&*() substring is captured into Group 4, matched with the (.*) pattern. It happens because (.*?)(:31:Start.*:31:End)? first skips the .*? pattern. It is lazy (non-greedy), so the engine does not even attempt to match it when it first sees it; it goes on matching the subsequent patterns and only comes back to expand .*? when they do not match. Then (:31:Start.*:31:End)? matches an empty string right after the :3:Start substring, since it is optional. The rest of the pattern finds a match, so no text is captured into your expected group.
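The difference between the two patterns can be reproduced with a few lines of Python (the re module is assumed here, since the question does not name an engine):

```python
import re

TEXT = ":3:Start!##$%^&*():31:Start!##$%^&*():31:End!##$%^&*():3:End"

# Pattern from the question: Group 3 is optional on its own.
ORIGINAL = re.compile(r"(:3:Start)(.*?)(:31:Start.*:31:End)?(.*)(:3:End)")
# Fixed pattern: Groups 2 and 3 sit together inside one optional group.
FIXED = re.compile(r"(:3:Start)(?:(.*?)(:31:Start.*:31:End))?(.*)(:3:End)")

print(ORIGINAL.search(TEXT).groups())
print(FIXED.search(TEXT).groups())
# The fixed pattern still matches when the inner block is absent:
print(FIXED.search(":3:Start plain :3:End").groups())
```

With the original pattern Group 3 comes back as None and Group 4 holds everything in between; with the fixed one Group 3 holds the :31:Start...:31:End block.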
I have the following template :
1251 Left Random Text I want to fill
It can go through multiple lines
As you can see
9841 Right Again we see a lot of random text with 3115 numbers
And this also goes
To multiple lines
0121 Right
5151 Right This one is just one line
I was wrong
9731 Left This one is just a line
5123 NA Instruction 5151 was wrong
4113 Right Instr 9841 was correct
We checked
I want to have 3 groups:
1251
Left
Random Text I want to fill
It can go through multiple lines
As you can see
I'm using
(\d+)\s(\w+)\s(.*)
but it stops at the end of the current line (so I get only Random Text I want to fill in group 3, although I want everything up to and including As you can see).
If I use the Single line flag, I get only 1 match, with group 3 containing almost all of the text.
Here is live : https://regex101.com/r/W3x0mH/4
You could use a repeating group matching all the following lines while asserting that each next line does not start with a digit:
(\d+)\s(\w+)\s(.*(?:\r?\n(?!\d).*)*)
Explanation
(\d+)\s(\w+)\s Match the first 2 groups
( Third capturing group
.* Match 0+ times any char except a newline
(?: Non capturing group
\r?\n(?!\d).* Match a newline, assert what is on the right is not a digit, then match the rest of the line
)* Close non capturing group and repeat 0+ times
) Close capturing group
Regex demo
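Here is a minimal Python sketch of this pattern. The sample is an abbreviated copy of the template from the question, and parse_records is a hypothetical helper name:

```python
import re

# Abbreviated sample: three records, two of them spanning several lines.
SAMPLE = "\n".join([
    "1251 Left Random Text I want to fill",
    "It can go through multiple lines",
    "As you can see",
    "9841 Right Again we see a lot of random text with 3115 numbers",
    "And this also goes",
    "To multiple lines",
    "9731 Left This one is just a line",
])

# Group 3 keeps consuming lines as long as the next line does not
# start with a digit.
RECORD = re.compile(r"(\d+)\s(\w+)\s(.*(?:\r?\n(?!\d).*)*)")

def parse_records(text):
    """Return a list of (number, side, text) tuples."""
    return RECORD.findall(text)

for number, side, body in parse_records(SAMPLE):
    print(number, side, repr(body))
```

One thing to watch for: \s also matches a newline, so a record with nothing after the second word (such as the 0121 Right line in the full template) would pull the following line into its third group.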
You may use this regex with a lookahead:
^(\d+)\s(\w+)\s(.*?)(?=\n\d|\z)
with DOTALL and MULTILINE modifiers.
Updated Regex Demo
RegEx Details:
^: Line start
(\d+): Match and capture 1+ digits in group #1
\s: match a whitespace
(\w+): Match and capture 1+ word characters in group #2
\s: match a whitespace
(.*?): Match 0 or more of any character (non-greedy) provided next lookahead assertion is satisfied
(?=\n\d|\z): Lookahead assertion to assert that we have a newline followed by a digit or there is end of input
Faster Regex:
If you are using this regex on a long string, you should also keep overall performance in mind, as a regex with the DOTALL modifier tends to get slow on large text. For that I suggest this regex, which doesn't need the DOTALL modifier:
^(\d+)\s(\w+)\s(.*(?:\n.*)*?)(?=\n\d|\z)
RegEx Demo 2
On regex101 demo this regex takes just 181 steps as compared to first one that takes 1300 steps.
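Both variants can be tried in Python with one adjustment: older versions of Python's re module do not accept \z, so this sketch assumes \Z (end of string in Python) as a stand-in. The sample is an abbreviated copy of the question's template and the constant names are made up:

```python
import re

SAMPLE = "\n".join([
    "1251 Left Random Text I want to fill",
    "It can go through multiple lines",
    "As you can see",
    "9841 Right Again we see a lot of random text with 3115 numbers",
    "And this also goes",
    "To multiple lines",
    "9731 Left This one is just a line",
])

# First variant: needs DOTALL so that .*? can cross line breaks.
WITH_DOTALL = re.compile(r"^(\d+)\s(\w+)\s(.*?)(?=\n\d|\Z)", re.S | re.M)
# Second variant: consumes whole lines at a time, no DOTALL needed.
LINE_BY_LINE = re.compile(r"^(\d+)\s(\w+)\s(.*(?:\n.*)*?)(?=\n\d|\Z)", re.M)

slow = WITH_DOTALL.findall(SAMPLE)
fast = LINE_BY_LINE.findall(SAMPLE)
print(slow == fast)
for record in fast:
    print(record)
```

The two patterns return the same records on this sample; the second one simply reaches the lookahead in fewer attempts because it advances a whole line per iteration instead of a single character.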
For the third group, repeat any character while using negative lookahead for ^\d, which would indicate the start of a new match:
(\d+)\s(\w+)\s((?:(?!^\d)[\s\S])*)
https://regex101.com/r/W3x0mH/5
You may try with this regex:
^(\d+)\s+(\w+)\s+(.*?)(?=^\d|\z)
^(\d+)\s+ - the line begins with one or more digits (\d+) followed by one or more whitespace characters (\s+)
(\w+)\s+ - one or more word characters (\w+: Left, Right, NA or something else) followed by one or more whitespace characters (\s+)
(.*?) - matches everything until it finds a line beginning with a digit or \z, the end of the string (this needs the s and m flags, so that . matches line breaks and ^ matches at line starts).
I think it fits your requirement....
Regex101
Can't get why this regex (regex101)
/[\|]?([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures all the input, while this (regex101)
/[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures only |Func
Input string is |Func(param1, param2, param32, param54, param293, par13am, param)|
Also, how can I match a repeated capturing group in a normal way? E.g. I have the regex
/\(\(\s*([a-z\_]+){1}(?:\s+\,\s+(\d+)*)*\s*\)\)/gui
And input string is (( string , 1 , 2 )).
Regex101 says "a repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations...". I've tried to follow this tip, but it didn't help.
Your /[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g regex does not match the whole input because you did not define a pattern to match the words inside parentheses. You might fix it as \|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?, but all the values inside parentheses would be matched into one single group that you would have to split later.
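The "match into one group, then split" approach could look like this in Python. This is only a sketch: the question appears to target PHP/PCRE, and parse_call is a hypothetical helper name:

```python
import re

TEXT = "|Func(param1, param2, param32, param54, param293, par13am, param)|"

# Group 1: function name, Group 2: the whole comma-separated argument list.
CALL = re.compile(r"\|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?")

def parse_call(text):
    """Return (name, [args]) or None if nothing matches."""
    m = CALL.search(text)
    if m is None:
        return None
    name, args = m.group(1), m.group(2)
    # Split the single captured group on commas with optional whitespace.
    return name, (re.split(r"\s*,\s*", args) if args else [])

print(parse_call(TEXT))
```

The second step is needed precisely because the repeated part of the pattern cannot hand back each parameter as its own group.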
It is not possible to get an arbitrary number of captures with a PCRE regex, as in case of repeated captures only the last captured value is stored in the group buffer.
What you may do is get multiple matches with preg_match_all, capturing the initial delimiter.
So, to match the second string, you may use
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\()\K\w+
See the regex demo.
Details:
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\() - either the end of the previous match (\G(?!\A)) and a comma enclosed with 0+ whitespaces (\s*,\s*), or 1+ | symbols (\|+), followed with 1+ alphanumeric chars (captured into Group 1, ([a-z0-9A-Z]+)) and a ( symbol (\()
\K - omit the text matched so far
\w+ - 1+ word chars.