Perl regex: match balanced parentheses

The following strings should match (string - expected match):
"MNO(A=(B=C) D=(E=F)) PQR(X=(G=H) I=(J=(K=L)))" - "MNO"
"MNO(A=(B=C) D=(E=F))" - "MNO"
"MNO" - "MNO"
"RAX.MNO(A=(B=C) D=(E=F)) PQR(X=(G=H) I=(J=(K=L)))" - "RAX.MNO"
"RAX.MNO(A=(B=C) D=(E=F))" - "RAX.MNO"
"RAX.MNO" - "RAX.MNO"
Inside every pair of parentheses there can be any number of nested groups, but they have to be closed properly.
Any ideas? I don't know how to test properly that every group is closed.
I have to use a Perl regular expression.

In Perl or PHP, for example, you could use a regex like
/\((?:[^()]++|(?R))*\)/
to match balanced parentheses and their contents.
See it on regex101.
To remove all those matches from a string $subject in Perl, you could use
$subject =~ s/\((?:[^()]++|(?R))*\)//g;
Explanation:
\( # Match a (
(?: # Start of non-capturing group:
[^()]++ # Either match one or more characters except (), don't backtrack
| # or
(?R) # Match the entire regex again, recursively
)* # Any number of times
\) # Match a )
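As a quick check, here is a minimal Perl sketch that applies the substitution above to the six sample strings from the question. It assumes one reading of "match": strip every balanced (...) group, then take the first run of non-space characters as the wanted name. That second step is a hypothetical addition, not part of the answer. Unbalanced input is not rejected here; unmatched parentheses are simply left in place.

```perl
use strict;
use warnings;

# Sample inputs copied from the question.
my @inputs = (
    'MNO(A=(B=C) D=(E=F)) PQR(X=(G=H) I=(J=(K=L)))',
    'MNO(A=(B=C) D=(E=F))',
    'MNO',
    'RAX.MNO(A=(B=C) D=(E=F)) PQR(X=(G=H) I=(J=(K=L)))',
    'RAX.MNO(A=(B=C) D=(E=F))',
    'RAX.MNO',
);

my (@stripped, @names);
for my $subject (@inputs) {
    # Remove every balanced (...) group, nested ones included.
    (my $copy = $subject) =~ s/\((?:[^()]++|(?R))*\)//g;
    push @stripped, $copy;

    # Assumption: the wanted result is the leading name, i.e. the first
    # run of non-space characters left after stripping.
    my ($name) = $copy =~ /^(\S+)/;
    push @names, $name;

    print "$subject => '$copy' => $name\n";
}
```

Strings with two top-level groups keep the second name (for example "MNO PQR"), which is why the sketch takes only the leading token.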

Related

Modify Go regex so it doesn't pick up the last character

I have this regex, which works as shown at this link: https://regex101.com/r/HVKfYU/1
This is my regex string: (\d+[-–]\(?\d+([+\-*/^]\d+ ?[+\-*/^] ?\d+)?\)?)
These are my test strings:
(0–(2^63 - 1))
(1-(2^16 - 2))
(1-29999984)
(3-32)
This is what the regex matches in the first two cases:
0–(2^63 - 1)
1-(2^16 - 2)
// works, it doesn't match the first pair of brackets
And this is what it matches in the last two:
1-29999984)
3-32)
// doesn't work, it matches the closing bracket
I'd like it to not match the last closing bracket in any of the test strings. At the moment I'm stripping the bracket if necessary, but I would like to avoid that. How could I modify the regex, so it works as I would like?
Try (\d+[-–](?:\d+|\(\d+([+\-*/^]\d+[ ]?[+\-*/^][ ]?\d+)?\)))
demo
It matches either plain digits or a parenthesized block.
Here it is again, broken out with explanations:
(
\d+ [-–]
(?: # non capture for alternation
\d+ # dd-dd form
| # or
\( \d+ # dd-(dd + dd) form
(
[+\-*/^]
\d+
[ ]?
[+\-*/^]
[ ]?
\d+
)?
\)
)
)
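To see what the suggested pattern captures, here is a small Perl sketch that runs it over the four test strings. The pattern uses only ordinary features (groups, alternation, character classes, no recursion or lookaround), so it should also be accepted unchanged by Go's regexp package, although that is not exercised here. `use utf8` is an assumption needed in Perl so that the en dash in [-–] counts as one character instead of three raw bytes.

```perl
use strict;
use warnings;
use utf8;                                # source contains an en dash (–)
binmode STDOUT, ':encoding(UTF-8)';

# The answer's pattern; qr{} avoids escaping the / inside the classes.
my $re = qr{(\d+[-–](?:\d+|\(\d+([+\-*/^]\d+[ ]?[+\-*/^][ ]?\d+)?\)))};

my @tests = ('(0–(2^63 - 1))', '(1-(2^16 - 2))', '(1-29999984)', '(3-32)');

my @found;
for my $s (@tests) {
    if ($s =~ $re) {
        push @found, $1;                 # group 1 is the whole range
        print "$s => $1\n";
    }
}
```

Because the closing \) is only consumed inside the parenthesized alternative, the plain "dd-dd" form no longer picks up the trailing bracket.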

match character not enclosed by braces recursively

I'm trying to split a string on pipes when they are not enclosed in parentheses.
I've got a regex that works, unless there are nested parentheses:
~\([^)]*\)(*SKIP)(*F)|\|~
test(test(test|tester)|test)|test
The | after "tester)" and the final | are both matched; only the last one should match.
regex101 link to play around
You may use the following regex based on a subroutine:
(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|\|
See the regex demo
Details
(\((?:[^()]++|(?1))*\)) - Group 1 that matches
\( - a (
(?:[^()]++|(?1))* - 0 or more occurrences of:
[^()]++ - any 1+ chars other than ( and )
| - or
(?1) - the whole Group 1 pattern is recursed (note that (?R) would not work here since it would recurse the whole regex pattern)
\) - a ) char
(*SKIP)(*F) - PCRE verb sequence that omits the currently matched text and makes the regex engine search for the next match beginning from the end of the current match
| - or
\| - a literal |
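In Perl the same pattern can be passed directly to split, as this minimal sketch shows. One assumption-driven detail: Group 1 has to be capturing for (?1) to refer to it, and split returns captured groups as extra fields. Because the (*SKIP)(*F) branch never succeeds, those extra fields are always undef, so grep { defined } removes them.

```perl
use strict;
use warnings;

my $str = 'test(test(test|tester)|test)|test';

# Split on | only when it is outside any (possibly nested) parentheses.
# Group 1 exists for the (?1) recursion; split also returns it as an
# extra field (always undef here), which grep { defined } discards.
my @parts = grep { defined }
            split /(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|\|/, $str;

print "$_\n" for @parts;
```

This prints two fields: test(test(test|tester)|test) and test.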

Regular Expression to switch values

The following regular expression, which I use in Notepad, works on almost all lines, but a few lines fail and I can't get them to work.
Assert.Equal(false, General.CashFactor.IsHighlighted());
Assert.Equal(values, General.CashFactor.GetValue());
Assert.Equal(int.Parse(General.Time.Hour.GetValue()), this.GetLocalTimeEN(int.Parse(timezone.Timezone2.Zeitverschiebung), int.Parse(hour)));
These are 3 lines; the first and second work with the regex, but the third doesn't. Its output is always:
Assert.Equal(int.Parse(hour)), int.Parse(General.Time.Hour.GetValue()), this.GetLocalTimeEN(int.Parse(timezone.Timezone2.Zeitverschiebung)));
Here are the two patterns I use:
search: Assert.Equal\((.*), (.*)(\);)
replace: Assert.Equal\($2, $1\)
What I need is a pattern where $1 goes only up to the first comma and $2 takes everything else up to the );
I'd appreciate the help. I'm sure I'm very close, but I just can't get it to work.
You can do it using a recursive pattern (to handle nested parentheses):
search:
\bAssert\.Equal\(\K\h*((?(R)[^()]*(?:\((?1)\)[^()]*)*|[^,()]*(?:\((?1)\)[^(),]*)*))\h*,\h*((?1))\h*(?=\))
replacement:
\2, \1
details:
\bAssert\.Equal\(\K
\h*
( # first capturing group
(?(R) # if inside a recursion (i.e. inside parentheses), commas are allowed
[^()]* (?: \( (?1) \) [^()]* )*
| # otherwise not
[^,()]* (?: \( (?1) \) [^(),]* )*
)
)
\h*,\h*
((?1)) # second capturing group (same subpattern as the first)
\h*(?=\))
demo
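The search pattern relies only on \K, \h, recursion and the (?(R)...) condition, and Perl 5.10+ supports all of these. That makes it possible to check the swap outside the editor. The sketch below is a hedged check under that assumption. It uses Perl's $2, $1 in place of the editor's \2, \1 replacement syntax.

```perl
use strict;
use warnings;

# The answer's search pattern, unchanged.
my $swap = qr/\bAssert\.Equal\(\K\h*((?(R)[^()]*(?:\((?1)\)[^()]*)*|[^,()]*(?:\((?1)\)[^(),]*)*))\h*,\h*((?1))\h*(?=\))/;

my @lines = (
    'Assert.Equal(false, General.CashFactor.IsHighlighted());',
    'Assert.Equal(values, General.CashFactor.GetValue());',
    'Assert.Equal(int.Parse(General.Time.Hour.GetValue()), this.GetLocalTimeEN(int.Parse(timezone.Timezone2.Zeitverschiebung), int.Parse(hour)));',
);

my @swapped;
for my $line (@lines) {
    # \K keeps "Assert.Equal(" out of the replaced text; the lookahead
    # leaves the closing ")" in place, so only the two arguments swap.
    (my $out = $line) =~ s/$swap/$2, $1/;
    push @swapped, $out;
    print "$out\n";
}
```

The third line now swaps correctly because the top-level branch of group 1 refuses commas, while recursive calls (inside parentheses) accept them.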

Find an item in the text with exceptions [Regular Expression]

Please help me create a regular expression that selects the "|" character everywhere except inside parentheses.
example|example (example(example))|example|example|example(example|example|example(example|example))|example
After the selection there should be 5 "|" characters selected, the ones outside the parentheses. Note that the contents of the parentheses must stay unchanged, including the "|" characters inside them.
Considering you want to match pipes that are outside any set of parentheses, with nested sets, here's the pattern to achieve what you want:
Regex:
(?x) # Allow comments in regex (ignore whitespace)
(?: # Repeat *
[^(|)]*+ # Match every char except ( ) or |
( # 1. Group 1
\( # Opening paren
(?: # chars inside:
[^()]++ # a. everything inside parens except nested parens
| # or
(?1) # b. nested parens (recurse group 1)
)*+ # any number of times (possessive)
\) # Until closing paren.
)?+ # (end of group 1)
)*+ #
\K # Keep text out of match
\| # Match a pipe
regex101 Demo
One-liner:
(?:[^(|)]*+(\((?:[^()]++|(?1))*+\))?+)*+\K\|
regex101 Demo
This pattern uses some advanced features:
Possessive quantifiers
Recursion
Resetting the match start
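Here is a minimal Perl sketch that counts and marks the top-level pipes in the example string. It uses the one-liner with a *+ quantifier after the inner alternation. Without that quantifier, group 1 could hold only a single run or a single nested group, so a group such as (example(example)) would not match. Replacing the pipes with ";" is purely illustrative. Because of \K, only the | itself is replaced.

```perl
use strict;
use warnings;

my $text = 'example|example (example(example))|example|example|'
         . 'example(example|example|example(example|example))|example';

# Skip over text and balanced (nested) groups, then match the next
# top-level pipe; \K drops everything before the | from the match.
my $top_pipe = qr/(?:[^(|)]*+(\((?:[^()]++|(?1))*+\))?+)*+\K\|/;

# m//g in list context yields one element (group 1) per match.
my $count = () = $text =~ /$top_pipe/g;

# Replace only the top-level pipes; pipes inside parentheses survive.
(my $marked = $text) =~ s/$top_pipe/;/g;

print "$count\n$marked\n";
```

This reports 5 matches, in line with the question, and the pipes inside example(...|...) are left unchanged.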

PCRE (recursive) pattern that matches a string containing a correctly parenthesized substring. Why does this one fail?

Well, there are other ways (hmmm... or rather working ways) to do it, but the question is why does this one fail?
/
\A # start of the string
( # group 1
(?: # group 2
[^()]* # something other than parentheses (greedy)
| # or
\( (?1) \) # parenthesized group 1
) # -group 2
+ # at least once (greedy)
) # -group 1
\Z # end of the string
/x
Fails to match a string with nested parentheses: "(())"
It doesn't fail
$ perl junk.pl
matched junk >(())<
$ cat junk.pl
my $junk = qr/
\A # start of the string
( # group 1
(?: # group 2
[^()]* # something other than parentheses (greedy)
| # or
\( (?1) \) # parenthesized group 1
) # -group 2
+ # at least once (greedy)
) # -group 1
\Z # end of the string
/x;
if( "(())" =~ $junk ){
print "matched junk >$1<\n";
}
Wow! Thank you, junk! It really works... in Perl. But not in PCRE. So the question turns into "What's the difference between Perl and PCRE regex pattern matching?"
And voilà, there is an answer in the PCRE documentation:
Recursion difference from Perl
In PCRE (like Python, but unlike Perl), a recursive subpattern call is
always treated as an atomic group. That is, once it has matched some of
the subject string, it is never re-entered, even if it contains untried
alternatives and there is a subsequent matching failure.
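Therefore, we just need to swap the two alternatives so that \( (?1) \) is tried before [^()]*. In the original order, a recursive call first settles on the empty match of [^()]*, and because PCRE never re-enters it, the following \) fails: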
/ \A ( (?: \( (?1) \) | [^()]* )+ ) \Z /x
Thank you!
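A small Perl harness can confirm that, in Perl, both alternative orders accept balanced strings and reject unbalanced ones. This holds because Perl can backtrack into a recursive call. The extra test strings are illustrative additions, not from the question, and this sketch does not run PCRE itself. Note that PCRE2 10.30 and later are documented as no longer treating recursive calls as atomic, so the original order may behave like Perl there.

```perl
use strict;
use warnings;

# Original order: [^()]* is tried before the parenthesized branch.
my $original = qr/ \A ( (?: [^()]* | \( (?1) \) )+ ) \Z /x;
# Swapped order, as suggested for PCRE.
my $swapped  = qr/ \A ( (?: \( (?1) \) | [^()]* )+ ) \Z /x;

my %results;
for my $s ('(())', 'a(b(c)d)e', '(()', '())') {
    my $o = $s =~ $original ? 1 : 0;
    my $w = $s =~ $swapped  ? 1 : 0;
    $results{$s} = "$o$w";    # e.g. "11" = both patterns matched
    printf "%-10s original=%d swapped=%d\n", $s, $o, $w;
}
```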