Regular Expression to switch values - regex

The following regular expression, which I use in notepad, works on almost all lines, but it fails on a few and I can't get it to work.
Assert.Equal(false, General.CashFactor.IsHighlighted());
Assert.Equal(values, General.CashFactor.GetValue());
Assert.Equal(int.Parse(General.Time.Hour.GetValue()), this.GetLocalTimeEN(int.Parse(timezone.Timezone2.Zeitverschiebung), int.Parse(hour)));
These are 3 lines. The first and second work with the regex, but the 3rd doesn't; its output is always:
Assert.Equal(int.Parse(hour)), int.Parse(General.Time.Hour.GetValue()), this.GetLocalTimeEN(int.Parse(timezone.Timezone2.Zeitverschiebung)));
Here are the 2 commands I use:
search: Assert.Equal\((.*), (.*)(\);)
replace: Assert.Equal\($2, $1\)
Now what I would need is a command that makes $1 only up to the first , and $2 all the rest up to the );
Would appreciate the help, I'm sure I'm very close, but I just can't get it to work

You can do it using a recursive pattern (to handle nested parentheses):
search:
\bAssert\.Equal\(\K\h*((?(R)[^()]*(?:\((?1)\)[^()]*)*|[^,()]*(?:\((?1)\)[^(),]*)*))\h*,\h*((?1))\h*(?=\))
replacement:
\2, \1
details:
\bAssert\.Equal\(\K
\h*
( # first capturing group
(?(R) # if you are in a recursion (inside parenthesis), commas are allowed
[^()]* (?: \( (?1) \) [^()]* )*
| # otherwise not
[^,()]* (?: \( (?1) \) [^(),]* )*
)
)
\h*,\h*
((?1)) # second capturing group (the same as the first)
\h*(?=\))
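For engines without recursion support, the same swap can be sketched with a depth counter instead of a regex. This is a swapped-in technique, not the pattern above: the function name swap_assert_args is hypothetical, and the sketch assumes one Assert.Equal call per line and no parentheses or commas inside string literals.

```python
def swap_assert_args(line: str, call: str = "Assert.Equal(") -> str:
    """Swap the first argument with the rest of the argument list."""
    start = line.find(call)
    if start == -1:
        return line
    args_start = start + len(call)
    depth = 0      # nesting depth relative to the call's own parentheses
    comma = None   # index of the first comma found at depth 0
    for i in range(args_start, len(line)):
        ch = line[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                # This is the parenthesis that closes Assert.Equal(...)
                if comma is None:
                    return line  # only one argument, nothing to swap
                first = line[args_start:comma].strip()
                second = line[comma + 1:i].strip()
                return f"{line[:args_start]}{second}, {first}{line[i:]}"
            depth -= 1
        elif ch == "," and depth == 0 and comma is None:
            comma = i
    return line  # unbalanced input is left untouched


print(swap_assert_args("Assert.Equal(false, General.CashFactor.IsHighlighted());"))
```

As in the recursive pattern, only the first top-level comma splits the arguments; commas nested inside parentheses stay with their argument.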

Related

Regex to remove all parentheses except most external ones

I have been trying and reading many similar SO answers with no luck.
I need to remove parentheses in the text inside parentheses keeping the text. Ideally with 1 regex... or maybe 2?
My text is:
Alpha (Bravo( Charlie))
I want to achieve:
Alpha (Bravo Charlie)
The best I got so far is:
\\(|\\)
but it gets:
Alpha Bravo Charlie
You can use a regex like this:
(\(.*?)\((.*?)\)
With this replacement string:
$1$2
Update: as per ııı's comment, since I don't know your full sample text, I provide this regex in case you have this scenario:
(\([^)]*)\((.*?)\)
From your post and comments, it seems you want to remove only the innermost parentheses, for which you can use the following regex:
\(([^()]*)\)
And replace with $1 or \1, depending on your language.
In this regex, \( matches an opening parenthesis and \) matches a closing one. ([^()]*) ensures the captured text contains neither ( nor ), which guarantees the pair is innermost, and stores that text in group 1. The whole match is then replaced by the content of group 1, which removes the innermost parentheses and keeps the text inside as it is.
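A minimal Python sketch of the innermost-pair replacement described above, using the standard re module. The function name strip_innermost is hypothetical.

```python
import re

# A pair of parentheses whose content contains no other parenthesis.
INNERMOST = re.compile(r"\(([^()]*)\)")


def strip_innermost(text: str) -> str:
    """Remove each innermost pair of parentheses, keeping the text inside."""
    return INNERMOST.sub(r"\1", text)


print(strip_innermost("Alpha (Bravo( Charlie))"))  # Alpha (Bravo Charlie)
```

One pass removes one nesting level. A pair that contains no nested parentheses is itself innermost, so it is removed as well.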
Your pattern \(|\) uses an alternation, so it will match either an opening or a closing parenthesis.
If, according to the comments, there is only 1 pair of nested parentheses, you could match:
(\([^()]*)\(([^()]*\)[^()]*)\)
( Start capturing group 1
\( Match opening parenthesis
[^()]* Match 0+ times not ( or )
) Close group 1
\( Match the nested opening parenthesis
( Start capturing group 2
[^()]*\) Match 0+ times not ( or ), then a closing parenthesis
[^()]* Match 0+ times not ( or )
) Close group 2
\) Match closing parenthesis
And replace with the first and the second capturing group.

Find an item in the text with exceptions[Regular Expression]

Please help me create a regular expression that matches the "|" character everywhere except inside parentheses.
example|example (example(example))|example|example|example(example|example|example(example|example))|example
In this example the selection should contain the 5 "|" characters that are outside the parentheses. Note that the contents of the parentheses should remain unchanged, including the "|" characters inside them.
Considering you want to match pipes that are outside any set of parentheses, with nested sets, here's the pattern to achieve what you want:
Regex:
(?x) # Allow comments in regex (ignore whitespace)
(?: # Repeat *
[^(|)]*+ # Match every char except ( ) or |
( # 1. Group 1
\( # Opening paren
(?: # chars inside:
[^()]++ # a. everything inside parens except nested parens
| # or
(?1) # b. nested parens (recurse group 1)
)*+ # repeated any number of times
\) # Until closing paren.
)?+ # (end of group 1)
)*+ #
\K # Keep text out of match
\| # Match a pipe
One-liner:
(?:[^(|)]*+(\((?:[^()]++|(?1))*+\))?+)*+\K\|
This pattern uses some advanced features:
Possessive quantifiers
Recursion
Resetting the match start
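Python's built-in re module supports none of these features, so here is a sketch of the same selection done with a plain depth counter rather than the pattern above. The function name top_level_pipes is hypothetical, and ignoring a stray ")" is a choice made for this sketch.

```python
from typing import List


def top_level_pipes(text: str) -> List[int]:
    """Return the indexes of every "|" that is outside all parentheses."""
    positions = []
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # a stray ")" is ignored (assumption)
        elif ch == "|" and depth == 0:
            positions.append(i)
    return positions


sample = ("example|example (example(example))|example|example|"
          "example(example|example|example(example|example))|example")
print(len(top_level_pipes(sample)))  # 5
```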

Regex pattern without one case

I would like to remove some strings from filenames.
I want to remove every string in brackets, except when it contains "remix", "Remix" or "REMIX".
So far I have:
sed "s/\s*\(\s?[A-z0-9. ]*\)//g"
But how do I exclude the cases where the string contains remix?
You can use a capture group:
sed 's/\(\s*([^)]*remix[^)]*)\)\|\s*(\s\?[a-z0-9. ]*)/\1/gi'
When the "remix branch" doesn't match, the capture group is not defined and the matched part is replaced with an empty string.
When the "remix branch" succeeds, the matched part is replaced by the content of the capture group, so by itself.
Note: if it helps to avoid false positives, you can add word boundaries around "remix": \bremix\b
pattern details:
\( # open the capture group 1
\s* # zero or more white-spaces
( # a literal parenthesis
[^)]* # zero or more characters that are not a closing parenthesis
remix
[^)]*
)
\) # close the capture group 1
\| # OR
# something else between parenthesis
\s* # note that it is essential that the two branches are able to
# start at the same position. If you remove \s* in the first
# branch, the second branch will always win when there's a space
# before the opening parenthesis.
(\s\?[a-z0-9. ]*)
\1 is the reference to the capture group 1
i makes the pattern case-insensitive
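The same "captured branch wins, other branch is deleted" idea can be sketched in Python with the standard re module. The name clean_name and the sample filename are hypothetical; the pattern is a direct transcription of the sed expression into Python syntax, where ( is a group and \( is a literal parenthesis.

```python
import re

# Branch 1 (captured): a parenthesized part containing "remix".
# Branch 2 (not captured): any other simple parenthesized part.
PATTERN = re.compile(
    r"(\s*\([^)]*remix[^)]*\))|\s*\(\s?[a-z0-9. ]*\)",
    re.IGNORECASE,
)


def clean_name(name: str) -> str:
    # group(1) is None when branch 2 matched, so that part is deleted;
    # when branch 1 matched, the match is replaced by itself.
    return PATTERN.sub(lambda m: m.group(1) or "", name)


print(clean_name("Song (Live) (Club Remix).mp3"))  # Song (Club Remix).mp3
```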
[EDIT]
If you want to do it in a POSIX-compliant way, you must use a different approach, because several GNU features are not available: in particular the alternation \|, but also the i modifier, the \s character class and the optional quantifier \?.
This other approach consists of matching any characters that are not an opening parenthesis and any substrings enclosed in parentheses with "remix" inside, followed by optional white-space and an optional substring enclosed in parentheses.
As you can see, everything is optional and the pattern can match an empty string, but that isn't a problem.
Everything before the parenthesized part to remove is captured in group 1.
sed 's/\(\([^(]*([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)[^ \t(]*\([ \t]\{1,\}[^ \t(]\{1,\}\)*\)*\)\([ \t]*([^)]*)\)\{0,1\}/\1/g;'
pattern details:
\( # open the capture group 1
\(
[^(]* # all that is not an opening parenthesis
# substring enclosed in parentheses with "remix" inside
( [^)]* [Rr][Ee][Mm][Ii][Xx] [^)]* )
# Let's reach the next parenthesis without matching the white-space
# before it (otherwise the leading white-space is not removed)
[^ \t(]* # all that is not a white-space or an opening parenthesis
# optional groups of white-space followed by characters that are
# neither white-space nor an opening parenthesis
\( [ \t]\{1,\} [^ \t(]\{1,\} \)*
\)*
\) # close the capture group 1
\(
[ \t]* # leading white-spaces
([^)]*) # parenthesis
\)\{0,1\} # makes this part optional (this avoids removing a "remix" part
# alone at the end of the string)
Word boundaries aren't available in this mode either, so the only way to emulate them is to list the four possibilities:
([Rr][Ee][Mm][Ii][Xx]) # poss1
([Rr][Ee][Mm][Ii][Xx][^a-zA-Z][^)]*) # poss2
([^)]*[^a-zA-Z][Rr][Ee][Mm][Ii][Xx]) # poss3
([^)]*[^a-zA-Z][Rr][Ee][Mm][Ii][Xx][^a-zA-Z][^)]*) # poss4
and to replace ([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*) with:
\(poss1\)\{0,\}\(poss2\)\{0,\}\(poss3\)\{0,\}\(poss4\)\{0,\}
Just skip the lines matching "remix":
sed '/([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)/! s/([^)]*)//g'
where the brackets are (US): []
sed '/remix\|REMIX\|Remix/ !s/\[[^]]*]//g'
where the brackets are (ROW): ()
sed '/remix\|REMIX\|Remix/ !s/([^)]*)//g'
assuming:
- there are no nested brackets
- other forms of remix (ReMix, ...) are excluded, so the brackets in such a line are deleted
- remix may appear anywhere in the title (i love remix) [if needed, specify which occurrences to keep and which to remove]
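The line-skipping approach can be sketched in Python as follows. The name clean_line is hypothetical, and the check follows the variant that looks for "remix" inside parentheses, in any letter case.

```python
import re

HAS_REMIX = re.compile(r"\([^)]*remix[^)]*\)", re.IGNORECASE)
ANY_PARENS = re.compile(r"\([^)]*\)")


def clean_line(line: str) -> str:
    """Leave lines with a parenthesized "remix" alone, strip (...) elsewhere."""
    if HAS_REMIX.search(line):
        return line
    return ANY_PARENS.sub("", line)


print(clean_line("Song (Live).mp3"))
print(clean_line("Song (Live) (Remix).mp3"))
```

The trade-off is visible in the second call: because the whole line is skipped, "(Live)" is kept too when the same name also contains a remix part.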

Perl Regex match balanced parentheses

The following strings should give these matches (string - match):
"MNO(A=(B=C) D=(E=F)) PQR(X=(G=H) I=(J=(K=L)))" - "MNO"
"MNO(A=(B=C) D=(E=F))" - "MNO"
"MNO" - "MNO"
"RAX.MNO(A=(B=C) D=(E=F)) PQR(X=(G=H) I=(J=(K=L)))" - "RAX.MNO"
"RAX.MNO(A=(B=C) D=(E=F))" - "RAX.MNO"
"RAX.MNO" - "RAX.MNO"
Inside every pair of parentheses there can be any number of nested groups, but they have to be closed properly.
Any ideas? I don't know how to test properly for closure.
I have to use a Perl regular expression.
In Perl or PHP, for example, you could use a regex like
/\((?:[^()]++|(?R))*\)/
to match balanced parentheses and their contents.
To remove all those matches from a string $subject in Perl, you could use
$subject =~ s/\((?:[^()]++|(?R))*\)//g;
Explanation:
\( # Match a (
(?: # Start of non-capturing group:
[^()]++ # Either match one or more characters except (), don't backtrack
| # or
(?R) # Match the entire regex again, recursively
)* # Any number of times
\) # Match a )
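For comparison, here is a sketch of the same removal in Python, whose built-in re module has no (?R). It uses a depth counter instead of recursion; the name strip_balanced is hypothetical and the input is assumed to be well formed (with an unclosed "(" the rest of the string would be dropped, unlike the regex).

```python
def strip_balanced(text: str) -> str:
    """Drop every balanced (...) group, nested ones included."""
    out = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)  # keep only characters outside all parentheses
    return "".join(out)


print(strip_balanced("MNO(A=(B=C) D=(E=F)) PQR(X=(G=H) I=(J=(K=L)))"))  # MNO PQR
```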

PCRE (recursive) pattern that matches a string containing a correctly parenthesized substring. Why does this one fail?

Well, there are other ways (hmmm... or rather working ways) to do it, but the question is why does this one fail?
/
\A # start of the string
( # group 1
(?: # group 2
[^()]* # something other than parentheses (greedy)
| # or
\( (?1) \) # parenthesized group 1
) # -group 2
+ # at least once (greedy)
) # -group 1
\Z # end of the string
/x
Fails to match a string with nested parentheses: "(())"
It doesn't fail
$ perl junk.pl
matched junk >(())<
$ cat junk.pl
my $junk = qr/
\A # start of the string
( # group 1
(?: # group 2
[^()]* # something other than parentheses (greedy)
| # or
\( (?1) \) # parenthesized group 1
) # -group 2
+ # at least once (greedy)
) # -group 1
\Z # end of the string
/x;
if( "(())" =~ $junk ){
print "matched junk >$1<\n";
}
Wow!.. Thank you, junk! It really works... in Perl. But not in PCRE. So, the question is mutating into "What's the difference between Perl and PCRE regex pattern matching?"
And voila! There is an answer:
Recursion difference from Perl
In PCRE (like Python, but unlike Perl), a recursive subpattern call is
always treated as an atomic group. That is, once it has matched some of
the subject string, it is never re-entered, even if it contains untried
alternatives and there is a subsequent matching failure.
Therefore, we just need to swap the two alternatives, so that the parenthesized branch is tried first:
/ \A ( (?: \( (?1) \) | [^()]* )+ ) \Z /x
Thank you!
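As a cross-check that does not depend on how any engine handles recursion, the set of strings the pattern is meant to accept (correctly parenthesized strings, the empty string included) can be tested with a depth counter. This is a sketch with a hypothetical function name, not a replacement for the regex.

```python
def is_correctly_parenthesized(text: str) -> bool:
    """True if every "(" has a matching ")" and none closes too early."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ")" appeared before its "("
    return depth == 0


print(is_correctly_parenthesized("(())"))  # True
```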