Regex to remove all parentheses except most external ones - regex

I have been trying and reading many similar SO answers with no luck.
I need to remove parentheses in the text inside parentheses keeping the text. Ideally with 1 regex... or maybe 2?
My text is:
Alpha (Bravo( Charlie))
I want to achieve:
Alpha (Bravo Charlie)
The best I got so far is:
\\(|\\)
but it gets:
Alpha Bravo Charlie

You can use a regex like this:
(\(.*?)\((.*?)\)
With this replacement string:
$1$2
Regex demo
Update: per ııı's comment, and since I don't know your full sample text, here is an alternative regex in case you have that scenario:
(\([^)]*)\((.*?)\)
Regex demo

From your post and comments, it seems you want to remove only the innermost parentheses, for which you can use the following regex:
\(([^()]*)\)
And replace with $1 or \1 depending upon your language.
In this regex, \( matches an opening parenthesis and \) matches a closing parenthesis. ([^()]*) captures text that contains neither ( nor ), which ensures the matched pair is the innermost one, and places that text in group 1. The whole match is then replaced by the group 1 text, which removes the innermost parentheses and keeps the text inside as it is.
Demo
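As a quick check of the innermost-pair approach, here is a minimal sketch in Python's re module (the language choice and the helper name strip_innermost are assumptions for illustration, not part of the answer above):

```python
import re

# A "(...)" pair whose content holds no other parenthesis is an innermost pair.
INNERMOST = re.compile(r"\(([^()]*)\)")

def strip_innermost(text: str) -> str:
    """Drop the parentheses of every innermost pair, keeping the text inside."""
    return INNERMOST.sub(r"\1", text)

result = strip_innermost("Alpha (Bravo( Charlie))")
print(result)  # Alpha (Bravo Charlie)
```

Note that a plain, non-nested pair also counts as innermost here, so "Alpha (Bravo)" would become "Alpha Bravo".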

Your pattern \(|\) uses an alternation, so it matches either an opening or a closing parenthesis anywhere in the text.
If, according to the comments, there is only 1 pair of nested parentheses, you could match:
(\([^()]*)\(([^()]*\)[^()]*)\)
( Start capturing group
\( Match opening parenthesis
[^()]* Match 0+ times not ( or )
) Close group 1
\( Match the inner opening parenthesis
( Capturing group 2
[^()]*\) Match 0+ times not ( or ), then the inner closing parenthesis
[^()]* Match 0+ times not ( or )
) close capturing group
\) Match closing parenthesis
And replace with the first and the second capturing group.
Regex demo
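To see how this pattern treats nested and non-nested pairs, here is a small sketch using Python's re module (the language and the helper name drop_nested_pair are assumptions made for illustration):

```python
import re

# Group 1: outer "(" plus text; group 2: inner text, the inner ")", and trailing text.
ONE_NESTED_PAIR = re.compile(r"(\([^()]*)\(([^()]*\)[^()]*)\)")

def drop_nested_pair(text: str) -> str:
    """Remove the inner "(" and the outer ")" so one balanced pair is left."""
    return ONE_NESTED_PAIR.sub(r"\1\2", text)

nested = drop_nested_pair("Alpha (Bravo( Charlie))")
mixed = drop_nested_pair("Alpha (Bravo) (Charlie( Delta))")
print(nested)  # Alpha (Bravo Charlie)
print(mixed)   # Alpha (Bravo) (Charlie Delta)
```

Unlike a pattern that strips every innermost pair, this one leaves a pair without nesting, such as (Bravo) in the second input, untouched.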

Related

Regex get all before first occurrence of character

I know it's been asked many many times. I tried my best but the result wasn't perfect.
Regex
/(\(\s*["[^']*]*)(.*\/logo\.png.*?)(["[^']*]*\s*\))/gmi
Regex101 Link: https://regex101.com/r/0f8Q08/1
It should capture all separately.
(../asdasd/dasdas/logo.png)
(../asdasd/dasdas/logo.png)
( '../logo.png' )
Right now it's capturing as a whole.
(../asdasd/dasdas/logo.png) (../asdasd/dasdas/logo.png) ( '../logo.png' )
What I need is, the regex to stop after the first closing bracket ) match.
You can use
(\(\s*(["']?))([^"')]*\/logo\.png[^"')]*)(\2\s*\))
See the regex demo.
Details
(\(\s*(["']?)) - Group 1: (, any zero or more whitespaces, and then Group 2 capturing either a ' or a " optionally
([^"')]*\/logo\.png[^"')]*) - Group 3: any zero or more chars other than ", ' and ), then a /logo.png string, and then again any zero or more chars other than ", ' and )
(\2\s*\)) - Group 4: the same value as in Group 2, zero or more whitespaces, and a ) char.
The issue in your pattern is that the .* matches too much. After the opening parenthesis, you should exclude ( and ) from the match to avoid overmatching across the separate parts.
You don't need all those capture groups if you want to match the parts with parentheses as a whole.
You can use 1 capture group, where the group would be a backreference matching the same optional closing quote.
\(\s*(["']?)[^()'"]*\/logo\.png[^()'"]*\1\s*\)
Regex demo
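A minimal sketch of the single-capture-group pattern, run with Python's re module against the sample line from the question (the language, the variable names, and the case-insensitive flag standing in for the original i flag are assumptions):

```python
import re

# Group 1 holds the optional opening quote; \1 requires the same quote (or none) at the end.
LOGO_REF = re.compile(
    r"""\(\s*(["']?)[^()'"]*/logo\.png[^()'"]*\1\s*\)""",
    re.IGNORECASE,
)

text = "(../asdasd/dasdas/logo.png) (../asdasd/dasdas/logo.png) ( '../logo.png' )"

# Each parenthesized reference is returned as its own match.
matches = [m.group(0) for m in LOGO_REF.finditer(text)]
for found in matches:
    print(found)
```

Because the character classes exclude ( and ), no match can run past the first closing parenthesis, which gives three separate matches for this input.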
If you also want the matches without the matching quotes:
\(\s*["']?[^()'"]*\/logo\.png[^()'"]*["']?\s*\)
Regex demo
If you want to keep your regex, you can change the first .* to [^)]* so the match stays between the parentheses
(\(\s*["[^']*]*)([^)]*\/logo\.png.*?)(["[^']*]*\s*\))
regex101

Writing a Regex pattern c++

I need help completing this regex pattern.
Here is the full string:
INSERT((1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449),geography)
Here is the portion of the string I am trying to search for using regex_search:
(1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449)
Here is my regex pattern and code:
regex pattern2("\\(|[0-9]|[a-z]|[A-Z]|\,|\"|");
regex_search (substring,matcher,pattern2);
for(auto x:matcher)
{
substring1 = matcher.suffix().str();
cout << substring1 << endl;
}
substring will output:
1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449),geography)
So not what I need. Would appreciate some help.
To match (1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449) you can use a capturing group and a negated character class.
[^*()\r\n]+\((\([^()\r\n]+\))[^()\r\n]+\)
In parts
[^*()\r\n]+ Match 1+ times any char except the listed
\( Match first opening parenthesis
( Capture group 1
\( Match second opening parenthesis
[^()\r\n]+ Match 1+ times any char except the listed
\) Match the closing parenthesis of the inner pair
) Close group 1
[^()\r\n]+ Match 1+ times any char except the listed
\) Match the closing parenthesis of the outer pair
Regex demo
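The question uses C++, but the capture-group idea can be checked quickly with a short sketch in Python's re module (the switch of language and the variable names are assumptions for illustration; the pattern is the one given above):

```python
import re

# Group 1 captures the inner parenthesized record, including its own parentheses.
NESTED_ARGS = re.compile(r"[^*()\r\n]+\((\([^()\r\n]+\))[^()\r\n]+\)")

line = 'INSERT((1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449),geography)'

match = NESTED_ARGS.search(line)
inner = match.group(1) if match else None
print(inner)
```

The wanted substring comes from group 1 of the match rather than from the suffix after the match, which is what the loop in the question was printing.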
You could also make the pattern more restrictive by using repeating non-capturing groups and the allowed characters from the character classes that you intended to use:
[a-zA-Z0-9]+\(\(((?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+")(?:,(?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+"))+)\),[a-zA-Z0-9]+\)
Regex demo

How to do a find replace around some function call

I have a lot of calls in lots of different files to os.getenv('some_var'). I would like to replace all of these with os.environ['some_var'].
I know how to replace all instances of os.getenv with os.environ but not how to replace the (.*) with [.*] without losing the text inside.
Try this regex:
(os\.)[^()]*\(([^()]*)\)
Replace each match with \1environ[\2]
Click for Demo
Explanation:
(os\.) - matches os. and capture in group 1
[^()]*\( - matches 0+ occurrences of any character that is neither a ( nor ) followed by (
([^()]*) - matches 0+ occurrences of any character that is neither a ( nor ). This substring is captured in Group 2
\) - matches )
You can match the call and capture the text inside the quotes using this regex,
os\.getenv\('([^']+)'\)
And replace it with os.environ['\1']
This regex basically has three parts,
os\.getenv\(' - This literally matches os.getenv(' (the dot is escaped so that it matches only a literal .)
([^']+) - This matches the text between the single quotes and captures it in group 1
'\) - This literally matches ')
Demo
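For applying such a replacement from a script instead of an editor, here is a minimal sketch with Python's re.sub (the helper name getenv_to_environ and the sample source text are hypothetical; the dot in the pattern is escaped so it matches only a literal .):

```python
import re

# Matches os.getenv('name') with a single-quoted argument and nothing else in the call.
GETENV_CALL = re.compile(r"os\.getenv\('([^']+)'\)")

def getenv_to_environ(source: str) -> str:
    """Rewrite os.getenv('x') calls as os.environ['x'] lookups."""
    return GETENV_CALL.sub(r"os.environ['\1']", source)

sample = "db = os.getenv('DB_URL')\nuser = os.getenv('USER', 'nobody')\n"
converted = getenv_to_environ(sample)
print(converted)
```

Calls that pass a default value, like the second line of the sample, do not match the pattern and are left unchanged, so they can be reviewed by hand.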

Repeated capturing group PCRE

Can't get why this regex (regex101)
/[\|]?([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures all the input, while this (regex101)
/[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures only |Func
Input string is |Func(param1, param2, param32, param54, param293, par13am, param)|
Also, how can I match a repeated capturing group properly? E.g. I have the regex
/\(\(\s*([a-z\_]+){1}(?:\s+\,\s+(\d+)*)*\s*\)\)/gui
And input string is (( string , 1 , 2 )).
Regex101 says "a repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations...". I've tried to follow this tip, but it didn't help.
Your /[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g regex does not match because you did not define a pattern to match the words inside parentheses. You might fix it as \|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?, but all the values inside parentheses would be matched into one single group that you would have to split later.
It is not possible to get an arbitrary number of captures with a PCRE regex, as in case of repeated captures only the last captured value is stored in the group buffer.
What you may do is get multiple matches with preg_match_all capturing the initial delimiter.
So, to match the second string, you may use
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\()\K\w+
See the regex demo.
Details:
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\() - either the end of the previous match (\G(?!\A)) and a comma enclosed with 0+ whitespaces (\s*,\s*), or 1+ | symbols (\|+), followed with 1+ alphanumeric chars (captured into Group 1, ([a-z0-9A-Z]+)) and a ( symbol (\()
\K - omit the text matched so far
\w+ - 1+ word chars.
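The other route mentioned in this answer, capturing the whole parameter list in one group and splitting it afterwards, can be sketched as follows. This uses Python's re module rather than PCRE (an assumption made for illustration; Python's re has no \G or \K, so the multi-match pattern itself is not reproduced), and parse_call is a hypothetical helper name:

```python
import re

# Group 1: function name; group 2: the whole comma-separated parameter list.
CALL = re.compile(r"\|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?")

def parse_call(text):
    """Return (name, [params]) for the first call found, or None if there is none."""
    m = CALL.search(text)
    if m is None:
        return None
    raw = m.group(2)
    params = re.split(r"\s*,\s*", raw) if raw else []
    return m.group(1), params

parsed = parse_call("|Func(param1, param2, param32, param54, param293, par13am, param)|")
print(parsed)

# A repeated capturing group keeps only its last iteration:
last_only = re.search(r"(?:(\w+)\s*,?\s*)+", "a, b, c").group(1)
print(last_only)  # c
```

The last two lines show the same limitation in Python's engine: the repeated group reports only "c", which is why the list is captured as a whole and split in a second step.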

Perl greedy regex is not acting greedy

Giving the following code:
use strict;
use warnings;
my $text = "asdf(blablabla)";
$text =~ s/(.*?)\((.*)\)/$2/;
print "\nfirst match: $1";
print "\nsecond match: $2";
I expected that $2 would catch my last bracket, yet my output is:
If .* is greedy by default, why did it stop at the bracket?
The .* is a greedy subpattern, but it does not account for grouping. Grouping is defined with a pair of unescaped parentheses (see Use Parentheses for Grouping and Capturing).
See where your group boundaries are:
s/(.*?)\((.*)\)/$2/
  | G1|  |G2|
So, the \( and \) matching ( and ) are outside the groups, and will be part of neither $1 nor $2.
If you need the ) to be part of $2, use
s/(.*?)\((.*\))/$2/
              ^
A regex engine processes both the string and the pattern from left to right. The first (.*?) is handled first, and since it is lazy (it matches as few chars as possible while still allowing a valid match), it matches up to the first literal ( symbol, and the whole part before the ( is placed into Group 1. Then the ( is matched but not captured, and (.*) matches any 0+ characters other than a newline up to the last ) symbol and places the capture into Group 2. Then the ) is just matched. The point is that .* first grabs the whole string up to the end, but then backtracking happens since the engine has to accommodate the final ) in the pattern. That ) must be matched but is not captured in your pattern, so it is not part of Group 2 due to the group boundary placement. You can step through the regex debugger on this regex demo page to see how the pattern matches your string.
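The effect of the group boundaries can be reproduced with a short sketch in Python's re module (Python is used here only for illustration and is an assumption; the greedy and lazy quantifiers in these patterns behave the same way as in the Perl code above):

```python
import re

text = "asdf(blablabla)"

# Closing \) outside group 2: the ")" is matched but not captured.
outside = re.search(r"(.*?)\((.*)\)", text).groups()

# Closing \) inside group 2: the ")" becomes part of the capture.
inside = re.search(r"(.*?)\((.*\))", text).groups()

# Greediness shows up when there are several ")": .* runs to the last one.
greedy = re.search(r"(.*?)\((.*)\)", "a(b)c(d)").group(2)

print(outside)  # ('asdf', 'blablabla')
print(inside)   # ('asdf', 'blablabla)')
print(greedy)   # b)c(d
```

The third result shows that .* is greedy after all: with two closing parentheses in the input, group 2 extends to the last one instead of stopping at the first.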