Regex get all before first occurrence of character

I know it's been asked many many times. I tried my best but the result wasn't perfect.
Regex
/(\(\s*["[^']*]*)(.*\/logo\.png.*?)(["[^']*]*\s*\))/gmi
Regex101 Link: https://regex101.com/r/0f8Q08/1
It should capture all separately.
(../asdasd/dasdas/logo.png)
(../asdasd/dasdas/logo.png)
( '../logo.png' )
Right now it's capturing as a whole.
(../asdasd/dasdas/logo.png) (../asdasd/dasdas/logo.png) ( '../logo.png' )
What I need is, the regex to stop after the first closing bracket ) match.

You can use
(\(\s*(["']?))([^"')]*\/logo\.png[^"')]*)(\2\s*\))
Details
(\(\s*(["']?)) - Group 1: (, any zero or more whitespaces, and then Group 2 capturing either a ' or a " optionally
([^"')]*\/logo\.png[^"')]*) - Group 3: any zero or more chars other than ", ' and ), then a /logo.png string, and then again any zero or more chars other than ", ' and )
(\2\s*\)) - Group 4: the same value as in Group 2, zero or more whitespaces, and a ) char.
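As a quick check, the pattern above can be run against the sample text from the question. This is a minimal sketch in Python (an assumption, since the question does not name a language), with the escaped \/ written as a plain /; the variable names are illustrative only.

```python
import re

# Sample text from the question: three parenthesised paths on one line.
text = "(../asdasd/dasdas/logo.png) (../asdasd/dasdas/logo.png) ( '../logo.png' )"

# Group 2 holds the optional opening quote; \2 requires the same quote to close.
pattern = re.compile(
    r"""(\(\s*(["']?))([^"')]*/logo\.png[^"')]*)(\2\s*\))""",
    re.IGNORECASE,
)

matches = [m.group(0) for m in pattern.finditer(text)]
paths = [m.group(3) for m in pattern.finditer(text)]

print(matches)  # three separate matches, one per pair of parentheses
print(paths)    # the path inside each pair, without quotes
```

Each pair of parentheses comes back as its own match, and Group 3 holds the bare path.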

The issue in your pattern is that the .* matches too much. After the opening parenthesis, you should exclude ( and ) from the match so that the separate parts are not overmatched.
You don't need all those capture groups if you want to match each parenthesised part as a whole.
You can use a single capture group and a backreference to it, so that the optional closing quote must be the same as the optional opening quote.
\(\s*(["']?)[^()'"]*\/logo\.png[^()'"]*\1\s*\)
If you also want matches where the opening and closing quotes do not have to pair up:
\(\s*["']?[^()'"]*\/logo\.png[^()'"]*["']?\s*\)
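The effect of replacing .* with a negated character class can be seen in a short Python sketch. The greedy pattern here is a simplified stand-in for the one in the question, not the original, and the variable names are made up for illustration.

```python
import re

text = "(../asdasd/dasdas/logo.png) (../asdasd/dasdas/logo.png) ( '../logo.png' )"

# Greedy .* may run across ")" characters, so the three parts merge into one match.
greedy = re.compile(r"\(.*/logo\.png.*?\)")

# Negated classes cannot cross "(" or ")", so each match stays inside one pair.
# \1 requires the closing quote (if any) to equal the opening quote.
bounded = re.compile(r"""\(\s*(["']?)[^()'"]*/logo\.png[^()'"]*\1\s*\)""")

greedy_matches = [m.group(0) for m in greedy.finditer(text)]
bounded_matches = [m.group(0) for m in bounded.finditer(text)]

print(len(greedy_matches))   # 1: the whole line
print(len(bounded_matches))  # 3: one per pair of parentheses
```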

If you want to keep your regex, you can change the first .* to [^)]* so that the match stays between the parentheses
(\(\s*["[^']*]*)([^)]*\/logo\.png.*?)(["[^']*]*\s*\))

Related

Regex doesn't ignore the optional groups

I'm trying to create a regex that captures my URL and its optional groups. The regex works fine if the URL is complete, but the optional groups are not optional at all.
Regex :
\/(.+)(?:\/(.+))(?:(?:\?(.+)))
Urls to catch :
/taxi
/taxi/lyon
/taxi/lyon?coordinates=7542
https://regex101.com/r/NKFkwq/4/
As you can see, the third line is matched, but I'd like the first and second to match too.
I thought the ?: would be enough to do that, but I missed something...
Thanks a lot for your help !
Cheers
EDIT and answer
Thanks to the commenters for helping me. Here is the working regex (the one I expected): https://regex101.com/r/NKFkwq/8
Indeed, ?: makes a group non-capturing; it does not make it optional.
Your pattern consists of capturing and non-capturing groups. The (?: denotes a non-capturing group.
If you want to match all 3 lines, you could match the part starting from the first forward slash and make the part starting from the second forward slash optional.
^/[^\s/]+(?:/[^\s/]+)?$
^ Start of string
/[^\s/]+ Match / and match 1+ times any char except a whitespace or /
(?: Non capturing group
/[^\s/]+ Match / and match 1+ times any char except a whitespace or /
)? Close non capturing group and make it optional
$ End of string
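To see the difference between a non-capturing group and an optional one, the two patterns can be compared on the three URLs from the question. This is a small Python sketch (the language is an assumption, and the names are illustrative).

```python
import re

urls = ["/taxi", "/taxi/lyon", "/taxi/lyon?coordinates=7542"]

# Pattern from the question: (?:...) only disables capturing, every part is still required.
original = re.compile(r"/(.+)(?:/(.+))(?:(?:\?(.+)))")

# Pattern from the answer: the trailing ? makes the second path segment optional.
optional = re.compile(r"^/[^\s/]+(?:/[^\s/]+)?$")

original_hits = [u for u in urls if original.search(u)]
optional_hits = [u for u in urls if optional.search(u)]

print(original_hits)  # only the complete URL
print(optional_hits)  # all three URLs
```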
If you want to have capturing groups, but don't want to match /taxi?coordinates=7542 you could nest the groups and make them optional as well.
^/\w+(/\w+(\?\S*)?)?$
^ Start of string
/\w+ Match / and 1+ word chars
( Capture group 1
/\w+ Match / and 1+ word chars
( Capture group 2
\?\S* Match ? and 0+ times a non whitespace char
)? Close group 2
)? Close group 1
$ End of string
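The nested optional groups can be inspected with a short Python sketch (Python is an assumption here). Groups that did not take part in the match come back as None.

```python
import re

pattern = re.compile(r"^/\w+(/\w+(\?\S*)?)?$")

def groups_for(url):
    """Return the captured groups for url, or None when the URL does not match."""
    m = pattern.match(url)
    return m.groups() if m else None

print(groups_for("/taxi"))                        # (None, None)
print(groups_for("/taxi/lyon"))                   # ('/lyon', None)
print(groups_for("/taxi/lyon?coordinates=7542"))  # ('/lyon?coordinates=7542', '?coordinates=7542')
print(groups_for("/taxi?coordinates=7542"))       # None: the query needs a second segment
```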

Regex to remove all parentheses except most external ones

I have been trying and reading many similar SO answers with no luck.
I need to remove parentheses in the text inside parentheses keeping the text. Ideally with 1 regex... or maybe 2?
My text is:
Alpha (Bravo( Charlie))
I want to achieve:
Alpha (Bravo Charlie)
The best I got so far is:
\\(|\\)
but it gets:
Alpha Bravo Charlie
You can use a regex like this:
(\(.*?)\((.*?)\)
With this replacement string:
$1$2
Update: following ııı's comment, and since I don't know your full sample text, here is an alternative regex in which group 1 cannot run past a closing parenthesis
(\([^)]*)\((.*?)\)
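A Python sketch of both variants follows. In Python the replacement is written \1\2 instead of $1$2. The second input string is a made-up example, not taken from the question, chosen to show where the two patterns can differ.

```python
import re

first = re.compile(r"(\(.*?)\((.*?)\)")
second = re.compile(r"(\([^)]*)\((.*?)\)")

sample = "Alpha (Bravo( Charlie))"   # text from the question
tricky = "(a) (b( c))"               # hypothetical text with an earlier, unnested pair

print(first.sub(r"\1\2", sample))    # Alpha (Bravo Charlie)
print(second.sub(r"\1\2", sample))   # Alpha (Bravo Charlie)

# With .*? group 1 may start at "(a)" and run past its ")" to reach the next "(".
print(first.sub(r"\1\2", tricky))    # (a) b( c)
# With [^)]* group 1 cannot cross a ")", so the match starts at "(b".
print(second.sub(r"\1\2", tricky))   # (a) (b c)
```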
From your post and comments, it seems you want to remove only the innermost parentheses, for which you can use the following regex:
\(([^()]*)\)
And replace with $1 or \1, depending on your language.
In this regex, \( matches an opening parenthesis and \) matches a closing parenthesis. ([^()]*) ensures the captured text contains neither ( nor ), which guarantees that the match is the innermost pair, and places that text in group 1. The whole match is then replaced by the text captured in group 1, which removes the innermost parentheses and keeps the text inside unchanged.
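For comparison, here is a minimal Python sketch (the language is an assumption) that applies both the alternation from the question and the innermost-pair pattern to the sample text.

```python
import re

text = "Alpha (Bravo( Charlie))"

# The alternation from the question removes every parenthesis.
all_removed = re.sub(r"\(|\)", "", text)

# The innermost-pair pattern only matches a pair with no parentheses inside it.
inner_removed = re.sub(r"\(([^()]*)\)", r"\1", text)

print(all_removed)    # Alpha Bravo Charlie
print(inner_removed)  # Alpha (Bravo Charlie)
```

Note that on text without nesting, such as Alpha (Bravo), the same substitution would strip the only pair, because that pair is then the innermost one.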
Your pattern \(|\) uses an alternation, so it matches either an opening or a closing parenthesis.
If, according to the comments, there is only 1 pair of nested parentheses, you could match:
(\([^()]*)\(([^()]*\)[^()]*)\)
( Start capturing group 1
\( Match opening parenthesis
[^()]* Match 0+ times not ( or )
) Close group 1
\( Match the inner opening parenthesis
( Capturing group 2
[^()]*\) Match 0+ times not ( or ), then a closing parenthesis
[^()]* Match 0+ times not ( or )
) Close group 2
\) Match closing parenthesis
And replace with the first and the second capturing group.

How to do a find replace around some function call

I have a lot of calls in lots of different files to os.getenv('some_var'). I would like to replace all of these with os.environ['some_var'].
I know how to replace all instances of os.getenv with os.environ but not how to replace the (.*) with [.*] without losing the text inside.
Try this regex:
(os\.)[^()]*\(([^()]*)\)
Replace each match with \1environ[\2]
Explanation:
(os\.) - matches os. and capture in group 1
[^()]*\( - matches 0+ occurrences of any character that is neither a ( nor ), followed by (
([^()]*) - matches 0+ occurrences of any character that is neither a ( nor ). This substring is captured in Group 2
\) - matches )
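The replacement can be tried out with Python's re.sub. This is only a sketch: the source string is made up for illustration, and the last call shows a caveat of this pattern, namely that [^()]* accepts any text after os., not only getenv.

```python
import re

# Hypothetical source lines to rewrite.
source = "db = os.getenv('DB_HOST')\nport = os.getenv('DB_PORT')"

pattern = re.compile(r"(os\.)[^()]*\(([^()]*)\)")
rewritten = pattern.sub(r"\1environ[\2]", source)
print(rewritten)
# db = os.environ['DB_HOST']
# port = os.environ['DB_PORT']

# Caveat: other calls on os are rewritten as well.
other = pattern.sub(r"\1environ[\2]", "os.path.join('a')")
print(other)  # os.environ['a']
```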
You can match the call and capture the text inside the quotes using this regex:
os\.getenv\('([^']+)'\)
And replace it with os.environ['\1']
This regex has three parts:
os\.getenv\(' - This literally matches os.getenv('
([^']+) - This captures the text between the single quotes in group 1
'\) - This literally matches ')
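A hedged Python sketch of this stricter replacement follows, with the dots escaped so that they only match a literal period. The input strings are invented for the example.

```python
import re

pattern = re.compile(r"os\.getenv\('([^']+)'\)")
replacement = r"os.environ['\1']"

converted = pattern.sub(replacement, "home = os.getenv('HOME')")
untouched = pattern.sub(replacement, "p = os.path.join('a', 'b')")
double_quoted = pattern.sub(replacement, 'home = os.getenv("HOME")')

print(converted)      # home = os.environ['HOME']
print(untouched)      # p = os.path.join('a', 'b')
print(double_quoted)  # home = os.getenv("HOME")
```

Because the pattern names getenv and the single quotes explicitly, other os calls are left alone, but double-quoted arguments are not converted either.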

Repeated capturing group PCRE

Can't get why this regex (regex101)
/[\|]?([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures all the input, while this (regex101)
/[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures only |Func
Input string is |Func(param1, param2, param32, param54, param293, par13am, param)|
Also how can i match repeated capturing group in normal way? E.g. i have regex
/\(\(\s*([a-z\_]+){1}(?:\s+\,\s+(\d+)*)*\s*\)\)/gui
And input string is (( string , 1 , 2 )).
Regex101 says "a repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations...". I've tried to follow this tip, but it didn't help me.
Your /[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g regex does not match the whole input because you did not define a pattern to match the words inside parentheses. You might fix it as \|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?, but all the values inside parentheses would be matched into one single group that you would have to split later.
It is not possible to get an arbitrary number of captures with a PCRE regex, as in case of repeated captures only the last captured value is stored in the group buffer.
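Python's re module behaves the same way for repeated groups, which makes it convenient for a quick illustration. The pattern below is a simplified, hypothetical version of the one in the question, not the original.

```python
import re

text = "(( string , 1 , 2 ))"

# The group (\d+) sits inside a repeated group, so only its last iteration is kept.
repeated = re.compile(r"\(\(\s*([a-z_]+)(?:\s*,\s*(\d+))*\s*\)\)", re.IGNORECASE)
m = repeated.search(text)
name = m.group(1)
last_number = m.group(2)

# Workaround: capture the whole repeated part once, then extract the items from it.
whole = re.compile(r"\(\(\s*([a-z_]+)((?:\s*,\s*\d+)*)\s*\)\)", re.IGNORECASE)
numbers = re.findall(r"\d+", whole.search(text).group(2))

print(name, last_number)  # string 2
print(numbers)            # ['1', '2']
```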
What you can do is get multiple matches with preg_match_all, capturing the initial delimiter.
So, to match the second string, you may use
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\()\K\w+
Details:
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\() - either the end of the previous match (\G(?!\A)) and a comma enclosed with 0+ whitespaces (\s*,\s*), or 1+ | symbols (\|+), followed with 1+ alphanumeric chars (captured into Group 1, ([a-z0-9A-Z]+)) and a ( symbol (\()
\K - omit the text matched so far
\w+ - 1+ word chars.
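Python's built-in re module supports neither \G nor \K, so the pattern above cannot be used there directly. A two-step alternative (a different technique from the \G approach) is to capture the whole argument list once and split it afterwards. The sketch below assumes the input always has the |Name(arg, arg, ...)| shape shown in the question.

```python
import re

text = "|Func(param1, param2, param32, param54, param293, par13am, param)|"

# Step 1: capture the function name and the full comma-separated argument list.
call = re.compile(r"\|+([a-z0-9A-Z]+)\((\w+(?:\s*,\s*\w+)*)\)")
m = call.search(text)
func_name = m.group(1)

# Step 2: split the captured list on commas with optional surrounding whitespace.
params = re.split(r"\s*,\s*", m.group(2))

print(func_name)  # Func
print(params)
```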

Could someone explain the regex /(.*)\.(.*)/?

I want to get the file extension in Groovy with a regex, for let's say South.6987556.Input.csv.cop.
http://www.regexplanet.com/advanced/java/index.html shows me that the second group would really contain the cop extension. Which is what I want.
0: [0,27] South.6987556.Input.csv.cop
1: [0,23] South.6987556.Input.csv
2: [24,27] cop
I just don't understand why the result won't be
0: [0,27] South.6987556.Input.csv.cop
1: [0,5] South
2: [6,27] 6987556.Input.csv.cop
What should be the regex to get this kind of result?
Here is a visualization of this regex
(.*)\.(.*)
in words
(.*) matches anything, as much as possible, and captures it
\. matches one period, not captured (no parentheses)
(.*) matches anything again, possibly empty, and captures it
in your case this is
(.*) : South.6987556.Input.csv
\. : .
(.*) : cop
The result isn't South and 6987556.Input.csv.cop because the first (.*) is greedy and must be followed by a period, so the engine gives it the longest possible string, which ends just before the last period.
Your intended result would be produced by this regex: (.*?)\.(.*). The ? after a quantifier (in this case *) makes it lazy (ungreedy), so the engine looks for the shortest matching string. By default, quantifiers are greedy in most regex engines.
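The greedy and lazy variants can be compared side by side. The sketch below uses Python rather than Groovy (an assumption made for brevity); the two patterns themselves are the ones discussed above.

```python
import re

name = "South.6987556.Input.csv.cop"

greedy = re.match(r"(.*)\.(.*)", name)   # first group takes as much as possible
lazy = re.match(r"(.*?)\.(.*)", name)    # first group takes as little as possible

print(greedy.groups())  # ('South.6987556.Input.csv', 'cop')
print(lazy.groups())    # ('South', '6987556.Input.csv.cop')

# Start and end offsets of each group.
print(greedy.span(1), greedy.span(2))  # (0, 23) (24, 27)
print(lazy.span(1), lazy.span(2))      # (0, 5) (6, 27)
```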
To get the desired output, your regex should be:
((.*?)\.(.*))
Explanation:
( group and capture to \1:
( group and capture to \2:
.*? any character except \n (0 or more times); the ? after * makes the regex engine do a non-greedy match (shortest possible match)
) end of \2
\. '.'
( group and capture to \3:
.* any character except \n (0 or more times)
) end of \3
) end of \1