Regular Expressions ungreedy in bash [duplicate] - regex

This question already has answers here:
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 6 years ago.
Choose which of the following strings match the regular expression
(1 U 22)*2*
a. 22112222112211
b. 11112
c. The empty string.
d. 12121
e. 1121111222
I did some searching, and U means " Ungreedy. Makes the quantifiers *+?{} consume only those characters absolutely necessary to form a match, leaving the remaining ones available for the next part of the pattern. When the "U" option is not in effect, an individual quantifier can be made non-greedy by following it with a question mark. Conversely, when "U" is in effect, the question mark makes an individual quantifier greedy. " https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
But I don't understand it at all. What do greedy and ungreedy regular expressions mean? And can you work through the example I listed above?

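Greedy means that a quantifier consumes as much as it can, so the regex will try to find the longest matching string. Non-greedy (lazy) means the quantifier consumes as little as it can, so the match ends at the earliest possible point.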
For the following string:
{ this} is a { test} }
Example of a Greedy regex
\{.*\}
This regex would match the whole following text:
{ this} is a { test} }
Non Greedy
\{.*?\}
would match only
{ this}
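The difference can be checked with a short Python sketch. Python's re module is used here only for illustration; the question itself mentions bash. The second half rests on an assumption that the question does not confirm: that the U in (1 U 22)*2* is the union (alternation) operator from formal-language notation rather than the ungreedy flag. Under that reading the expression is written (1|22)*2* and each candidate must match in full.

```python
import re

text = "{ this} is a { test} }"

# Greedy: .* grabs as much as possible, then backs off to the last "}".
greedy = re.search(r"\{.*\}", text).group()

# Lazy: .*? grabs as little as possible, stopping at the first "}".
lazy = re.search(r"\{.*?\}", text).group()

print(repr(greedy))  # '{ this} is a { test} }'
print(repr(lazy))    # '{ this}'

# Assumption: "U" means union, so (1 U 22)*2* becomes (1|22)*2*.
candidates = ["22112222112211", "11112", "", "12121", "1121111222"]
union_pattern = re.compile(r"(?:1|22)*2*")
matching = [s for s in candidates if union_pattern.fullmatch(s)]
print(matching)  # ['22112222112211', '11112', '']
```

Under the union reading, options a, b and c match, while d and e fail because a single 2 is followed by a 1.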

Related

Regular Expression for anything in Between ${something} [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I am a newbie to regular expressions. I have written a regular expression for ${serviceName}; basically I want to extract the word between ${ and }. The expression I wrote for this works perfectly fine:
"\\$\\{(\\w+)\\}"
But I want to capture any value between the braces, not only word characters, for example ${serviceName.1.Type}. Can you help me with a regular expression for ${serviceName.1.Type}?
I hope my question is clear.
Thanks In Advance.
A good place to test regular expressions is https://regex101.com/
\w+ matches any word character (equal to [a-zA-Z0-9_])
If you want to match anything you can replace it with: .*
.* matches any character (except for line terminators)
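You might want to add a "?" after the quantifier so that the match stops at the first "}"
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed
Also, in some engines (for example PCRE, the regex101 default) you don't need to escape the { } in this case; other engines may reject an unescaped {, so keeping the backslashes is the safer choice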
So what you want is:
"\\${(.*?)}"
\$\{([\w\.\d\s]+)\}
This expression captures, as a group, everything that appears between the braces. (Inside a character class ? is a literal character, not a quantifier, so it is left out here.)
You can then refer to the group as $1.
The character class currently allows dots \. , spaces \s, word characters \w and digits \d; if your values contain other characters, add them to the class.
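As a rough check of both suggestions, here is a minimal Python sketch. The sample text is made up for illustration, and Python's re module stands in for whatever engine the question uses (the doubled backslashes suggest a string literal in another language). Both braces are escaped, which is accepted by every engine I am aware of.

```python
import re

# Hypothetical sample input containing two placeholders.
text = "url=${serviceName} type=${serviceName.1.Type}"

# Lazy version: capture everything up to the first closing brace.
lazy_names = re.findall(r"\$\{(.*?)\}", text)

# Character-class version: only word characters, dots and whitespace.
class_names = re.findall(r"\$\{([\w.\s]+)\}", text)

print(lazy_names)   # ['serviceName', 'serviceName.1.Type']
print(class_names)  # ['serviceName', 'serviceName.1.Type']
```

The lazy version accepts any characters between the braces, while the character-class version rejects placeholders containing anything outside the listed set.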

Regular Expression lazy modifier matches too much [duplicate]

This question already has answers here:
Regular expressions: Ensuring b doesn't come between a and c
(4 answers)
Closed 4 years ago.
The following regular expression is jumping across [url] tags: a single match spans more than one [url]...[/url] pair.
Regular Expression (generic regular expression)
(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])
String:
[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] [url]www.youtube.com/blah[/url]
Help!!
Your captured group requires youtu inside, so the substring
[url]blahblah[/url] [url]www.youtube.com/blah[/url]
matches, because it starts with [url], includes youtu, and ends with [/url].
Simply using a negated character set, excluding [, probably isn't enough, because that wouldn't allow for nested tags to match, such as an input of
[url]foobar youtube[b]BOLD TEXT[/b][/url]
You might require negative lookahead for [/url] right before each repeated character:
(?:(?!\[\/url\]).)*
Also, make sure that whatever comes after the [url does not contain ]s before coming to the true ], with:
\[url[^]]*\]
In full:
\[url[^]]*\]((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)\[\/url\]
There's no need to make the quantifiers lazy anymore, because of the negative lookahead.
Demo:
https://regex101.com/r/hSAJEp/1
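The behaviour of the original pattern and the lookahead version can be compared with a small Python sketch. Python's re module is an assumption here, since the question only says "generic regular expression"; both patterns are copied unchanged from the text above.

```python
import re

text = ("[url]blahyoutubeblah[/url] heyya "
        "[url]blahblah[/url] [url]www.youtube.com/blah[/url]")

# Original pattern: lazy quantifiers can still run across [/url].
lazy = re.compile(r"(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])")

# Lookahead version: each character is consumed only if [/url] does not start there.
tempered = re.compile(
    r"\[url[^]]*\]((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)\[\/url\]"
)

lazy_hits = lazy.findall(text)
tempered_hits = tempered.findall(text)
nested_hits = tempered.findall("[url]foobar youtube[b]BOLD TEXT[/b][/url]")

print(lazy_hits)      # ['blahyoutubeblah', 'blahblah[/url] [url]www.youtube.com/blah']
print(tempered_hits)  # ['blahyoutubeblah', 'www.youtube.com/blah']
print(nested_hits)    # ['foobar youtube[b]BOLD TEXT[/b]']
```

The last call shows that nested tags such as [b] still match, which a plain [^\[] character class would not allow.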
You are matching .*, which means the pattern will match [url], continue up to youtu, and then find [/url], even if other tags lie in between.
A simple workaround is to disallow an opening [ bracket before youtu is found:
(?:\[url.*?\])([^\[]*?youtu.*?)(?:\[\/url\])
The problem is that your regex requires youtu, but the second pair of [url] tags contains only blahblah. A generic version that matches any content would be
(?:\[url.*?\])(.*?)(?:\[\/url\])
The quantifier is lazy, but the regex will still match if it can: laziness never moves the left boundary of a match forward when a match from the current position is possible. There are other ways to handle that. One of them is to prevent the unwanted match in the regex itself, for example:
(?:\[url[^\]]*?\])([^\[]*?youtu.*?)(?:\[\/url\])

Matching pattern multiple times in same string with regex [duplicate]

This question already has an answer here:
Finding the indexes of multiple/overlapping matching substrings
(1 answer)
Closed 7 years ago.
I'm trying to find all matches of a particular pattern "8ab|ab8" in the string "8ab8". So I tried the R command gregexpr("8ab|ab8","8ab8") hoping to get a return vector with the starting positions as c(1,2).
Unfortunately, it seems that what happens is that once the first pattern is matched, that portion of the string is "removed" and the second pattern won't be matched.
For example, once "8ab" is matched, "8ab8" becomes "8" and when R tries matching "ab8" in "8", the pattern won't be found. I know this because gregexpr("8ab|ab8","8ab ab8") works fine and returns starting positions of pattern matches as c(1,5).
The question is, how do I match the same pattern multiple times in the first case?
Use Perl-compatible regular expressions with perl=TRUE (see ?regex for details), and wrap each alternative in a lookahead so that a match does not consume any characters:
gregexpr("(?=8ab)|(?=ab8)", "8ab8", perl=TRUE)
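The same idea can be sketched in Python for comparison. This is an illustration in a different language, not the R code itself; match_starts is a hypothetical helper, and 1 is added to each position so the output lines up with R's 1-based indices.

```python
import re

def match_starts(pattern, text):
    """Return 1-based start positions, mirroring what gregexpr reports."""
    return [m.start() + 1 for m in re.finditer(pattern, text)]

# Plain alternation consumes "8ab", so "ab8" can no longer match.
consuming = match_starts(r"8ab|ab8", "8ab8")

# Lookaheads are zero-width, so overlapping occurrences are all found.
overlapping = match_starts(r"(?=8ab)|(?=ab8)", "8ab8")

# Non-overlapping occurrences are found either way.
separate = match_starts(r"8ab|ab8", "8ab ab8")

print(consuming)    # [1]
print(overlapping)  # [1, 2]
print(separate)     # [1, 5]
```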

Perl Regular expression look ahead [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
What is the use of
?=
in perl regex
in Perl regex? Please explain its exact meaning and give an example regex.
(?=...)
is a positive lookahead, a type of zero-width assertion. It says that the match must be followed by whatever is within the parentheses, but that part is not included in the match.
Example:
.*(?=bar)
This pattern matches all the characters up to the string bar, without including bar itself. If a line contains more than one bar, it matches up to the last bar, because .* is greedy.
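A minimal Python sketch of this behaviour follows. Python is used for illustration, although the question is about Perl; the lookahead syntax is the same in both, and the sample line is made up.

```python
import re

# Hypothetical sample line containing "bar" twice.
line = "foo bar baz bar qux"

# Greedy .* backtracks from the end of the line to the last "bar".
greedy = re.search(r".*(?=bar)", line).group()

# Lazy .*? stops as soon as "bar" follows, i.e. at the first "bar".
lazy = re.search(r".*?(?=bar)", line).group()

print(repr(greedy))  # 'foo bar baz '
print(repr(lazy))    # 'foo '
```

In both cases bar itself is not part of the returned text, because the lookahead only asserts that it follows.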

regex that matches everything except a constant [duplicate]

This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 8 years ago.
I need a regexp that will match everything except a single constant (case ignored)
Example for constant ALL, should match words like: dog, MOUSE, mall, alligator. But it shouldn't match: all, ALL, alL.
(?si)^(?!all$).*
will match any string except all (case-insensitively).
(?i) makes the regex case-insensitive, (?s) allows the dot to match any character, including newlines. If you don't expect newlines in your input, you can remove the s.
See it live on regex101.com.
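The pattern can also be checked against the words from the question with a short Python sketch. Python's re module is an assumption, as the question does not name an engine; the inline flags (?si) work there as long as they stay at the start of the pattern.

```python
import re

pattern = re.compile(r"(?si)^(?!all$).*")

words = ["dog", "MOUSE", "mall", "alligator", "all", "ALL", "alL"]

# Keep only the words the pattern accepts.
accepted = [w for w in words if pattern.match(w)]
rejected = [w for w in words if not pattern.match(w)]

print(accepted)  # ['dog', 'MOUSE', 'mall', 'alligator']
print(rejected)  # ['all', 'ALL', 'alL']
```

The $ inside the lookahead is what lets alligator through: the lookahead only rejects the input when all is followed immediately by the end of the string.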