Regular Expression lazy modifier matches too much [duplicate]


This question already has answers here:
Regular expressions: Ensuring b doesn't come between a and c
(4 answers)
Closed 4 years ago.
The following regular expression runs past the closing [/url] tag and into the next [url] tags.
Regular expression (generic flavor):
(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])
String:
[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] [url]www.youtube.com/blah[/url]
Help!!

Your captured group requires youtu inside, so the substring
[url]blahblah[/url] [url]www.youtube.com/blah[/url]
matches, because it starts with [url], includes youtu, and ends with [/url].
Simply using a negated character set, excluding [, probably isn't enough, because that wouldn't allow for nested tags to match, such as an input of
[url]foobar youtube[b]BOLD TEXT[/b][/url]
You might require negative lookahead for [/url] right before each repeated character:
(?:(?!\[\/url\]).)*
Also, make sure that whatever comes after [url contains no ] before the tag's real closing ], with:
\[url[^]]*\]
In full:
\[url[^]]*\]((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)\[\/url\]
There's no need to make the quantifiers lazy anymore, because of the negative lookahead.
Demo:
https://regex101.com/r/hSAJEp/1
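To make the difference concrete, here is a minimal Python sketch (Python's re module is my choice for illustration; the question did not name a flavor). It runs the lazy pattern from the question and the lookahead pattern from this answer against the question's input. The only change to the patterns is that [^]] is written as [^\]], which means the same thing.

```python
import re

# Input string from the question.
text = (
    "[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] "
    "[url]www.youtube.com/blah[/url]"
)

# Original pattern: the lazy .*? is still allowed to cross [/url] and [url].
LAZY = re.compile(r"(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])")

# Pattern from the answer: the lookahead (?!\[\/url\]) is checked before every
# character, so the captured text can never contain a closing [/url].
TEMPERED = re.compile(
    r"\[url[^\]]*\]((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)\[\/url\]"
)

lazy_matches = LAZY.findall(text)
tempered_matches = TEMPERED.findall(text)

print(lazy_matches)
# ['blahyoutubeblah', 'blahblah[/url] [url]www.youtube.com/blah']
print(tempered_matches)
# ['blahyoutubeblah', 'www.youtube.com/blah']
```

The second lazy match starts at the [url] of the blahblah tag, which is exactly the "jumping" described in the question.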

You are matching .*?, which can match anything, including a closing [/url] and the next opening [url], on its way to youtu; only then does it look for [/url].
A simple workaround is to use [^\[]*? before youtu, which means it won't match an opening [ bracket before finding youtu:
(?:\[url.*?\])([^\[]*?youtu.*?)(?:\[\/url\])

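The problem is that your regex requires youtu, but the second tag has only blahblah between [url] and [/url], so the match runs on into the next tag. If you want to match every [url] tag, make the pattern generic by dropping youtu: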
(?:\[url.*?\])(.*?)(?:\[\/url\])

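The quantifier is lazy, but it will still match if it can: laziness only makes the match end as early as possible, and the engine won't move the left border of the match when a match from that position is possible. There are other ways to deal with that. One of them is to prevent the unwanted match in the regex itself, using: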
(?:\[url[^\]]*?\])([^\[]*?youtu.*?)(?:\[\/url\])
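As a rough check of the negated-character-class approach, the sketch below (Python, chosen for illustration) runs the last pattern above on the question's input and on two made-up inputs with a nested [b] tag. The nested inputs are hypothetical, one of them borrowed from the first answer's example.

```python
import re

# Pattern from the answer above, copied verbatim.
NEGATED = re.compile(r"(?:\[url[^\]]*?\])([^\[]*?youtu.*?)(?:\[\/url\])")

question_text = (
    "[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] "
    "[url]www.youtube.com/blah[/url]"
)
# Hypothetical inputs with a nested tag before and after "youtu".
nested_before = "[url][b]watch[/b] youtube.com/x[/url]"
nested_after = "[url]foobar youtube[b]BOLD TEXT[/b][/url]"

flat_matches = NEGATED.findall(question_text)
before_matches = NEGATED.findall(nested_before)
after_matches = NEGATED.findall(nested_after)

print(flat_matches)    # ['blahyoutubeblah', 'www.youtube.com/blah']
print(before_matches)  # []
print(after_matches)   # ['foobar youtube[b]BOLD TEXT[/b]']
```

The empty result for nested_before shows the trade-off: [^\[]*? cannot step over any [ that appears before youtu, so a tag nested ahead of the keyword prevents the match.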

Related

Regular expression no match when followed by character [duplicate]

This question already has an answer here:
Regex match numbers not followed by a hyphen
(1 answer)
Closed 1 year ago.
I am trying to capture groups in a text that only match when the match is not followed by a specific character, in this case the opening parentheses "(" to indicate the start of a 'function/method' rather than a 'property'.
This seems pretty straightforward so I tried:
TEXT
$this->willMatch but $this->willNot()
RESULT
RegExp pattern: \$this->[a-zA-Z0-9\_]+(?<!\()
Expected: $this->willMatch
Actual: $this->willMatch, $this->willNot
RegExp pattern: \$this->[a-zA-Z0-9\_]+[^\(]
Expected: $this->willMatch
Actual: $this->willMatch, $this->willNot
RegExp pattern: \$this->[a-zA-Z0-9]+(?!\()
Expected: $this->willMatch
Actual: $this->willMatch, $this->willNo
My intuition says I need to add ^ and $, but that won't work for multiple occurrences in a text.
Curious to meet the RegExp wizard that can solve this!

The answer from The fourth bird definitely works and is well explained.
As an alternative to the word boundary, you can use a possessive quantifier, i.e. ++, to turn off backtracking, which improves efficiency further. This needs a flavor that supports possessive quantifiers (PCRE and Java do; Python's re only from version 3.11).
\$this->\w++(?!\()
Please note use of \w instead of equivalent [a-zA-Z0-9_] here.
Like a greedy quantifier, a possessive quantifier repeats the token as many times as possible. Unlike a greedy quantifier, it does not give up matches as the engine backtracks.

The lookbehind (?<!\() always succeeds, because the last character matched by the character class can never be a (.
Note that you don't have to escape the _
You can use a word boundary after the character class to prevent backtracking, and turn the negative lookbehind into a negative lookahead (?!\() to assert not ( directly to the right.
\$this->[a-zA-Z0-9_]+\b(?!\()
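A small Python sketch of the effect, using the text from the question (Python is my choice here, the question did not name a language). Without the word boundary the engine backtracks by one character to satisfy the lookahead; with \b that shorter match is rejected.

```python
import re

text = "$this->willMatch but $this->willNot()"

# Lookahead only: after "willNot" fails on "(", the engine gives back the "t"
# and accepts "willNo", because the next character is then "t", not "(".
WITHOUT_BOUNDARY = re.compile(r"\$this->[a-zA-Z0-9_]+(?!\()")

# Word boundary first: \b fails between "o" and "t", so backtracking into the
# middle of the identifier can no longer produce a match.
WITH_BOUNDARY = re.compile(r"\$this->[a-zA-Z0-9_]+\b(?!\()")

without_boundary = WITHOUT_BOUNDARY.findall(text)
with_boundary = WITH_BOUNDARY.findall(text)

print(without_boundary)  # ['$this->willMatch', '$this->willNo']
print(with_boundary)     # ['$this->willMatch']
```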

Regular Expression for anything in Between ${something} [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I am a newbie in regular expressions. I have written a regular expression for ${serviceName}; basically I want to take the word between ${ and }. The regular expression I wrote for this works perfectly fine:
"\\$\\{(\\w+)\\}"
But I want to take any value between the braces, not only words, for example ${serviceName.1.Type}. Can you help me with a regular expression for ${serviceName.1.Type}?
I hope my question is clear.
Thanks in advance.

A good place to test regular expressions is https://regex101.com/
\w+ matches one or more word characters (\w is equal to [a-zA-Z0-9_])
If you want to match anything you can replace it with: .*
.* matches any character (except for line terminators), zero or more times
Add a "?" after it so that the match stops at the first "}"
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed
In some flavors (PCRE or Python, for example) you don't need to escape the { } in this case, but Java's java.util.regex rejects an unescaped { here as an illegal repetition, so the escaped form is the safe one
So what you want is:
"\\$\\{(.*?)\\}"

\$\{([\w.\s]+)\}
This expression captures as a group everything that appears between { and }.
You can then refer to the group as $1.
Inside a character class a ? is a literal question mark, not a quantifier, so it is left out here; \d is also redundant because \w already includes digits. As written the class accepts dots (.), whitespace (\s), letters, digits and underscores (\w); if other expressions contain additional characters, add them to the class.
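The two suggestions can be compared side by side. This is a sketch in Python rather than the asker's language (the doubled backslashes in the question suggest a string literal in some other language), and the sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text containing both kinds of placeholder.
text = "Calling ${serviceName} with ${serviceName.1.Type}"

# The asker's original idea: word characters only.
WORDS_ONLY = re.compile(r"\$\{(\w+)\}")
# Lazy "anything" up to the first closing brace.
LAZY_ANY = re.compile(r"\$\{(.*?)\}")
# A character class limited to word characters, dots and whitespace.
RESTRICTED = re.compile(r"\$\{([\w.\s]+)\}")

words_only = WORDS_ONLY.findall(text)
lazy_any = LAZY_ANY.findall(text)
restricted = RESTRICTED.findall(text)

print(words_only)  # ['serviceName']
print(lazy_any)    # ['serviceName', 'serviceName.1.Type']
print(restricted)  # ['serviceName', 'serviceName.1.Type']
```

The lazy version accepts any content between the braces, while the character class rejects placeholders containing characters outside the listed set; which is preferable depends on how strict the placeholder syntax should be.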

REGEX - find a string that has the same match? [duplicate]

This question already has answers here:
Regex plus vs star difference? [duplicate]
(9 answers)
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 4 years ago.
I am trying to match the string "menu-item" only when it has a digit after it.
<li id="menu-item-578" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-578">
I can use this regex
menu-item-[0-9]*
however it matches every menu-item string. I want to match only the "menu-item-578" in the class attribute, not the one in id="menu-item-578".
How can I do it?
Thank you

Avoid menu-item-[0-9]*, not only because it matches the expected substring more than once, but also because it goes beyond that, for example matching menu-item- in menu-item-one.
Besides replacing the quantifier with +, you have to check that the preceding character is not a non-whitespace character:
(?<!\S)menu-item-[0-9]+(?=["' ])
or, if your regex flavor doesn't support lookarounds, you can do this, which is less precise because the leading space becomes part of the match:
[ ]menu-item-[0-9]+
You can also constrain the following character with a stricter pattern:
[ ]menu-item-[0-9]+["' ]
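For comparison, here is a short Python sketch (Python is an assumption, the question did not say which flavor is in use) that applies the asker's pattern and the lookaround pattern to the <li> tag from the question.

```python
import re

html = (
    '<li id="menu-item-578" class="menu-item menu-item-type-post_type '
    'menu-item-object-page menu-item-578">'
)

# The asker's pattern: * allows zero digits, and nothing constrains what
# comes before the match.
LOOSE = re.compile(r"menu-item-[0-9]*")

# The lookaround pattern: not preceded by a non-whitespace character, at
# least one digit, and followed by a quote or a space.
STRICT = re.compile(r"""(?<!\S)menu-item-[0-9]+(?=["' ])""")

loose = LOOSE.findall(html)
strict = STRICT.findall(html)

print(loose)   # ['menu-item-578', 'menu-item-', 'menu-item-', 'menu-item-578']
print(strict)  # ['menu-item-578']
```

The strict pattern skips the id value because it is preceded by a double quote, and it matches only the class name at the end of the attribute.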

Try this, it works too (the leading whitespace is part of the match):
(\s)(menu-item-)\d+
https://regex101.com/
\s Any whitespace character

Use a space before, like this:
\ menu-item-[0-9]*
The first occurrence has a " right before it, while the second one has a space.
EDIT: use an online regex editor (like Regex tester) to try these things.

Regular Expressions - What is the difference between .* and (.*)? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
What is the difference between .* and (.*) in regular expressions?
From what I've seen,
AB.*DE
and
AB(.*)DE
appear to match the same things but I want to know if there are any differences so I use the correct one.
I need to be able to match any number of characters between AB and DE and even match if there isn't anything between them (ABDE).
If .* and (.*) mean the same thing, is there a "better" one to use in terms of standards/best practice?

.* matches any character zero or more times.
(.*) matches the same, but the matched characters are stored in a group for later back-referencing (any characters matched within () are captured).
AB.*DE matches AB, then any characters, then DE. The dot represents any character except a newline.
AB(.*)DE matches the same strings, and the characters in between are captured.

The parentheses indicate a capture group.

There is no difference in what they match: both match any character zero or more times. The parentheses additionally create a capture group, which stores the matched text and lets you group parts of a pattern together, much as parentheses group terms in a math equation. Use the group when you need the captured text; otherwise the plain .* is enough.
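The difference is easiest to see by inspecting the match objects. A minimal Python sketch (the sample strings are made up apart from ABDE, which comes from the question):

```python
import re

plain = re.search(r"AB.*DE", "AB123DE")
grouped = re.search(r"AB(.*)DE", "AB123DE")

# Both patterns match the same overall text.
same_overall = plain.group(0) == grouped.group(0) == "AB123DE"

# Only the parenthesized version remembers what was between AB and DE.
plain_groups = plain.groups()      # ()
captured = grouped.group(1)        # '123'

# With nothing in between, the group still participates and is empty.
empty_capture = re.search(r"AB(.*)DE", "ABDE").group(1)  # ''

print(same_overall, plain_groups, repr(captured), repr(empty_capture))
```

So either form satisfies the requirement of matching ABDE as well; the choice depends on whether the in-between text is needed afterwards.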

What's the difference between regex [-+]? and (-|+)? [duplicate]

This question already has answers here:
Using alternation or character class for single character matching?
(3 answers)
Closed 4 years ago.
What's the difference between regex
[-+]?
and
(-|+)?
Don't they mean the same?

Both match the same characters, but the second form produces a capturing group. You can use a backreference to access the group (\1 or $1, depending on your regular expression engine).
UPDATE
The second form is invalid in many regular expression engines (it is valid only in some older engines that treat a leading + literally).
That is because + has a special meaning, one or more repetitions of the preceding pattern, and here there is nothing to repeat. Escape it as (-|\+)? to make it portable.

They match the same thing, but I would prefer the character class (1st form), since the 2nd form captures - or + which you may not need.
This is equivalent and doesn't capture the text in a group (with the + escaped so that it is valid):
(?:-|\+)?

Most regexes can be put in the form of alternation groups and the star operator. For example, [ab]+ can be written as (a|b)(a|b)*, but this is much more verbose, so the other operators exist. You included the question mark operator in your regex, but really [-+]? is equivalent to (\+|-|), with the + escaped.
So there really is no difference (except for capturing as others have mentioned), but that doesn't mean the other operators aren't useful in making a regex compact and intuitive to understand.
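The points made in these answers can be checked with a short Python sketch. The trailing \d+ and the sample string are additions of mine so that there is something visible to match; they are not part of the question.

```python
import re

sample = "-5 +7 9"  # hypothetical input

CHAR_CLASS = re.compile(r"[-+]?\d+")
ALTERNATION = re.compile(r"(-|\+)?\d+")       # + must be escaped here
NON_CAPTURING = re.compile(r"(?:-|\+)?\d+")

# findall returns whole matches when there is no capturing group...
class_result = CHAR_CLASS.findall(sample)
non_capturing_result = NON_CAPTURING.findall(sample)
# ...and the group's content (possibly empty) when there is one.
alternation_result = ALTERNATION.findall(sample)

# The unescaped form from the question does not compile in Python's re.
try:
    re.compile(r"(-|+)?")
    unescaped_is_valid = True
except re.error:
    unescaped_is_valid = False

print(class_result)          # ['-5', '+7', '9']
print(non_capturing_result)  # ['-5', '+7', '9']
print(alternation_result)    # ['-', '+', '']
print(unescaped_is_valid)    # False
```

The differing findall results show why the capturing group matters in practice even though all three patterns accept the same strings.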
