Regular Expressions (pcre) for shortcode/bbcode - regex

I have a regex (see https://regex101.com/r/mB7vQ8/2):
/\[content_box((.*?)!?\])(.*?)\[\/content_box\]/ig
to match every [content_box] tag (with or without parameters) in text like:
[content_boxes foo=bar][content_box baz=foo]text[/content_box][/content_boxes]
[content_box]text[/content_box]
[content_box foo=bar]text[/content_box]
My regex works, but when a [content_box] is nested inside a [content_boxes], the first match starts at the outer tag. It currently matches:
[content_boxes foo=bar][content_box baz=foo]text[/content_box]
[content_box]text[/content_box]
[content_box foo=bar]text[/content_box]
The expected matches are:
[content_box baz=foo]text[/content_box]
[content_box]text[/content_box]
[content_box foo=bar]text[/content_box]
How can I fix it?

You can use this regex with a word boundary:
~\[content_box\b\s*([^]]*)\](.*?)\[/content_box\]~
Here content_box\b does not match content_boxes, because \b requires a non-word character (or the end of the text) right after content_box, so the match always starts at the inner [content_box ...] tag.
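A quick way to check the word-boundary fix is to run both patterns over the question's sample. This sketch uses Python's re module instead of PHP's preg_* functions (an assumption made for illustration; these particular patterns behave the same way in both engines) and drops the PCRE delimiters.

```python
import re

# Question's original pattern (the /.../ig form, rewritten for Python's re module).
ORIGINAL = re.compile(r"\[content_box((.*?)!?\])(.*?)\[\/content_box\]", re.IGNORECASE)

# Answer's pattern: \b stops "content_box" from matching the start of "content_boxes".
FIXED = re.compile(r"\[content_box\b\s*([^]]*)\](.*?)\[/content_box\]")

TEXT = (
    "[content_boxes foo=bar][content_box baz=foo]text[/content_box][/content_boxes]\n"
    "[content_box]text[/content_box]\n"
    "[content_box foo=bar]text[/content_box]"
)

old_matches = [m.group(0) for m in ORIGINAL.finditer(TEXT)]
new_matches = [m.group(0) for m in FIXED.finditer(TEXT)]
new_params = [m.group(1) for m in FIXED.finditer(TEXT)]  # tag parameters, "" when absent

for match in new_matches:
    print(match)
```

Group 1 of the fixed pattern holds the tag parameters (empty for a bare [content_box]) and group 2 holds the enclosed text.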

Related

Regex - How to match on all text inside of ${}

I need a regular expression to match all text inside of ${}.
For example, given the following text:
Hello, {does this work ${yes} I hope} so and this is another ${case}
It should match on "yes" and "case".
Use the following pattern:
\$\{(.*?)\}
Or, if your regex engine/flavor does not support capture groups but does support lookarounds, you may use:
(?<=\$\{).*?(?=\})
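Both patterns can be compared side by side. This Python sketch (the re module is chosen here only for illustration) runs them on the example string from the question; with the capture-group version the wanted text is group 1, with the lookaround version it is the whole match.

```python
import re

TEXT = "Hello, {does this work ${yes} I hope} so and this is another ${case}"

# Capture-group version: findall() returns group 1 of each match.
with_group = re.findall(r"\$\{(.*?)\}", TEXT)

# Lookaround version: ${ and } are asserted, not consumed, so the match is the inner text.
with_lookaround = re.findall(r"(?<=\$\{).*?(?=\})", TEXT)

print(with_group)
print(with_lookaround)
```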

Notepad++ Regex to find group of lines with condition

Given this example text:
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ABB</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="FM"/>
...lines...
...........
</abr:rules>
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ADE</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="CM"/>
...lines...
...........
</abr:rules> (end of group)
I would like to find and remove everything from <abr:rules> to </abr:rules>, but only where subapplication is NOT "CM". Organization and application are always the same, and <abr:code> can be any string.
What I have tried so far is:
<abr:rules>\n<abr:ruleTypeDefinition>\n<abr:code>[a-zA-Z0-9]{3,}<\/abr:code>\n<abr:ownership>\n<.*"(FM|PSD|SSC)"\/>\n(?s).*?\n<\/abr:rules>\n
This works, but only because I know the other subapplication names.
Is there any way to do it with a regex only?
Try the following find and replace:
Find:
<abr:rules>((?!subapplication=).)*subapplication="(?!CM")[^"]+"((?!</abr:rules>).)*</abr:rules>
Replace:
(empty string)
Note: the above pattern only works if you enable the ". matches newline" option in Notepad++. If you don't want to do that, use [\S\s] in place of each dot.
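The same find-and-replace can be reproduced outside Notepad++ to see what it removes. This Python sketch passes re.DOTALL as the counterpart of the ". matches newline" option and uses a shortened copy of the question's two sample blocks (the "...lines..." placeholders are kept as plain text).

```python
import re

# Tempered dots: stop before the first subapplication=, then before the first </abr:rules>.
PATTERN = re.compile(
    r'<abr:rules>((?!subapplication=).)*subapplication="(?!CM")[^"]+"'
    r'((?!</abr:rules>).)*</abr:rules>',
    re.DOTALL,  # lets . cross line breaks, like ". matches newline" in Notepad++
)

SAMPLE = """<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ABB</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="FM"/>
...lines...
</abr:rules>
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ADE</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="CM"/>
...lines...
</abr:rules>
"""

result = PATTERN.sub("", SAMPLE)  # replace with an empty string
print(result)
```

Only the FM block is removed; the CM block fails at (?!CM") and is left untouched.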
You should not use regex for XML; you can read why here:
https://stackoverflow.com/a/1732454/3763374
Instead, use an XML parser and query the document, for example with XPath.
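As a sketch of the parser approach, Python's xml.etree.ElementTree supports a limited XPath subset. Two assumptions are made: the namespace URI urn:example:abr is hypothetical (the snippet never shows the abr: declaration), and the elided lines are taken to close ruleTypeDefinition and ownership, with a wrapper root element added so that the document is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the question's snippet never declares abr:.
NS = {"abr": "urn:example:abr"}

# Assumed well-formed version of the sample (wrapper root, closed elements).
DOC = """<root xmlns:abr="urn:example:abr">
  <abr:rules>
    <abr:ruleTypeDefinition>
      <abr:code>ABB</abr:code>
      <abr:ownership>
        <abr:owner organization="NT" application="DCS" subapplication="FM"/>
      </abr:ownership>
    </abr:ruleTypeDefinition>
  </abr:rules>
  <abr:rules>
    <abr:ruleTypeDefinition>
      <abr:code>ADE</abr:code>
      <abr:ownership>
        <abr:owner organization="NT" application="DCS" subapplication="CM"/>
      </abr:ownership>
    </abr:ruleTypeDefinition>
  </abr:rules>
</root>"""

root = ET.fromstring(DOC)

# findall() returns a list, so removing children while looping over it is safe.
for rules in root.findall("abr:rules", NS):
    owner = rules.find(".//abr:owner", NS)
    if owner is not None and owner.get("subapplication") != "CM":
        root.remove(rules)

remaining_codes = [r.find(".//abr:code", NS).text for r in root.findall("abr:rules", NS)]
print(remaining_codes)
```

Unlike the regex, this does not depend on line breaks, attribute order, or how the elided lines are laid out.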

Using regular expressions to match log file

I am working on a regular expression problem and have run into issues when I try to match text between certain markers. Below is a regex tester with what I have so far:
https://regex101.com/r/gE8uQ1/1
I am trying to select ALL of the query text that appears after "statement: " and before the next \nTIMESTAMP. I have used \n\d{4}-d{2}-d{2} to represent the timestamp, but it does not select the whole query. Why is this happening? Is it because of my modifiers?
(?<=statement: )([ _\-|0-9,:;\.=A-Za-z\(\)"\n\t']+?)(?=(?:\d{4}-\d{2}-\d{2}|$))
Try this. Just change your negative lookahead to a positive lookahead and add a quantifier to the character class.
See the demo: https://regex101.com/r/gE8uQ1/5
You can use the following with the g and s modifiers (s is needed because your queries contain newlines, which . does not match otherwise):
(?<=statement: )([ _\-|0-9,:;\.=A-Za-z\(\)"\n\t'].+?)(?=\d{4}-\d{2}-\d{2}|$)
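Since the question's real log sample lives only on regex101, this Python sketch runs both answers' patterns on a made-up log excerpt (hypothetical data; only the "statement: " marker and the date-shaped timestamp come from the question). re.DOTALL plays the role of the s modifier for the second pattern.

```python
import re

# Hypothetical log excerpt: a multi-line query followed by a single-line one.
LOG = (
    "2015-03-10 12:00:01 UTC LOG:  statement: SELECT id, name\n"
    "\tFROM users\n"
    "\tWHERE id = 1\n"
    "2015-03-10 12:00:02 UTC LOG:  statement: SELECT 1"
)

# First answer: lazy character class that includes \n and \t, positive lookahead.
FIRST = r"""(?<=statement: )([ _\-|0-9,:;\.=A-Za-z\(\)"\n\t']+?)(?=(?:\d{4}-\d{2}-\d{2}|$))"""

# Second answer: relies on the s modifier (re.DOTALL) so that . also matches newlines.
SECOND = r"""(?<=statement: )([ _\-|0-9,:;\.=A-Za-z\(\)"\n\t'].+?)(?=\d{4}-\d{2}-\d{2}|$)"""

# Both stop just before the next timestamp, so the newline before it is trimmed.
first = [q.rstrip("\n") for q in re.findall(FIRST, LOG)]
second = [q.rstrip("\n") for q in re.findall(SECOND, LOG, re.DOTALL)]

print(first)
print(second)
```

Both patterns return the full multi-line query; the key change from the question's attempt is the positive lookahead on the timestamp (or end of input).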

Multiple max lengths in a regular expression

I have the following regular expression:
[0-9]{7}-[0-9]{1}$
I should be able to match the following patterns:
1234567-8
3142539-1
But not the following:
12345678-1
1234567-12
Currently my regex matches 12345678-1 but not 1234567-12 (in JavaScript). Both should fail. What am I doing wrong?
Your pattern matches any string that ends ($) with [0-9]{7}-[0-9]{1}, so 12345678-1 is accepted because its last nine characters, 2345678-1, fit the pattern.
Add ^ (start of string) so that the whole string has to match:
^[0-9]{7}-[0-9]{1}$
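The effect of the missing anchor can be checked on the four inputs from the question. This sketch uses Python's re.search, which, like JavaScript's RegExp.prototype.test, looks for a match anywhere in the string.

```python
import re

UNANCHORED = re.compile(r"[0-9]{7}-[0-9]{1}$")  # question's pattern
ANCHORED = re.compile(r"^[0-9]{7}-[0-9]{1}$")   # answer's pattern

def accepts(pattern, s):
    # search() scans the whole string for a match, similar to JavaScript's test()
    return pattern.search(s) is not None

for s in ["1234567-8", "3142539-1", "12345678-1", "1234567-12"]:
    print(s, accepts(UNANCHORED, s), accepts(ANCHORED, s))
```

In Python specifically, $ also matches just before a trailing newline, so re.fullmatch(r"[0-9]{7}-[0-9]", s) is the stricter choice there; in JavaScript, ^...$ without the m flag is enough.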

php regex to match three words if not then two and then one

I'm writing a regex in PHP without success. I want to match the following:
so i would
if not then match:
so i
and then:
i would
and
so
i
would
Here is my code:
\b(so i|i would|so i would|(so|i|would))\b
It only matches so, i, would, so i and i would, but never so i would. Why?
Order the alternatives in your regex correctly:
\b(so i would|so i|i would|(so|i|would))\b
Put the longest string to match to the left.
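Alternatives separated by | are tried from left to right, and the first one that lets the rest of the pattern match wins. In your version, so i is tried before so i would, so the shorter string is matched.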
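A short Python sketch makes the difference visible on the input so i would. Python's re is also a backtracking engine, so alternation order matters here in the same way as in PHP's PCRE.

```python
import re

ORIGINAL = r"\b(so i|i would|so i would|(so|i|would))\b"
REORDERED = r"\b(so i would|so i|i would|(so|i|would))\b"

def whole_matches(pattern, text):
    # group(0) is the full match; findall() would return tuples because of the groups
    return [m.group(0) for m in re.finditer(pattern, text)]

print(whole_matches(ORIGINAL, "so i would"))
print(whole_matches(REORDERED, "so i would"))
```

With the original order the engine settles on so i and then finds would separately; with the longest alternative first, the whole phrase is matched in one piece.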
Put the longest pattern to the left in the group: \b(long|...|short)\b
Another solution: \b(so i would|i would|would|so i|so|i)\b
P.S. This is a feature of NFA regex engines; see "Mastering Regular Expressions" for details.