Balanced regular expression - regex

So I got my hands on regular expressions and tried to match the outer {% tag xyz %}{% endtag %} tags in the following text:
{% tag xyz %}
{% tag abc %}
{% endtag %}
{% endtag %}
My regular expression looks as follows and works so far:
({%)\s*(tag)([^%}]*?)(?:\s*(?:(%})((?:(?:[^{%]*?)|(?R))*)(?:({%)\s*(end\2)\s*(%}))))
But whenever the text inside the matching tags contains a single { or % sign, the regex doesn't work as expected. I think it's because the character class [^{%] excludes { and % as single characters, not just the {% pair. I tried a lot of trial and error, without success.
Any help on that issue?
I set up two regex101 links to show the issue:
works: https://regex101.com/r/qH0rI5/1
does not work: https://regex101.com/r/qH0rI5/2
Any help is really appreciated!

Try to replace [^{%] with (?:(?!{%).) and add the s (PCRE_DOTALL) flag.
The negative lookahead lets any character through as long as it does not start a {% sequence, so a { that is not followed by %, or a lone %, is allowed in between.
Test your updated pattern, or here is another starting point to try:
/{% tag \w+ %}(?:(?:(?!{%).)|(?0))*{% endtag %}/gs
test at regex101
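Python's re module has no (?R)/(?0) recursion, so this Python sketch can't reproduce the recursive pattern directly. Under that limitation it shows two things: the (?:(?!{%).) replacement accepting a lone { or % where the [^{%] class fails, and, for the nesting itself, a swapped-in technique that scans open/close tags and counts depth instead of recursing. The names FLAT_TAG, CHAR_CLASS, TOKEN and outer_tag_spans are made up for illustration, and the tag syntax is assumed to be exactly {% tag name %} / {% endtag %}.

```python
import re

# Single-level version of the answer's fix. (?:(?!\{%).) consumes one character
# as long as it does not start a "{%", so a lone "{" or "%" in the body is fine.
# Python's re has no (?R)/(?0), so this pattern cannot handle nesting.
FLAT_TAG = re.compile(r"\{% tag \w+ %\}(?:(?!\{%).)*\{% endtag %\}", re.S)

# The question's character class, which rejects every single "{" and "%".
CHAR_CLASS = re.compile(r"\{% tag \w+ %\}[^{%]*\{% endtag %\}", re.S)

# Swapped-in technique for nesting: find open/close tags and track the depth.
# Group 1 is "end" for a closing tag and None for an opening tag.
TOKEN = re.compile(r"\{%\s*(end)?tag\b[^%]*%\}")


def outer_tag_spans(text):
    """Return (start, end) spans of the outermost {% tag %}...{% endtag %} blocks."""
    spans = []
    depth = 0
    start = 0
    for m in TOKEN.finditer(text):
        if m.group(1) is None:  # an opening {% tag ... %}
            if depth == 0:
                start = m.start()
            depth += 1
        elif depth > 0:  # a matching {% endtag %}; stray closers are ignored
            depth -= 1
            if depth == 0:
                spans.append((start, m.end()))
    return spans


body = "{% tag abc %}\n50% or { here\n{% endtag %}"
print(bool(FLAT_TAG.search(body)))    # True
print(bool(CHAR_CLASS.search(body)))  # False

nested = "{% tag xyz %}\n{% tag abc %}\n50% {\n{% endtag %}\n{% endtag %}"
print(outer_tag_spans(nested) == [(0, len(nested))])  # True
```

In PHP/PCRE the recursive pattern above does this in one regex; the depth counter is only a fallback for engines without recursion.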

Related

Translating a working regex for Twig match

I've got a regex working that I'm unable to use with Twig's matches operator. Basically, I need to check whether a message contains "#firstname lastname " anywhere. The message can contain line breaks, multiple matches, etc. The regex must match the entire message as long as "#firstname lastname " appears somewhere in it.
This is the regex I got working here: https://regexr.com/
/.+?(?=#)#[A-z]+ [A-z]+ (.*)/gs
And it worked for: This is a bunch of text surrounding #firstname lastname and so forth
What I tried:
{%- set action -%}
Twig needs to handle the set in this format #firstname lastname
for what I need to work
{%- endset -%}
And:
{% set regex = '/.+?(?=#)#[A-z]+ [A-z]+ (.*)/gs' %}
Then:
{% if action matches regex %}
Working
{% else %}
Not working
{% endif %}
What I get is: Not working
There is no space between lastname and for (the set block has a line break there), and g is not a valid modifier in PHP. Drop both:
{% set regex = '/.+?(?=#)#[A-z]+ [A-z]+(.*)/s' %}
demo
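A small Python check of the answer's reasoning (Python's re treats these lazy and lookahead constructs the same way PCRE does here). It assumes the {%- set -%} whitespace control leaves a line break, not a space, between lastname and for. Python has no g flag either; re.search looks for the first match anywhere, much like PHP's preg_match, which as far as I know is what Twig's matches operator calls.

```python
import re

# Assumed value of `action`: the {%- set -%} whitespace control trims the
# outer newlines, leaving a line break (not a space) after "lastname".
action = (
    "Twig needs to handle the set in this format #firstname lastname\n"
    "for what I need to work"
)

# The question's pattern without delimiters: it needs a space after the last name.
original = re.compile(r".+?(?=#)#[A-z]+ [A-z]+ (.*)", re.S)

# The answer's pattern: no trailing space, and only the s (DOTALL) flag.
fixed = re.compile(r".+?(?=#)#[A-z]+ [A-z]+(.*)", re.S)

print(original.search(action) is None)   # True
print(fixed.search(action) is not None)  # True
```

Side note: [A-z] also matches the ASCII characters between Z and a ([, \, ], ^, _ and `); [A-Za-z] is stricter if that matters.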

Regex for matching a tag in a Liquid template: ">" inside an HTML tag

I have to write a match pattern for the body tag in a Liquid template.
While matching HTML tags is pretty straightforward, the problem is that HTML special characters such as > can appear inside the Liquid code.
Example:
<body class="template-{{ template | replace: '.', ' ' | truncatewords:
1, '' }}{% if promo %}has-promo{% endif %} {% if products.size > 1
%}has-related-products{% endif %} {% if settings.product-hover ==
'quick-shop' %}has-quick-shop{% endif %} loading" >
or simplified:
<body {% bla > 1 %} bla bla>
My current pattern /<body(.[^>]*)>/s only matches the above code up to the first >. I need a pattern that matches the whole tag.
Try with:
/<body(.[^>{}]*(?:{+[^}]*}+[^>{}]*)*)>/s
See demo
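Instead of [^>]*, the regex uses [^>{}]*(?:{+[^}]*}+[^>{}]*)*. The leading [^>{}]* matches any characters except >, { and }. When it reaches a {, the group matches a whole Liquid block {+...}+ (one or more opening braces, anything except }, then one or more closing braces), followed again by [^>{}]*. The final * repeats this for every Liquid block in the tag, so a > inside {% ... %} no longer ends the match.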
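To see the difference between the two patterns, here is a Python sketch run against the simplified example from the question. The braces are escaped (\{+, \}+) for Python's re; otherwise it is the answer's pattern.

```python
import re

# The answer's pattern, with the braces escaped for Python's re module.
BODY_TAG = re.compile(r"<body(.[^>{}]*(?:\{+[^}]*\}+[^>{}]*)*)>", re.S)

# The question's original pattern, which stops at the first ">".
NAIVE = re.compile(r"<body(.[^>]*)>", re.S)

sample = "<body {% bla > 1 %} bla bla>"
print(NAIVE.search(sample).group(0))     # <body {% bla >
print(BODY_TAG.search(sample).group(0))  # <body {% bla > 1 %} bla bla>
```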

How to find out if part of the string is in quotes?

So I have this pattern:
{% url my_view %}
{% url my_view user_id %}
And this is bad, so instead it should be like this:
{% url 'my_view' %}
{% url 'my_view' user_id %}
So the my_view part should always be in quotes. I just need to find every occurrence where the my_view part is not quoted.
How can I do it?
Use lookarounds:
(?<={% url )[^'\s]*(?!')
See it in action
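A Python sketch of the lookaround answer against the four lines from the question; the line-number reporting is only for illustration.

```python
import re

template = """{% url my_view %}
{% url my_view user_id %}
{% url 'my_view' %}
{% url 'my_view' user_id %}"""

# The answer's lookaround pattern; "{" is escaped for Python's re module.
# The lookbehind is fixed-width ("{% url "), which Python requires.
UNQUOTED = re.compile(r"(?<=\{% url )[^'\s]*(?!')")

for m in UNQUOTED.finditer(template):
    line_no = template.count("\n", 0, m.start()) + 1
    print(line_no, m.group(0))
# 1 my_view
# 2 my_view
```

On the quoted lines the lookbehind succeeds right before the opening ', but [^'\s]* can only match an empty string there and (?!') then fails, so nothing is reported.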
You can replace matches of the following regex:
/\{% url (\S+)(.*)%\}/
With :
{% url '\1'\2%}
Depending on the regex engine, you may need $1 and $2 instead of \1 and \2.
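The same substitution sketched in Python, where the replacement uses \1 and \2 (quote_view_names is a made-up helper name). It assumes one tag per line, since (.*) is greedy and . does not cross newlines.

```python
import re

# The answer's whole-tag pattern: group 1 is the view name, group 2 the rest.
URL_TAG = re.compile(r"\{% url (\S+)(.*)%\}")


def quote_view_names(template):
    # Python's re.sub writes backreferences as \1 and \2 in the replacement.
    return URL_TAG.sub(r"{% url '\1'\2%}", template)


print(quote_view_names("{% url my_view %}"))          # {% url 'my_view' %}
print(quote_view_names("{% url my_view user_id %}"))  # {% url 'my_view' user_id %}
```

A name that is already quoted gets a second pair of quotes ('my_view' becomes ''my_view''), so this is only safe on templates where no name is quoted yet.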
You can use the following to match:
(?<=\{% url )(\S+)
And replace with:
'$1'
See DEMO
Edit: If some of your my_view arguments are already quoted, use the following:
(?<=\{% url )([^']\S+[^'])(?=\s)
See DEMO
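A Python sketch comparing the two lookbehind patterns on a template where one name is already quoted. Python's re.sub takes \1 rather than $1 in the replacement.

```python
import re

template = "{% url my_view %}\n{% url 'my_view' user_id %}"

# First version: quotes whatever follows "{% url ", even if it is already quoted.
SIMPLE = re.compile(r"(?<=\{% url )(\S+)")

# Edited version: the first and last characters must not be quotes, and the
# argument must be followed by whitespace.
SKIP_QUOTED = re.compile(r"(?<=\{% url )([^']\S+[^'])(?=\s)")

print(SIMPLE.sub(r"'\1'", template))
print(SKIP_QUOTED.sub(r"'\1'", template))
```

One caveat of the edited pattern: [^']\S+[^'] needs at least three characters, so a one- or two-character view name would be left unquoted.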

Reg exp: string NOT in pattern

I'm having trouble constructing a regexp. I think I should use lookahead/lookbehind, but I just can't get it right.
I want to write a regexp that matches all HTML tags that do NOT contain a string ('rabbit').
For example, the following tags should be matched
<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>
But not the following
<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> </li rabbit=abcd hippo=9876> <div hello=rabbit>
(My next step is to make a substitution that inserts the word rabbit into the tags, but that will hopefully be easy.)
(I use PHP 5's preg_replace.)
Thanks.
I guess you're matching the HTML tags with a regex something like this:
/<[^>]*>/
You can add a negative look-ahead assertion in there to assert that "rabbit" cannot be found in the tag:
/<(?![^>]*rabbit)[^>]*>/
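The same pattern in Python (without PHP's / delimiters), run against the examples from the question:

```python
import re

# Negative lookahead from the answer: at each "<", reject the tag if "rabbit"
# appears anywhere before the next ">".
NO_RABBIT = re.compile(r"<(?![^>]*rabbit)[^>]*>")

should_match = "<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>"
should_not = "<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> </li rabbit=abcd hippo=9876> <div hello=rabbit>"

print(NO_RABBIT.findall(should_match))  # all five tags
print(NO_RABBIT.findall(should_not))    # []
```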

Outputting literal curly braces in Liquid templates

I'm trying to output the following from within a liquid template:
{{ example }}
Obviously, Liquid sees this as a variable named example and tries to do substitution. I'm trying to find out how I can output the actual braces.
So far, I've found one method that works, but it's incredibly ugly:
{{ '{example'|prepend:'{' }}}}
Yeah, told you it was gross.
Here are other things I've tried:
{{{ example }}} # outputs '}'
{{{{ example }}}} # outputs '}}'
\{\{ example \}\} # outputs '\{\{ example \}\}'
Any advice here?
You can also use raw:
{% raw %}
...lots of liquid code goes here and it doesn't get interpreted...
{% endraw %}
What about using the numeric HTML entities &#123; and &#125; for { and } respectively? Presumably this is to be output as HTML.
EDIT: Forgive me, I'm not too familiar with Liquid (so this might be very wrong), but can you assign your {{ example }} special value to a variable and output that? Maybe something like:
{% assign special = '{{ example }}' %}
{{ special }}
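To double-check the entity numbers, this Python sketch (not Liquid; brace_entities is a made-up helper) produces the entity-escaped text you would put in the template, and uses html.unescape as a stand-in for how a browser decodes the entities back to braces.

```python
import html


def brace_entities(text):
    # Replace braces with their numeric HTML entities ("{" is U+007B = 123,
    # "}" is U+007D = 125) so Liquid no longer sees {{ ... }}.
    return text.replace("{", "&#123;").replace("}", "&#125;")


escaped = brace_entities("{{ example }}")
print(escaped)                 # &#123;&#123; example &#125;&#125;
print(html.unescape(escaped))  # {{ example }}
```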
This is the only thing that worked for me. Lifted from here:
{{ "{{ this " }}}}
I needed this because I wanted to reference the site global variable from inside a mustache template.
I wanted to have both curly brackets AND angle brackets when formatting a fenced code block, so I ended up with the following pattern:
{% capture code %}{% raw %}line 1
line 2
line 3
{% endraw %}{% endcapture %}
<pre><code>{{ code | replace: "<", "&lt;" | replace: ">", "&gt;" }}</code></pre>
You can escape the braces: for example, instead of {{var}} you can write \{\{var\}\}, so that Liquid doesn't process it.