Find every string between quotes inside specific curly braces - regex

I'm working on a script that helps me catch all the localized strings in my project, and I'm stuck on a regex.
In the following string {{ translateAttr title="button.add_user.title" data-disable-with="button.add_user.disabled" }} I'd like to capture "button.add_user.title" and "button.add_user.disabled", because the curly braces start with the translateAttr attribute.
So far, I've come up with this rule \{{2}translateAttr .*=(['"])(.+?)(?=(?<!\\)\1)\1 ?}{2} but, as you can see here http://lumadis.be/regex/test_regex.php?id=2362, it does not match all the occurrences.
A little help here would be much appreciated.
EDIT: Patterns I'd like the regex to match too
{{ translateAttr title="button with \"escaped quotes\" here" data-disable-with='button with single quotes' }}
{{ translateAttr title="want this with ' single quote" "but not this one" }}
EDIT 2: Patterns I don't want to match
{{ title="because no translateAttr" }}
{{ translateAttr "because no something= attribute before" }}
Thanks

Use the regex below, which relies on the PCRE verbs (*SKIP)(*F):
^(?!.*?\{{2}\h*translateAttr).*(*SKIP)(*F)|=(["'])((?:\\\1|(?!\1).)*)\1
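Explanation: the first alternative matches, from the start of a line, text that contains no {{ translateAttr and throws it away with (*SKIP)(*F), so only the second alternative, which captures each quoted value that follows an = sign, can produce matches.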
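Python's built-in re module has no (*SKIP)(*F) verbs and no \h, so the regex above cannot be pasted into it unchanged. As a rough sketch of the same idea in two steps (the names BLOCK, ATTR and localized_strings are made up for this illustration, and it assumes a quoted value never contains a literal }}):

```python
import re

# Step 1: locate each {{ translateAttr ... }} block and keep its inside.
BLOCK = re.compile(r"\{\{\s*translateAttr\b(.*?)\}\}", re.DOTALL)

# Step 2: inside a block, capture the quoted value that follows "=".
# \\. lets a backslash escape any character, including the quote itself.
ATTR = re.compile(r"""=(["'])((?:\\.|(?!\1).)*)\1""", re.DOTALL)

def localized_strings(text):
    """Return the quoted attribute values found in translateAttr blocks."""
    found = []
    for block in BLOCK.finditer(text):
        found.extend(m.group(2) for m in ATTR.finditer(block.group(1)))
    return found

sample = '{{ translateAttr title="button.add_user.title" data-disable-with="button.add_user.disabled" }}'
print(localized_strings(sample))
```

A quoted string with no name= in front of it, or a block that does not start with translateAttr, yields nothing, which matches the "don't match" cases from the question.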

Related

Regex Substitution to add a class to a specific html tag

I need to do a regex find-replace on the post content of a wordpress site in order to change all existing <h4> tags to <h2> tags. I then need to style the <h4> tags to look like <h2> tags.
My plan was to add a class to the new <h2> tags...
<h4> Some poorly written html </h4>
becomes
<h2 class="pseudo-h4"> Some poorly written html </h2>
I feel like this should be doable with regex, but I just cannot seem to grok the more advanced parts of regex. My current working approach is to use this regex (?<=h4)(.+class=") to capture the 'class=' part of any h4 opening tag and then use $1pseudo-h4 as the substitution string. Once that is done I can go back and replace all h4s without regex because those which are "pseudo-h4s" will already be marked by the class.
I have a few problems...
1 - wp-cli is hanging when I try to run this on wp_posts. Maybe this is normal?
2 - $1pseudo-h4 with a space on the end is needed to prevent my class from concatenating with the next class, but when I pass the argument with a trailing space I get "unknown --regex  parameter"
3 - In my tester it worked, but I don't actually know why this pattern won't match the tag of a previous element, for instance...
<h4>Sup<h4><p class="extra-cheese">Bla bla<p>
My lookbehind should see the <h4>, and .+ should consume as many characters as it needs to reach the "class=" section, right?
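To make the concern in point 3 concrete, here is a small sketch in plain Python (not wp-cli; OPEN_H4, CLOSE_H4, CLASS_ATTR and h4_to_h2 are hypothetical names). It keeps each match inside a single tag by using [^>]* instead of .+, so a class= that belongs to a later element is never touched, and it adds the class whether or not the h4 already has one:

```python
import re

# [^>]* cannot cross the closing ">" of the tag, unlike ".+".
OPEN_H4 = re.compile(r"<h4\b([^>]*)>", re.IGNORECASE)
CLOSE_H4 = re.compile(r"</h4\s*>", re.IGNORECASE)
# A class attribute, up to and including its opening quote.
CLASS_ATTR = re.compile(r"""(?<![\w-])class\s*=\s*(["'])""", re.IGNORECASE)

def _retag(match):
    attrs = match.group(1)
    if CLASS_ATTR.search(attrs):
        # Existing class list: put pseudo-h4 first, followed by a space.
        attrs = CLASS_ATTR.sub(lambda m: m.group(0) + "pseudo-h4 ", attrs, count=1)
    else:
        attrs = ' class="pseudo-h4"' + attrs
    return "<h2" + attrs + ">"

def h4_to_h2(html):
    """Rename h4 tags to h2 and mark them with the pseudo-h4 class."""
    return CLOSE_H4.sub("</h2>", OPEN_H4.sub(_retag, html))

print(h4_to_h2("<h4> Some poorly written html </h4>"))
```

Regex-based HTML rewriting is fragile in general, so treat this as an illustration of the tag-bounded match rather than a drop-in migration script.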

regex issue: everything but a complex string

I am building a very basic template engine. My template is super simple:
...html code before...
{{ foreach apples }}
... html code to be repeated {{apple}} ...
{{ endforeach }}
{{ foreach oranges }}
... html code to be repeated {{orange}} ...
{{ endforeach }}
...html code after ...
My goal is to get the first foreach (apples) and I've arrived here: https://regex101.com/r/cD5gY4/2
Does anybody have an idea about how I could stop at the end of the first loop instead of capturing both?
.* is greedy; make it non-greedy (lazy) by appending ? to it. Try this:
{{\s*foreach ([a-z]+)\s*}}(.*?)({{\s*endforeach\s*}})
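As a quick check of the lazy quantifier, here is a sketch in Python using the template from the question. The braces are escaped and re.DOTALL is added so that . can cross line breaks; both are choices made for this example, and LOOP and TEMPLATE are illustrative names.

```python
import re

TEMPLATE = """...html code before...
{{ foreach apples }}
... html code to be repeated {{apple}} ...
{{ endforeach }}
{{ foreach oranges }}
... html code to be repeated {{orange}} ...
{{ endforeach }}
...html code after ...
"""

# (.*?) stops at the first {{ endforeach }} instead of the last one.
LOOP = re.compile(
    r"\{\{\s*foreach ([a-z]+)\s*\}\}(.*?)(\{\{\s*endforeach\s*\}\})",
    re.DOTALL,
)

first = LOOP.search(TEMPLATE)
print(first.group(1))  # name of the first loop variable
print(first.group(2))  # body of the first loop only
```

With the greedy (.*) in the same position, group 2 would run through to the last {{ endforeach }} and swallow the oranges loop as well.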

Balanced regular expression

I've started working with regular expressions and tried to match the outer {% tag xyz %}{% endtag %} tags of the following text:
{% tag xyz %}
{% tag abc %}
{% endtag %}
{% endtag %}
My regular expression looks as follows and works so far:
({%)\s*(tag)([^%}]*?)(?:\s*(?:(%})((?:(?:[^{%]*?)|(?R))*)(?:({%)\s*(end\2)\s*(%}))))
But whenever the text inside the matching tags contains a single { or % sign, the regex doesn't work as expected. I think it's because the character classes exclude not only {% but also { and % as single characters. I tried a lot of trial and error, without success.
Any help on that issue?
I set up two regex101 links to show the issue:
works: https://regex101.com/r/qH0rI5/1
does not work: https://regex101.com/r/qH0rI5/2
Any help is really appreciated!
Try replacing [^{%] with (?:(?!{%).) and adding the s (PCRE_DOTALL) flag.
The negative lookahead allows any { that is not followed by % in between.
Test your updated pattern, or here is another starting point to try:
/{% tag \w+ %}(?:(?:(?!{%).)|(?0))*{% endtag %}/gs
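The recursive constructs (?R) and (?0) are PCRE features that Python's built-in re module does not support. As a swapped-in alternative, not the recursion itself, the sketch below finds the outermost blocks with a depth counter, and reuses the (?:(?!%\}).) lookahead idea inside each tag. TAG and outer_blocks are hypothetical names.

```python
import re

# Matches {% tag ... %} or {% endtag %}; group 1 is "end" for closing tags.
TAG = re.compile(r"\{%\s*(end)?tag\b(?:(?!%\}).)*%\}")

def outer_blocks(text):
    """Return each outermost {% tag %}...{% endtag %} span as a string."""
    blocks, depth, start = [], 0, None
    for m in TAG.finditer(text):
        if m.group(1) is None:          # opening tag
            if depth == 0:
                start = m.start()
            depth += 1
        elif depth > 0:                 # closing tag that has a partner
            depth -= 1
            if depth == 0:
                blocks.append(text[start:m.end()])
    return blocks

sample = "{% tag xyz %}\na { b % c\n{% tag abc %}\n{% endtag %}\n{% endtag %}"
print(outer_blocks(sample))
```

Because only complete {% ... %} tags are inspected, a lone { or % in the body has no effect on the nesting depth.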

Regex, I just don't get it right

I just don't get my Regex right:
I have the following template:
<!-- Defines the template for the tabs. -->
{{TMPL:Import=../../../../Data/Templates/Ribbon/tabs.tmpl; Name=Tabs}}
<div class="tabs">
<ul role="tablist">
{{BOS:Sequence}}
<li role="tab" class="{{TabType}}" id="{{tabId}}">
<span>{{TabFile}}</span>
</li>
{{EOS:Sequence}}
</ul>
</div>
{{Render:Tabs}}
I would like to find everything between {{ }} except the tags that begin with {{BOS, {{EOS, {{TMPL, or {{Render.
Here are a couple of attempts:
Attempt 1:
({{).*(}})
This selects everything between {{ }} tags, which is not good.
Attempt 2:
({{)[^TMPL][^BOS][^EOS][^Render].*(}})
With this one, {{TabType}} and {{TabFile}} are no longer selected, and I just don't know why.
With some other regexes, {{TabType}}" id="{{tabId}} is selected as one match.
Does anyone have a clue how to solve this? I really need a regex guru :-)
You can use a regex based on a negative lookahead, like this:
{{(?!TMPL|[BE]OS|Render).*?}}
You have to use the following regex to get the content between braces:
\{\{(.*?)\}\}
If you want to exclude the content from the comment you posted, you can use a technique that matches what you don't want first and captures what you do want in the last alternative of the regex:
\{\{BOS:Sequence\}\}|\{\{EOS:Sequence\}\}|\{\{TMPL:Import.*?\}\}|\{\{Render:Tabs\}\}|\{\{(.*?)\}\}
By the way, a shorter version of the regex above is:
\{\{(?:BOS|EOS):Sequence\}\}|\{\{TMPL:Import.*?\}\}|\{\{Render:Tabs\}\}|\{\{(.*?)\}\}
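A short Python sketch of this "match the unwanted tags first, capture the rest" technique, run against the lines of the question's template that contain tags (PATTERN, TEMPLATE and wanted are names chosen for this illustration). Unwanted tags still match, but they leave group 1 empty, so they are filtered out afterwards:

```python
import re

TEMPLATE = '''{{TMPL:Import=../../../../Data/Templates/Ribbon/tabs.tmpl; Name=Tabs}}
{{BOS:Sequence}}
<li role="tab" class="{{TabType}}" id="{{tabId}}">
<span>{{TabFile}}</span>
{{EOS:Sequence}}
{{Render:Tabs}}'''

PATTERN = re.compile(
    r"\{\{(?:BOS|EOS):Sequence\}\}"   # unwanted, no capture
    r"|\{\{TMPL:Import.*?\}\}"        # unwanted, no capture
    r"|\{\{Render:Tabs\}\}"           # unwanted, no capture
    r"|\{\{(.*?)\}\}"                 # wanted, captured in group 1
)

# Keep only the matches where the last alternative filled group 1.
wanted = [m.group(1) for m in PATTERN.finditer(TEMPLATE) if m.group(1) is not None]
print(wanted)
```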
This is a very useful technique for pattern exclusion that I was glad to learn from Anubhava and zx81 (they rock at regex patterns). With this technique, the content you need ends up in capturing group 1, while the excluded tags match without filling it.
Using [^TMPL] and the like won't work because these are character classes: [^TMPL] matches any single character other than T, M, P, or L, not "anything but the word TMPL". You could use a negative lookahead, though (or even a lookbehind, depending on the regex library you are using).
\{\{(?!BOS:)(?!EOS:)(?!Render:)(?!TMPL:)(.*?)\}\}
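The difference can be seen on one line of the question's template. In this Python sketch (CHAR_CLASSES and LOOKAHEAD are illustrative names), the character-class attempt rejects {{TabType}} simply because its first letter is T, while the lookahead version only rejects tags that start with one of the listed prefixes:

```python
import re

LINE = '<li role="tab" class="{{TabType}}" id="{{tabId}}">'

# Attempt 2 from the question: four negated single-character classes.
CHAR_CLASSES = re.compile(r"\{\{[^TMPL][^BOS][^EOS][^Render].*\}\}")

# Lookahead version: reject whole prefixes, then capture the content.
LOOKAHEAD = re.compile(r"\{\{(?!BOS:)(?!EOS:)(?!Render:)(?!TMPL:)(.*?)\}\}")

print(CHAR_CLASSES.findall(LINE))  # whole matches (pattern has no group)
print(LOOKAHEAD.findall(LINE))     # contents of group 1
```

{{tabId}} slips through the character classes only because t, a, b and I happen not to appear in the corresponding sets.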
Still, I get the feeling that you want BOS, EOS, etc. to be literal strings in the template, including the {{, and the other values to be interpolated. If you are using Handlebars or something similar, you can have strings interpolated:
{{'{{BOS:Sequence}}'}}

Outputting Literal curly braces in Liquid templates

I'm trying to output the following from within a liquid template:
{{ example }}
Obviously, Liquid sees this as a variable named example and tries to do substitution. I'm trying to find out how I can output the actual braces.
So far, I've found one method that works, but it's incredibly ugly:
{{ '{example'|prepend:'{' }}}}
Yeah, told you it was gross.
Here are other things I've tried:
{{{ example }}} # outputs '}'
{{{{ example }}}} # outputs '}}'
\{\{ example \}\} # outputs '\{\{ example \}\}'
Any advice here?
You can also use raw:
{% raw %}
...lots of liquid code goes here and it doesn't get interpreted...
{% endraw %}
What about using the numeric HTML entities &#123; and &#125; for { and } respectively - presumably this is to be output as HTML?
EDIT: Forgive me, I'm not too familiar with Liquid (so this might be very wrong), but can you assign your {{ example }} special value to a variable and output that? Maybe something like:
{% assign special = '{{ example }}' %}
{{ special }}
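For the numeric-entity idea at the start of this answer, a tiny Python sketch shows what the escaped text looks like and that it decodes back to the original braces when treated as HTML. escape_braces is a hypothetical helper, and this only illustrates the entity encoding; it says nothing about how Liquid itself parses the result.

```python
import html

def escape_braces(text):
    """Replace curly braces with their numeric HTML entities."""
    return text.replace("{", "&#123;").replace("}", "&#125;")

escaped = escape_braces("{{ example }}")
print(escaped)
print(html.unescape(escaped))  # what an HTML parser would display
```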
This is the only thing that worked for me:
{{ "{{ this " }}}}
I needed this because I wanted to reference the site global variable from inside a mustache template.
I wanted to have both curly brackets AND angle brackets when formatting a fenced code block, so I ended up with the following pattern:
{% capture code %}{% raw %}line 1
line 2
line 3
{% endraw %}{% endcapture %}
<pre><code>{{ code | replace: "<", "&lt;" | replace: ">", "&gt;" }}</code></pre>
You can escape the braces: for example, instead of {{var}} you can write \{\{var\}\}, so that Liquid doesn't process it.