Regex issue for interpolate setting in underscore.js

I have the following regexes and template for underscore.js templates.
The problem is that the regex for interpolate is not working properly. What would the correct regex be?
Regex
var settings = {
evaluate: /\{\{\#([\s\S]+?)\}\}/g,
interpolate: /\{\{[a-zA-Z](.+?)\}\}/g
};
Template
{{# if ( item ) { }}
{{ item.title }}
{{# } }}

The template compiler will use the last capture group in the expressions to build the JavaScript form of the template. In your case, the interpolate will (as noted by Jerry) ignore the first alphabetic character so {{ab}} will end up looking for b in the template parameters. Your regex also doesn't account for leading whitespace. You'd want something more like this:
/\{\{\s*([a-zA-Z](?:.+?))\s*\}\}/g
or better, just leave out the original group instead of converting it to a non-capturing group:
/\{\{\s*([a-zA-Z].+?)\s*\}\}/g
Or even better, get right to the heart of the problem and say exactly what you mean:
/\{\{([^#].*?)\}\}/g
I think the problem that led you to your original regex is that interpolate is checked before evaluate; that order makes sense when interpolate is looking for <%= ... %> and evaluate is looking for <% ... %> (i.e. the default delimiters are being used). In your case, you need a bit of extra [^#] trickery to get around the regex checking order.
Similarly, we can simplify your evaluate regex:
/\{\{#(.+?)\}\}/g
Demo: http://jsfiddle.net/ambiguous/V6rv2/
I'd also recommend that you add an escape pattern for completeness.
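To see concretely which text each candidate interpolate pattern would hand to the template compiler, the following sketch runs the patterns from this answer over a sample string and lists the capture groups. It does not call _.template itself; the sample string and the captures helper are made up for illustration.

```javascript
// Candidate interpolate patterns discussed above.
const candidates = {
  original: /\{\{[a-zA-Z](.+?)\}\}/g,      // first letter falls outside the group
  trimmed: /\{\{\s*([a-zA-Z].+?)\s*\}\}/g, // letter inside the group, whitespace trimmed
  notHash: /\{\{([^#].*?)\}\}/g,           // anything that does not start with #
};

// Hypothetical sample input mixing evaluate and interpolate tags.
const sample = "{{# if ( item ) { }}{{ item.title }}{{ab}}{{# } }}";

// Return the first capture group of every match of `pattern` in `text`.
function captures(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const results = {};
for (const [name, pattern] of Object.entries(candidates)) {
  results[name] = captures(pattern, sample);
  console.log(name, JSON.stringify(results[name]));
}
```

With this input the original pattern captures only b from {{ab}}, while the other two capture both item.title and ab. The [^#] version keeps the spaces around item.title, which is harmless when the capture is used as a JavaScript expression.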

Related

Regular Expression to exclude a String around the required String

Within some HTML code:
...<div class="..."><a class="..." href="...">I need this String only</a></div>...
How do I write a regular expression (for Rainmeter, which uses Perl-compatible regex) such that:
- the required string "I need this String only" is captured in a group for extraction,
- the HTML link tag <a>...</a> may be absent or present, and may also appear within the required string, possibly multiple times?
My attempt:
(?siU) <div class="...">.*[>]{0,1}(.*)[</a>]{0,1}</div>
where:
.* = matches any characters except newline (the <a class ... " part)
[>]{0,1} = accepts zero or one > (up to the >)
(.*) = captures my string
[</a>]{0,1} = accepts zero or one </a>
This, of course, doesn't work as I want: the output still has the HTML link markup preceding my string.
So my question is: how do I write a better (and working) regex?
Even though I agree with the advice to use a real parser for this problem, this regular expression should solve your problem:
<div [^<>]*>(?:[^<>]*<a [^<>]*>)*([^<>]*)(?:</a>)*</div>
Logic:
- Require <div ...> at the beginning and </div> at the end.
- Allow and ignore <a ...> before the matched text arbitrarily many times.
- Allow and ignore </a> after the matched text arbitrarily many times.
- Ignore any text before any <a ...> with the [^<>]* in front of it. Using .* would also work, but then it would skip all text arbitrarily up to the last instance of <a ...> in your string.
I use [^<>]* instead of .* to match non-tag text in a protected way, since literal < and > are not allowed.
I use (?:...) to group without capturing. If that is not supported in your programming language, just use (...) instead, and adjust which match you use.
Caveat: this won't be fully general but should work for your problem as described.
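As a quick check of the logic above, here is a minimal sketch that applies the pattern in JavaScript, assuming the opening tag is matched as <div [^<>]*>. The extractText helper and the sample markup (class names, href values) are placeholders, not part of the original question.

```javascript
// Pattern from the answer, with the opening tag matched as <div [^<>]*>.
const pattern = /<div [^<>]*>(?:[^<>]*<a [^<>]*>)*([^<>]*)(?:<\/a>)*<\/div>/;

// Return the captured text, or null when the markup does not fit the pattern.
function extractText(html) {
  const match = pattern.exec(html);
  return match ? match[1] : null;
}

// Hypothetical inputs; class names and URLs are placeholders.
const withLink = '<div class="x"><a class="y" href="z">I need this String only</a></div>';
const withoutLink = '<div class="x">I need this String only</div>';
const linkInside = '<div class="x"><a href="z">I need</a> this String only</div>';

console.log(extractText(withLink));
console.log(extractText(withoutLink));
console.log(extractText(linkInside)); // null: text after </a> is not covered
```

The last input illustrates the caveat: a link that wraps only part of the text is outside what this pattern handles.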

Recursive regex for templating sub loops

So I looked at How to write a recursive regex that matches nested parentheses? and other solutions for recursive regex matching, but I'm still not getting a proper match on RegexBuddy.
I have a generic handlebars-style template that I want to parse myself, a table with headings:
<table>
<thead>
<tr>
{{#each columns as col }}<th>{{col}}</th>{{/each}}
</tr>
</thead>
<tbody>
{{#each rows as row }}
<tr>
{{#each row as col }}<td>col</td>{{/each}}
</tr>
{{/each}}
</tbody>
</table>
And trying to match with
/{{\#each (\w+) as (\w+) }}(.*?|(?R)){{/each}}/s
The regex matches the {{#each columns... in the <thead> just fine, but it seems to ignore the |(?R) part and matches {{#each rows... only until the first {{/each}}. I, of course, would like it to match both the inner and outer #each expressions. How? This is perhaps much more complex than simple nested parentheses.
(I always feel like I'm a pro at RegEx until I run into things like this. I have been trying for a while to make this work, and regular-expressions.info is just confusing me more.)
I'm currently working around this by doing {{#each_sub...}}...{{/each_sub}} so my regex won't stop on the first closing tag, but that's obviously a sub-optimal way of doing it. I have several other applications that would benefit from recursive regex but can't figure out what I'm doing wrong.
It isn't ignoring the recursion, it's just never reaching it. Because .*? is capable of matching your delimiters ({{#each...}} and {{/each}}), it matches the first closing delimiter it finds and reports success without ever needing to recurse.
For this technique to work, the branch before the (?R) has to match anything that's not a delimiter. Since your delimiters consist of multiple characters, you can't use a negated character class, as they did in the question you linked to. Instead, you need to use a tempered greedy token:
(?:(?!{{[#/]each\b).)*
This is the same as .*, except before it consumes each character it checks to make sure it's not the beginning of {{#each or {{/each. Here it is in context:
{{\#each (\w+) as (\w+) }}(?:(?:(?!{{[#/]each\b).)*|(?R))*{{/each}}
If the first branch fails, it means you've encountered something that looks like a delimiter. If it's an opening delimiter, the second branch takes over and tries to match the whole pattern recursively. Otherwise, it pops out of the loop (note the * after the group--you were missing that, too) and tries to match a closing delimiter.
While the regex above will work fine on valid input, it's subject to catastrophic backtracking if input is malformed. To avoid that, you can use an unrolled loop in place of the alternation (as @Wiktor did in his comment):
{{\#each\s+(\w+)\s+as\s+(\w+)\s*}}(?:(?!{{[#/]each\b).)*(?:(?R)(?:(?!{{[#/]each\b).)*)*{{/each}}
Here's a slightly more readable version, laid out for free-spacing mode (the x flag), with possessive quantifiers added to squeeze out even more speed:
{{\#each\s+(\w+)\s+as\s+(\w+)\s*}}
(?:(?!{{[#/]each\b).)*+
(?:
(?R)
(?:(?!{{[#/]each\b).)*+
)*+
{{/each}}
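JavaScript's regex engine has no (?R), so the recursive patterns above cannot be run there as written. The tempered greedy token on its own can be, though, and this sketch contrasts it with the lazy .*? from the question. The one-line template is a hypothetical reduction of the table example.

```javascript
// Hypothetical one-line template with one level of nesting.
const template =
  "{{#each rows as row }}<tr>{{#each row as col }}<td>{{col}}</td>{{/each}}</tr>{{/each}}";

// Lazy dot: free to run across an inner {{#each, so the outer opener gets
// paired with the first closing tag.
const lazy = /\{\{#each (\w+) as (\w+) \}\}(.*?)\{\{\/each\}\}/s;

// Tempered greedy token: each character is consumed only if it does not
// start {{#each or {{/each, so the body can never contain a delimiter.
const tempered =
  /\{\{#each\s+(\w+)\s+as\s+(\w+)\s*\}\}((?:(?!\{\{[#\/]each\b).)*)\{\{\/each\}\}/s;

const lazyMatch = lazy.exec(template);
const temperedMatch = tempered.exec(template);

console.log(lazyMatch[1], "->", lazyMatch[3]);
console.log(temperedMatch[1], "->", temperedMatch[3]);
```

The lazy pattern pairs the rows opener with the inner closing tag, whereas the tempered pattern can only match the innermost block (row as col). In an engine with recursion, the (?R) branch is what lets the outer block match as well.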

Regex, I just don't get it right

I just don't get my Regex right:
I have the following template:
<!-- Defines the template for the tabs. -->
{{TMPL:Import=../../../../Data/Templates/Ribbon/tabs.tmpl; Name=Tabs}}
<div class="tabs">
<ul role="tablist">
{{BOS:Sequence}}
<li role="tab" class="{{TabType}}" id="{{tabId}}">
<span>{{TabFile}}</span>
</li>
{{EOS:Sequence}}
</ul>
</div>
{{Render:Tabs}}
I would like to find everything between {{ and }}, except the tags that begin with {{BOS, {{EOS, {{TMPL, or {{Render.
Here are a couple of attempts:
Attempt 1:
({{).*(}})
This selects everything between {{ }} tags, which is not good.
Attempt 2:
({{)[^TMPL][^BOS][^EOS][^Render].*(}})
With this, {{TabType}} and {{TabFile}} are no longer selected, and I don't know why.
With some other regex, {{TabType}}" id="{{tabId}} is selected as one match.
Does anyone have a clue how to solve this? I really need a regex guru :-)
You can use negative lookahead based regex like this:
{{(?!TMPL|[BE]OS|Render).*?}}
You have to use the following regex to get the content between braces:
\{\{(.*?)\}\}
If you want to exclude the content from the comment you posted, you can use a regex technique that matches what you don't want first and keeps what you want in a capturing group at the end of the regex:
\{\{BOS:Sequence\}\}|\{\{EOS:Sequence\}\}|\{\{TMPL:Import.*?\}\}|\{\{Render:Tabs\}\}|\{\{(.*?)\}\}
By the way, a shorter form of the regex above is:
\{\{(?:BOS|EOS):Sequence\}\}|\{\{TMPL:Import.*?\}\}|\{\{Render:Tabs\}\}|\{\{(.*?)\}\}
This is a very useful technique for pattern exclusion that I was glad to learn from Anubhava and zx81 (they rock at regex patterns). With this technique, the content you need ends up in the capturing group.
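A minimal sketch of this match-and-discard approach in JavaScript, using the template from the question: the unwanted tags are consumed by alternatives that have no capture group, so only the wanted tags leave something in group 1. The variable names are illustrative.

```javascript
// Template from the question, joined into one string.
const template = [
  "<!-- Defines the template for the tabs. -->",
  "{{TMPL:Import=../../../../Data/Templates/Ribbon/tabs.tmpl; Name=Tabs}}",
  '<div class="tabs">',
  '<ul role="tablist">',
  "{{BOS:Sequence}}",
  '<li role="tab" class="{{TabType}}" id="{{tabId}}">',
  "<span>{{TabFile}}</span>",
  "</li>",
  "{{EOS:Sequence}}",
  "</ul>",
  "</div>",
  "{{Render:Tabs}}",
].join("\n");

// Unwanted tags are matched by the first three branches, which have no
// capture group; only the last branch fills group 1.
const pattern =
  /\{\{(?:BOS|EOS):Sequence\}\}|\{\{TMPL:Import.*?\}\}|\{\{Render:Tabs\}\}|\{\{(.*?)\}\}/g;

const wanted = [];
for (const match of template.matchAll(pattern)) {
  // Group 1 is undefined when one of the discard branches matched.
  if (match[1] !== undefined) {
    wanted.push(match[1]);
  }
}

console.log(wanted);
```

For this template the loop collects TabType, tabId, and TabFile, in that order.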
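Using [^TMPL] and the like won't work because these are character classes: [^TMPL] matches any single character other than T, M, P, or L, not "anything but the string TMPL". You could use a negative lookahead, though (or even a lookbehind, depending on the regex library you are using).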
\{\{(?!BOS:)(?!EOS:)(?!Render:)(?!TMPL:)(.*?)\}\}
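The difference between the character classes and the lookaheads can be checked directly. This sketch tests a handful of tags from the question against both patterns; the tags array and variable names are only for illustration.

```javascript
const tags = ["{{TabType}}", "{{tabId}}", "{{TabFile}}", "{{BOS:Sequence}}", "{{Render:Tabs}}"];

// Attempt 2 from the question: four negated classes, each matching exactly
// one character that is not in the listed set.
const charClasses = /\{\{[^TMPL][^BOS][^EOS][^Render].*\}\}/;

// Negative lookaheads: reject a tag only if it starts with one of the prefixes.
const lookahead = /\{\{(?!BOS:)(?!EOS:)(?!Render:)(?!TMPL:)(.*?)\}\}/;

const byCharClasses = tags.filter((tag) => charClasses.test(tag));
const byLookahead = tags.filter((tag) => lookahead.test(tag));

console.log(byCharClasses);
console.log(byLookahead);
```

{{TabType}} and {{TabFile}} fail the first pattern simply because their first character is T, which [^TMPL] excludes. The lookahead version keeps all three ordinary tags and drops only the prefixed ones.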
Still I get the feeling that you want the BOS, EOS, etc. to just be strings in the template with {{ and other values to be interpolated. If you are using handlebars or something, you can have strings interpolated:
{{'{{BOS:Sequence}}'}}

Backbone Template Options

I'm reading about backbone.js and one of the issues I've encountered is the templating system.
My issue is that the examples I've seen use the <% %> notation in the templates.
Unfortunately, this syntax is also used by Mason-Perl which is what we're using on the backend so this collides. Is there any way to define the syntax OR are there any other template options that do not use <% %>?
Thanks.
By default, Backbone uses Underscore's template function. You can modify Underscore's template settings to use symbols other than <% %>:
If ERB-style delimiters aren't your cup of tea, you can change
Underscore's template settings to use different symbols to set off
interpolated code. Define an interpolate regex to match expressions
that should be interpolated verbatim, an escape regex to match
expressions that should be inserted after being HTML escaped, and an
evaluate regex to match expressions that should be evaluated without
insertion into the resulting string. You may define or omit any
combination of the three. For example, to perform Mustache.js style
templating:
_.templateSettings = {
interpolate: /\{\{(.+?)\}\}/g
};
var template = _.template("Hello {{ name }}!");
template({name: "Mustache"});
=> "Hello Mustache!"

Regular expression lookbehind problem

I use
(?<!value=\")##(.*)##
to match string like ##MyString## that's not in the form of:
<input type="text" value="##MyString##">
This works for the form above, but not for this one (it still matches, but should not):
<input type="text" value="Here is my ##MyString## coming..">
I tried:
(?<!value=\").*##(.*)##
with no luck. Any suggestions will be deeply appreciated.
Edit: I am using PHP preg_match() function
This is not perfect (that's what HTML parsers are for), but it will work for the vast majority of HTML files:
(^|>)[^<>]*##[^#]*##[^<>]*(<|$)
The idea is simple. You're looking for a string that is outside of tags. To be outside of tags, the closest preceding angled bracket to it must be closing (or there's no bracket at all), and the closest following one must be opening (or none). This assumes that angled brackets are not used in attribute values.
If you actually care that the attribute name be "value", then you can match for:
value\s*=\s*"([^\"]|\\\")*##[^#]*##([^\"]|\\\")*\"
... and then simply negate the match (!preg_match(...)).
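The "outside of tags" idea can be tried against the two inputs from the question. This is a sketch in JavaScript rather than PHP, using the first pattern unchanged; the third sample is a hypothetical case where the marker sits in ordinary text.

```javascript
// First pattern from the answer: the ##...## pair must sit between a closing
// bracket (or the start) and an opening bracket (or the end).
const outsideTags = /(^|>)[^<>]*##[^#]*##[^<>]*(<|$)/;

const samples = [
  '<input type="text" value="##MyString##">',
  '<input type="text" value="Here is my ##MyString## coming..">',
  "<p>Here is my ##MyString## coming..</p>", // hypothetical extra case
];

// true means the marker was found outside of any tag.
const verdicts = samples.map((html) => outsideTags.test(html));
console.log(verdicts);
```

Both input elements are rejected because the marker lies between < and >, while the paragraph text is accepted.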
@OP, you can do it simply without regex.
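$text = '<input type="text" value=" ##MyString##">';
// Strip spaces so that value=" ## and value="## are treated alike.
$text = str_replace(" ", "", $text);
if (strpos($text, 'value="##') !== false) {
    // Take what follows value="## and cut it at the closing ##.
    $s = explode('value="##', $text);
    $t = explode("##", $s[1]);
    print "$t[0]\n";
}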
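Here is a starting point at least; it works for the given examples in engines that allow variable-length lookbehind (PCRE, and therefore preg_match(), does not accept the unbounded quantifiers inside the lookbehind):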
(?<!<[^>]*value="[^>"]*)##(.*)##
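This pattern relies on a variable-length lookbehind. JavaScript's engine accepts that, so the sketch below runs the pattern there unchanged; it is not a PHP solution, since PCRE rejects unbounded quantifiers inside a lookbehind. The extract helper and the third sample are assumptions added for illustration.

```javascript
// Variable-length negative lookbehind from the answer, unchanged.
const notInValue = /(?<!<[^>]*value="[^>"]*)##(.*)##/;

// Return the text between the markers, or null if the pattern does not match.
function extract(html) {
  const match = notInValue.exec(html);
  return match ? match[1] : null;
}

const inValueStart = '<input type="text" value="##MyString##">';
const inValueMiddle = '<input type="text" value="Here is my ##MyString## coming..">';
const inText = "<p>Here is my ##MyString## coming..</p>"; // hypothetical extra case

console.log(extract(inValueStart));
console.log(extract(inValueMiddle));
console.log(extract(inText));
```

Only the marker in plain text is extracted; both markers inside a value attribute are skipped.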