Regular expression for parsing out CSS rules block - regex

I have this snippet of CSS:
.__chat.__32x32 {
color: white;
background: url({ANYNAME:anyName}Images/icons/32x32/chat.png{ANYNAME}) no-repeat;
}
The problem: I need to parse out each CSS block (selector plus rules block), but my pattern treats the closing curly bracket inside url(...) as the end of the rules block. Nothing I have tried with lookarounds has worked so far.
Question: how can I make the pattern treat a {anyname} construct as part of the match when it appears inside a url(...) value or anywhere else in the rules block?
var parser = new Regex(@"([a-z0-9\s,*\""\[\]=\.\:#_\-#]+){((?>[^}]+|(?:(?<={ROOT)|(?<={VERSION))})*)}|(\/\*(?:(?:(?!\*\/)[\S\s])+)\*\/)", RegexOptions.Multiline | RegexOptions.IgnoreCase);
MatchCollection matches;
matches = parser.Matches(fullContent);
foreach (Match match in matches)
{
if (match.Value.IndexOf("/*") > -1)
{
var cssComment = new CssComment(match.Value);
_cssElements.Add(cssComment);
}
else
{
var cssBlock = new CssBlock(match.Groups[1].Value.Trim(), match.Groups[2].Value.Trim());
_cssElements.Add(cssBlock);
}
}

Please try this:
[.#]?[\w\s.<>]+{(\n.*)+?(?=\n}\n)\n}
It matches the whole block (selector plus the code inside). You may need to adjust the selector part, [.#]?[\w\s.<>]+, to fit your CSS. Note that the pattern relies on the closing brace sitting alone on its own line.
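Another way around the brace problem, sketched here in JavaScript rather than C#, is to let the rules body alternate between non-brace characters and complete {...} placeholders. The function name parseCssBlocks and the second sample rule are made up for illustration. The sketch assumes placeholders never nest and that comments are handled separately.

```javascript
// Hypothetical helper: split CSS text into { selector, body } pairs.
// Assumption: placeholders such as {ANYNAME} contain no braces themselves.
function parseCssBlocks(css) {
  // Selector: anything without braces. Body: non-brace characters or
  // whole {placeholder} tokens, so a placeholder's "}" never ends the block.
  const blockPattern = /([^{}]+)\{((?:[^{}]|\{[^{}]*\})*)\}/g;
  const blocks = [];
  for (const match of css.matchAll(blockPattern)) {
    blocks.push({ selector: match[1].trim(), body: match[2].trim() });
  }
  return blocks;
}

const css = [
  ".__chat.__32x32 {",
  "color: white;",
  "background: url({ANYNAME:anyName}Images/icons/32x32/chat.png{ANYNAME}) no-repeat;",
  "}",
  ".other { color: red; }", // illustrative second rule, not from the question
].join("\n");

for (const block of parseCssBlocks(css)) {
  console.log(block.selector, "=>", block.body);
}
```

Because the two alternatives in the body can never start on the same character, the pattern does not need an atomic group to stay fast.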

phrase search in meteor search-source package

I have a Meteor app to which I added the search-source package to search certain collections, and it works partially. That is, when I search for the term foo bar it returns results for each of "foo" and "bar". This is fine, but I also want to be able to wrap the terms in quotes, like "foo bar", and get results for an exact match only. At the moment, when I do this I get an empty set. Here is my server code:
//Server.js
SearchSource.defineSource('FruitBasket', function(searchText, options) {
// options = options || {}; // to be sure that options is at least an empty object
if(searchText) {
var regExp = buildRegExp(searchText);
var selector = {$or: [
{'fruit.name': regExp},
{'fruit.season': regExp},
{'fruit.treeType': regExp}
]};
return Basket.find(selector, options).fetch();
} else {
return Basket.find({}, options).fetch();
}
});
function buildRegExp(searchText) {
// this is a dumb implementation
var parts = searchText.trim().split(/[ \-\:]+/);
return new RegExp("(" + parts.join('|') + ")", "ig");
}
and my client code:
//Client.js
Template.dispResults.helpers({
getPackages_fruit: function() {
return PackageSearch_fruit.getData({
transform: function(matchText, regExp) {
return matchText.replace(regExp, "<b>$&</b>")
},
sort: {isoScore: -1}
});
}
});
Thanks in advance!
I've modified the .split pattern so that it ignores everything between double quotes.
/[ \-\:]+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)/
Thus, you can simply wrap an exact phrase search in double quotes and it won't get split.
There is one more thing; since we don't need the quotes, they are removed in the next line using a .map function with a regex that replaces double quotes at the start or the end of a string part: /^"|"$/
Sample code:
function buildRegExp(searchText) {
// exact phrase search in double quotes won't get split
var arr = searchText.trim().split(/[ \-\:]+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)/);
var parts = arr.map(function(x){return x.replace(/^"|"$/g, '');});
return new RegExp("(" + parts.join('|') + ")", "ig");
}
console.log(buildRegExp("foo bar"));
console.log(buildRegExp("\"foo bar\""));
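Since the search text comes straight from the user, characters such as + or ( would be interpreted as regex syntax once the parts are joined. A minimal sketch of one way to guard against that follows. The names escapeRegExp and buildSafeRegExp are hypothetical, and dropping the g flag is an assumption made here to keep the example stateless, not something search-source requires.

```javascript
// Hypothetical helper: backslash-escape regex metacharacters in user input.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Variant of buildRegExp that keeps quoted phrases together and escapes each part.
function buildSafeRegExp(searchText) {
  const parts = searchText
    .trim()
    // Split on separators only when an even number of quotes follows,
    // i.e. when the separator is not inside a quoted phrase.
    .split(/[ \-:]+(?=(?:[^"]*"[^"]*")*[^"]*$)/)
    // Drop the surrounding quotes from exact phrases.
    .map((part) => part.replace(/^"|"$/g, ""))
    .filter((part) => part.length > 0)
    .map(escapeRegExp);
  return new RegExp("(" + parts.join("|") + ")", "i");
}

console.log(buildSafeRegExp("foo bar"));
console.log(buildSafeRegExp('"foo bar"'));
console.log(buildSafeRegExp("c++"));
```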

Glib regex for matching whole word?

For matching a whole word, the regex \bword\b should suffice. Yet the following code always returns 0 matches:
try {
string pattern = "\bhtml\b";
Regex wordRegex = new Regex (pattern, RegexCompileFlags.CASELESS, RegexMatchFlags.NOTEMPTY);
MatchInfo matchInfo;
string lineOfText = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";
wordRegex.match (lineOfText, RegexMatchFlags.NOTEMPTY, out matchInfo);
stdout.printf ("Match count is: %d\n", matchInfo.get_match_count ());
} catch (RegexError regexError) {
stderr.printf ("Regex error: %s\n", regexError.message);
}
This should be working as testing the \bhtml\b pattern returns one match for the provided string in testing engines. But on this program it returns 0 matches. Is the code wrong? What regex in Glib would be used to match a whole word?
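You have to escape the backslashes too. Inside an ordinary string literal, \b is read as the backspace character before the regex engine ever sees it, so the pattern has to be written with \\b: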
try {
string pattern = "\\bhtml\\b";
Regex wordRegex = new Regex (pattern, RegexCompileFlags.CASELESS, RegexMatchFlags.NOTEMPTY);
MatchInfo matchInfo;
string lineOfText = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";
wordRegex.match (lineOfText, RegexMatchFlags.NOTEMPTY, out matchInfo);
stdout.printf ("Match count is: %d\n", matchInfo.get_match_count ());
} catch (RegexError regexError) {
stderr.printf ("Regex error: %s\n", regexError.message);
}
Output:
Match count is: 1
You can simplify your code with regular expression literals:
Regex regex = /\bhtml\b/i;
You don't have to escape backslashes in the regular expression literal syntax. (Forward slashes would have to be escaped, though.)
Full example:
void test_match (string text, Regex regex) {
MatchInfo match_info;
if (regex.match (text, RegexMatchFlags.NOTEMPTY, out match_info)) {
stdout.printf ("Match count is: %d\n", match_info.get_match_count ());
}
else {
stdout.printf ("No match");
}
}
int main () {
Regex regex = /\bhtml\b/i;
test_match ("<!DOCTYPE html PUBLIC>", regex);
return 0;
}
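The same pitfall exists in other languages with C-style string escapes. As an illustration in JavaScript (not Vala, so treat it only as an analogy), the three spellings below behave differently; the shortened DOCTYPE string is an assumption made for brevity.

```javascript
const lineOfText = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">';

// "\b" in a string literal is the backspace character (U+0008),
// so this pattern looks for backspace + "html" + backspace.
const unescaped = new RegExp("\bhtml\b", "i");

// "\\b" leaves a backslash followed by "b", which the regex engine
// then interprets as a word boundary.
const escaped = new RegExp("\\bhtml\\b", "i");

// A regex literal needs no extra escaping.
const literal = /\bhtml\b/i;

console.log(unescaped.test(lineOfText)); // false
console.log(escaped.test(lineOfText));   // true
console.log(literal.test(lineOfText));   // true
```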

Reuse a captured group outside of the regex string

I'm reading from a text file that has many lines containing placeholders like this: "{name_of_placeholder}". There is another file that's like a map - the keys are the names of each placeholder and there's a value for each one. I would like to use regex to find every placeholder in the first file and replace {name_of_placeholder} with the corresponding value from the second file.
The first thing that came to my mind is to capture the group between "{}", but how to use it outside of the string? If that's not possible maybe someone can think of another way to do this?
Thanks in advance!
You haven't said which language you're using, but whatever it is, you can try the following approach (shown here in JavaScript):
var dict={}
const regex1 = /(.*)=(.*)/gm;
// let str1 be the second file (dictionary)
const str1 = `abc1=1
abc2=2
abc3=3
abc4=4
abc5=5
abc6=6
abc7=7
abc8=8
abc9=9
abc10=10
abc11=11
abc12=12`;
let m1;
while ((m1 = regex1.exec(str1)) !== null) {
if (m1.index === regex1.lastIndex) {
regex1.lastIndex++;
}
dict[m1[1]]=m1[2];
}
//console.log(dict);
const regex = /\{(.*?)\}/gm;
// let str be the first file where you want the replace operation on {key...}
var str = `adfas{abc1} asfasdf
asdf {abc3} asdfasdf
asdfas {abc5} asdfasdf
asdfas{abc7} asdfasdfadf
piq asdfj asdf
`;
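// Replace every {key} in a single pass. The callback receives the captured
// group, so the key can be looked up in dict; unknown keys are left as-is.
str = str.replace(regex, function (match, key) {
    return Object.prototype.hasOwnProperty.call(dict, key) ? dict[key] : match;
});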
console.log(str);

Regular Expression Matcher

I am using pattern matching to check a file name's extension against my expression string. The code is as follows:
public static enum FileExtensionPattern
{
WORDDOC_PATTERN( "([^\\s]+(\\.(?i)(txt|docx|doc))$)" ), PDF_PATTERN(
"([^\\s]+(\\.(?i)(pdf))$)" );
private String pattern = null;
FileExtensionPattern( String pattern )
{
this.pattern = pattern;
}
public String getPattern()
{
return pattern;
}
}
pattern = Pattern.compile( FileExtensionPattern.WORDDOC_PATTERN.getPattern() );
matcher = pattern.matcher( fileName );
if ( matcher.matches() )
icon = "blue-document-word.png";
When the file name is "Home & Artifact.docx", matcher.matches() returns false. It works fine for file names with a ".doc" extension.
Can you please point out what I am doing wrong?
"Home & Artifact.docx" contains spaces. Since you allow any char except whitespaces [^\s]+, this filename is not matched.
Try this instead:
(.+?(\.(?i)(txt|docx|doc))$)
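To see the difference side by side, here is a small JavaScript sketch rather than Java. The two function names are hypothetical, and the explicit ^ and $ anchors stand in for the whole-string behaviour of Java's matches().

```javascript
// Hypothetical helper using the original idea: [^\s]+ cannot cross a space.
function matchesOriginalPattern(fileName) {
  return /^[^\s]+\.(txt|docx|doc)$/i.test(fileName);
}

// Hypothetical helper using the relaxed pattern: .+ accepts spaces too.
function isWordDocument(fileName) {
  return /^.+\.(txt|docx|doc)$/i.test(fileName);
}

for (const name of ["Artifact.doc", "Home & Artifact.docx", "report.pdf"]) {
  console.log(name, matchesOriginalPattern(name), isWordDocument(name));
}
```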
It is because you have spaces in filename ("Home & Artifact.docx") but your regex has [^\\s]+ which won't allow any spaces.
Use this regex instead for WORDDOC_PATTERN:
"(?i)^.+?\\.(txt|docx|doc)$"

Regular Expression to Extract the Url out of the Anchor Tag

I want to extract the HTTP link from inside anchor tags. Only links to WMV files should be extracted.
Because HTML's syntactic rules are so loose, it's pretty difficult to do with any reliability (unless, say, you know for absolute certain that all your tags will use double quotes around their attribute values). Here's some fairly general regex-based code for the purpose:
function extract_urls($html) {
    $urls = array();
    // Strip HTML comments so links inside them are ignored.
    $html = preg_replace('/<!--.*?-->/s', '', $html);
    // href values in double quotes.
    preg_match_all('/<a\s+[^>]*href="([^"]+)"[^>]*>/is', $html, $matches);
    foreach($matches[1] as $url) {
        $url = str_replace('&amp;', '&', trim($url));
        if(preg_match('/\.wmv\b/i', $url) && !in_array($url, $urls))
            $urls[] = $url;
    }
    // href values in single quotes.
    preg_match_all('/<a\s+[^>]*href=\'([^\']+)\'[^>]*>/is', $html, $matches);
    foreach($matches[1] as $url) {
        $url = str_replace('&amp;', '&', trim($url));
        if(preg_match('/\.wmv\b/i', $url) && !in_array($url, $urls))
            $urls[] = $url;
    }
    // Unquoted href values.
    preg_match_all('/<a\s+[^>]*href=([^"\'][^> ]*)[^>]*>/is', $html, $matches);
    foreach($matches[1] as $url) {
        $url = str_replace('&amp;', '&', trim($url));
        if(preg_match('/\.wmv\b/i', $url) && !in_array($url, $urls))
            $urls[] = $url;
    }
    return $urls;
}
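The three passes above can also be folded into a single pattern with one alternative per quoting style. The following is a rough JavaScript sketch of that idea, not a port of the PHP function. The name extractWmvUrls and the placeholder base URL are assumptions, and the extension check uses the URL's path so that query strings do not get in the way. Like any regex approach, it will miss unusual markup.

```javascript
// Hypothetical helper: collect unique href values that point at .wmv files.
// baseUrl is only used to resolve relative links for the extension check.
function extractWmvUrls(html, baseUrl = "http://example.invalid/") {
  // Strip HTML comments so links inside them are ignored.
  const withoutComments = html.replace(/<!--[\s\S]*?-->/g, "");
  // One alternative each for double-quoted, single-quoted and unquoted values.
  const hrefPattern = /<a\s[^>]*?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/gi;
  const urls = new Set(); // a Set removes duplicates and keeps insertion order
  for (const match of withoutComments.matchAll(hrefPattern)) {
    const raw = match[1] ?? match[2] ?? match[3];
    const href = raw.trim().replace(/&amp;/g, "&");
    let pathname;
    try {
      pathname = new URL(href, baseUrl).pathname;
    } catch {
      continue; // skip values that cannot be parsed as URLs
    }
    if (pathname.toLowerCase().endsWith(".wmv")) {
      urls.add(href);
    }
  }
  return [...urls];
}

const sampleHtml =
  '<a href="/v/a.wmv?x=1&amp;y=2">A</a> <a class=x href=\'b.WMV\'>B</a> ' +
  '<a href=c.wmv>C</a> <a href="/d.mp4">D</a>';
console.log(extractWmvUrls(sampleHtml));
```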
Regex:
<a\\s*href\\s*=\\s*(?:(\"|\')(?<link>[^\"]*.wmv)(\"|\'))\\s*>(?<name>.*)\\s*</a>
[Note: \s* is used in several places to match the extra white space characters that can occur in the html.]
Sample C# code:
/// <summary>
/// Assigns proper values to link and name, if the htmlId matches the pattern
/// Matches only for .wmv files
/// </summary>
/// <returns>true if success, false otherwise</returns>
public static bool TryGetHrefDetailsWMV(string htmlATag, out string wmvLink, out string name)
{
wmvLink = null;
name = null;
string pattern = "<a\\s*href\\s*=\\s*(?:(\"|\')(?<link>[^\"]*.wmv)(\"|\'))\\s*>(?<name>.*)\\s*</a>";
// Match once with a single set of options and reuse the result,
// instead of running the pattern three times.
Match m = Regex.Match(htmlATag, pattern, RegexOptions.IgnoreCase);
if (!m.Success)
return false;
wmvLink = m.Groups["link"].Value;
name = m.Groups["name"].Value;
return true;
}
MyRegEx.TryGetHrefDetailsWMV("<td><a href='/path/to/file'>Name of File</a></td>",
out wmvLink, out name); // No match
MyRegEx.TryGetHrefDetailsWMV("<td><a href='/path/to/file.wmv'>Name of File</a></td>",
out wmvLink, out name); // Match
MyRegEx.TryGetHrefDetailsWMV("<td><a href='/path/to/file.wmv' >Name of File</a></td>", out wmvLink, out name); // Match
I wouldn't do this with regex - I would probably use jQuery:
jQuery('a[href$=".wmv"]').attr('href')
Compare this to chaos's simplified regex example, which (as stated) doesn't deal with fussy/complex markup, and you'll hopefully understand why a DOM parser is better than a regex for this type of problem.