Can a sizzle selector evaluate a regular expression? - regex

I need to select links whose URLs follow a specific format. Can I use Sizzle to evaluate a link's href attribute against a regular expression?
For example, can I do something like this:
var arrayOfLinks = Sizzle('a[HREF=[0-9]+$]');
to create an array of all links on the page whose URL ends in a number?

Give this a try. I've attempted to convert the jQuery regex selector that Kobi linked to into a Sizzle selector extension. Seems to work, but I haven't put it through a lot of testing.
Sizzle.selectors.filters.regex = function(elem, i, match){
    var matchParams = match[3].split(',', 2);
    var attr = matchParams[0];
    var pattern = matchParams[1];
    var regex = new RegExp(pattern.replace(/^\s+|\s+$/g,''), 'ig');
    return regex.test(elem.getAttribute(attr));
};
In this case, your example would be written as:
var arrayOfLinks = Sizzle('a:regex(href,[0-9]+$)');
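One limitation of the filter above is that split(',', 2) drops everything after a second comma, so a pattern such as [0-9]{1,3}$ would be truncated. Below is a minimal sketch of a variant that splits on the first comma only and treats a missing attribute as a non-match. The helper names parseRegexArg and attrMatches are made up for illustration, and the wiring into Sizzle at the bottom is an assumption that mirrors the filter signature used above.

```javascript
// Hypothetical helper: split "attr, pattern" on the FIRST comma only,
// so commas inside the pattern (e.g. {1,3}) survive.
function parseRegexArg(arg) {
  var comma = arg.indexOf(',');
  if (comma === -1) {
    throw new Error('expected "attribute, pattern"');
  }
  return {
    attr: arg.slice(0, comma).replace(/^\s+|\s+$/g, ''),
    pattern: arg.slice(comma + 1).replace(/^\s+|\s+$/g, '')
  };
}

// Hypothetical helper: test an element's attribute against the pattern.
// A missing attribute (getAttribute returns null) never matches.
function attrMatches(elem, arg) {
  var parsed = parseRegexArg(arg);
  var value = elem.getAttribute(parsed.attr);
  if (value === null) {
    return false;
  }
  return new RegExp(parsed.pattern, 'i').test(value);
}

// Assumed wiring, mirroring the filter signature used above:
// Sizzle.selectors.filters.regex = function (elem, i, match) {
//   return attrMatches(elem, match[3]);
// };
```

Keeping the matching logic in plain functions also makes it possible to test it without a DOM, by passing any object that has a getAttribute method.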

Related

Multiple regex to match file extension with version

My current regex is like so
/\.(jpe?g|png|gif|svg)$/i
I'm trying to modify it so that it also matches when the extension is followed by GET parameters, so that all of the formats below would match:
../fonts/fontawesome-webfont.svg
../fonts/fontawesome-webfont.svg?v=4.3.0
../fonts/fontawesome-webfont.svg?v=4.3.0#fontawesomeregular
How can I modify it to support these?
Assuming the URLs to be parsed follow proper formatting (where only one '?' delimiter can be used to signify the start of the query) you could do:
/\.(jpe?g|png|gif|svg)(?:\?.*|)$/i
var urls = [
'../fonts/fontawesome-webfont.svg',
'../fonts/fontawesome-webfont.svg?v=4.3.0',
'../fonts/fontawesome-webfont.svg?v=4.3.0#fontawesomeregular'
];
var matches = urls.map(function(url) { return url.match(/\.(jpe?g|png|gif|svg)(?:\?.*|)$/i); });
document.write('<pre>' + JSON.stringify(matches, null, 2) + '</pre>');
Alternatively you could use Node's url.parse():
var url = require('url');
var urlObj = url.parse(URL_STRING);
var matches = urlObj.pathname.match(/\.(jpe?g|png|gif|svg)$/i);
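The same idea can be sketched with the standard URL constructor, which is available in browsers and in Node. It separates the query string and the fragment from the path, so a URL that has a #fragment but no ?query is handled too. The function name hasImageExtension and the placeholder base URL are assumptions for illustration; the base is only there so that relative paths such as ../fonts/... can be resolved.

```javascript
// Extension test that ignores both the query string and the fragment.
var IMAGE_EXT = /\.(jpe?g|png|gif|svg)$/i;

// Hypothetical helper; `base` is only used to resolve relative URLs.
function hasImageExtension(urlString, base) {
  var parsed = new URL(urlString, base || 'http://example.invalid/a/b/');
  // pathname excludes "?query" and "#fragment"
  return IMAGE_EXT.test(parsed.pathname);
}

var urls = [
  '../fonts/fontawesome-webfont.svg',
  '../fonts/fontawesome-webfont.svg?v=4.3.0',
  '../fonts/fontawesome-webfont.svg?v=4.3.0#fontawesomeregular',
  '../fonts/fontawesome-webfont.svg#fontawesomeregular'
];
var results = urls.map(function (u) { return hasImageExtension(u); });
```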

Using Regex on Express with MongoDB

I want to use a regex to find records that match a certain pattern in Express.js with MongoDB.
Here is my code:
var urlList = [];
var db = req.db;
var collection = db.get('project');
var regex = '/^'+url+'/';
console.log('regex = '+regex);
collection.find({url: { $regex: regex, $options: 'i' }},function(err,list){
    console.log('length = '+list.length);
    var arrayUrl = [];
    for (var i=0;i<list.length;i++){
        console.log(list[i].url);
        arrayUrl.push(list[i].url);
    }
});
But I get list.length = 0, although the database definitely contains records that match the pattern.
Running the following command in the mongo shell
db.project.find({url:{$regex:/^Projek-1/,$options: 'i'}});
gives me the results I want.
How can I use a regex in Express.js to find matching records in a MongoDB database?
First, you don't really need $regex; a regex literal such as url: /foobar/i works just as well.
The actual problem is that you are not creating a RegExp object, only a string that looks like one. Create a real one with new RegExp, passing the 'i' flag to keep the match case-insensitive.
Example:
var re = new RegExp("^" + url, "i");
...
find({url: re})
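When the prefix comes from user input, characters such as . or ? in it would be interpreted as regex syntax. A minimal sketch of escaping the input first is shown below; the helper names escapeRegExp and prefixRegExp are made up for illustration, and the commented-out find call simply reuses the collection and callback shape from the question.

```javascript
// Escape every character that has a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: case-insensitive "starts with" pattern.
function prefixRegExp(prefix) {
  return new RegExp('^' + escapeRegExp(prefix), 'i');
}

// Assumed usage, following the question's code:
// collection.find({ url: prefixRegExp(url) }, function (err, list) { ... });
```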

Regex to replace domain within links that are not images

I need to replace the domain name in all links on the page that do not point to images or PDF files.
The input is a full HTML page received through a proxy service.
Example:
test<img src="http://www.test.com" /><a href="http://www.test.com/test.pdf">pdf
test1
Result:
test<img src="http://www.test.com" /><a href="http://www.test.com/test.pdf">pdf
test1
If you are using .NET, I strongly suggest using the HTML Agility Pack.
Parsing HTML directly with a regex can be very error-prone. This question is also similar to the post below:
What regex should I use to remove links from HTML code in C#?
If the domain is http://www.example.com, the following should do the trick:
/http:\/\/www\.example\.com(?!\S*\.(?:pdf|jpg|png|gif)\b)\S*/
This places a negative lookahead directly after the domain, so the regex matches a URL only if the rest of it (up to the next whitespace) does not contain a .pdf, .jpg, .png or .gif extension.
If none of your PDF URLs have query parameters (like a.pdf?asd=12), the following code will work. It replaces only absolute and root-relative URLs.
var links = document.getElementsByTagName("a");
var len = links.length;
var newDomain = "http://mydomain.com";
/**
* Match absolute urls (starting with http)
* and root relative urls (starting with a `/`)
* Does not match relative urls like "subfolder/anotherpage.html"
* */
var regex = new RegExp("^(?:https?://[^/]+)?(/.*)$", "i");
//uncomment next line if you want to replace only absolute urls
//regex = new RegExp("^https?://[^/]+(/.*)$", "i");
for(var i = 0; i < len; i++)
{
    var link = links.item(i);
    var href = link.getAttribute("href");
    if(!href) //in case of named anchors
        continue;
    if(href.match(/\.pdf$/i)) //if pdf
        continue;
    href = href.replace(regex, newDomain + "$1");
    link.setAttribute("href", href);
}
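The loop above skips only hrefs that end in .pdf. As a rough sketch, the per-link decision can be pulled out into a plain function that also skips common image extensions and tolerates a query string or fragment after the extension. The name rewriteHref and the list of extensions are assumptions for illustration, not part of the original answer.

```javascript
// Same URL pattern as above: absolute (http/https) or root-relative URLs.
var URL_PARTS = /^(?:https?:\/\/[^\/]+)?(\/.*)$/i;
// Assumed list of extensions that should be left untouched.
var SKIP_EXT = /\.(?:pdf|jpe?g|png|gif)$/i;

// Hypothetical helper: returns the href with its domain replaced,
// or the original href if it should be left alone.
function rewriteHref(href, newDomain) {
  if (!href) {
    return href; // named anchors have no href
  }
  // Look at the path only, so "a.pdf?asd=12" is still recognised as a PDF.
  var path = href.split(/[?#]/)[0];
  if (SKIP_EXT.test(path)) {
    return href;
  }
  return href.replace(URL_PARTS, function (whole, rest) {
    return newDomain + rest;
  });
}
```

Inside the loop this would be used as link.setAttribute("href", rewriteHref(href, newDomain)).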

javascript regex multiple occurrence

I have a JavaScript variable with this value:
var ull="<ul><li>sunil here from mandya </li><li>kumar here</li></ul>"
I want an alert message for the content of each list item, like this:
1st alert message = sunil here from mandya
2nd alert message = kumar here
How can I accomplish this with a regex? Please help, I am new to this.
It's not recommended to use a regex on HTML. An XML parser would be better, and better still would be the browser's own DOM methods in JavaScript. This will output what you're looking for:
var li = document.getElementsByTagName('li');
var liLength = li.length;
for(var i=0;i<liLength;i++){
    if(document.all){ //IE
        alert(li[i].innerText);
    }else{ //Firefox
        alert(li[i].textContent);
    }
}
An alternative, which would be better supported than writing these things yourself, is to use a JavaScript framework such as jQuery (as richardtallent suggests).
With it you can do the same thing much less verbosely:
$('li').each(function(){
alert($(this).text());
});
Don't use a regex to parse HTML; use a parser instead:
var p = new DOMParser();
var ul = p.parseFromString("<ul><li>sunil here from mandya </li>" +
"<li>kumar here</li></ul>", "text/xml");
var lis = ul.getElementsByTagName("li");
var len = lis.length;
for(var i = 0; i < len; i++)
    alert(lis[i].firstChild.nodeValue);
A better bet would be to use jQuery (or just DOM calls) rather than trying to parse HTML using a regex.
Also, you should consider not showing users a series of alert() modal dialogs; many users would find that very annoying.
var str = '<ul><li>sunil here from mandya </li><li>kumar here</li></ul>';
var haystack = str;
var match = haystack.match(/<li>(.+?)<\/li>/);
while (match) {
    alert(match[1]);
    // skip past the end of this match, including any text before it
    haystack = haystack.slice(match.index + match[0].length);
    match = haystack.match(/<li>(.+?)<\/li>/);
}
Of course, if the HTML is actually on the page, it would be much, much better to iterate through the DOM.
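If a regex is used anyway, the global flag makes the repeated slicing unnecessary: with /g, each call to exec resumes from where the previous match ended. A minimal sketch, assuming simple, non-nested <li> elements without attributes; the function name listItems is made up for illustration, and it returns the contents in an array instead of alerting them.

```javascript
// Collect the contents of every <li>...</li> pair in the string.
function listItems(html) {
  var re = /<li>(.+?)<\/li>/g; // "g" lets exec() walk through all matches
  var items = [];
  var match;
  while ((match = re.exec(html)) !== null) {
    items.push(match[1]);
  }
  return items;
}

var items = listItems('<ul><li>sunil here from mandya </li><li>kumar here</li></ul>');
```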

How to use regular expression in WatiN

I'm working with the WatiN automation tool and I'm having a problem with a regular expression. I have a situation where I have to enter some text and click a button in a popup window. I'm using the AttachToIE method and the URL attribute ("http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd") of the popup to attach to it.
The problem is that each time the popup appears, the ID value in the URL changes, so I'm not able to access the popup. Can anyone help by giving me a regular expression that handles the changing ID value in the URL below?
("http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd")
Thank you.
It appears that you have a URL with two query string parameters, Type and ID, and that your pattern is:
"http://192.168.25.10:215/admin/SelectUsers.aspx?Type=Feedback&ID={some id}"
You can use the Find.ByUrl() attribute constraint method and pass it to AttachToIE() as shown below with the regex for matching that pattern.
string url = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=Feedback&ID=";
// Regex.Escape keeps the "." and "?" in the URL from being read as regex syntax
Regex regex = new Regex(Regex.Escape(url) + "[-a-z0-9]+", RegexOptions.IgnoreCase);
IE ie = IE.AttachToIE(Find.ByUrl(regex));
string baseUrl = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=";
Regex urlIE = new Regex(Regex.Escape(baseUrl) + "[\\w-]+", RegexOptions.IgnoreCase);
IE ie = IE.AttachToIE(Find.ByUrl(urlIE));
I'm not familiar with WatiN, but it looks like it runs on .NET, so perhaps this might help:
var desiredId = "000000000000-0000-0000-000000000000";
var url = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd&someMoreStuff";
var pattern = @"(?i)(?<=FeedBackId=)[-a-z0-9]+";
var result = Regex.Replace(url, pattern, desiredId);
Console.WriteLine(result);
//Output: http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=000000000000-0000-0000-000000000000&someMoreStuff
The following pattern should have the same effect but is more defensive. It only matches inside the query string, requires the ID to be exactly 35 characters long, and won't match similar parameter names like "PreviousFeedBackId".
var pattern = @"(?i)(?<=\?.*\bFeedBackId=)[-a-z0-9]{35}\b";
If you just want to extract the id:
var id = Regex.Match(url, pattern).Value;
Console.WriteLine(id);
//output: ef5ad7ef5490-4656-9669-32464aeba7cd
WatiN has a feature that lets us match on the URL while ignoring the query string. Below is the code, which is working fine for me.
string baseUrl = "http://192.168.25.10:215/admin/SelectUsers.aspx";
IE ie = IE.AttachToIE(Find.ByUrl(baseUrl,true));