I'm looking for a way to get all documents through MongoDB text search on an indexed field.
I tried the regular-expression approach below, but it doesn't work. Any suggestions?
db.docs.find( { $text: { $search: ".*" } } )
I am using Kimono Labs to scrape a number of websites. I'd like to append "/critic-reviews" to the end of a URL, but Kimono only allows regexes in this format:
/^()(.*?)()$/
I have many URLs in this format, for example:
http://www.metacritic.com/game/playstation-4/disney-infinity-30-edition
Try adding this function under "Modify results":
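function transform (data) {
// Append the suffix to one scraped item's title link.
function add_url(item) {
item.title.href += "/critic-reviews";
return item;
}
// Apply it to every item of every collection in the results.
for (var collection in data.results) {
data.results[collection] = data.results[collection].map(add_url);
}
return data;
}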
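A quick way to check what the function does is to run it on a hand-made object outside Kimono. This is a minimal sketch: the shape of `data` (a `results` object whose collections are arrays of items with a `title.href`) is inferred from the function itself, and `collection1` plus the sample item are hypothetical.

```javascript
// Same transform as above: appends "/critic-reviews" to every title.href.
function transform(data) {
  function add_url(item) {
    item.title.href += "/critic-reviews";
    return item;
  }
  for (const collection in data.results) {
    data.results[collection] = data.results[collection].map(add_url);
  }
  return data;
}

// Hypothetical sample input; a real Kimono payload may carry more fields.
const sample = {
  results: {
    collection1: [
      {
        title: {
          text: "Disney Infinity 3.0 Edition",
          href: "http://www.metacritic.com/game/playstation-4/disney-infinity-30-edition"
        }
      }
    ]
  }
};

const out = transform(sample);
console.log(out.results.collection1[0].title.href);
// http://www.metacritic.com/game/playstation-4/disney-infinity-30-edition/critic-reviews
```

Note that the function modifies the items in place as well as returning `data`.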
This seems to be one pattern that matches:
http://www\.metacritic\.com/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)
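Outside Kimono, the same three-segment pattern can be used in plain JavaScript to append the suffix with `replace`. This sketch anchors the pattern and escapes the dots; it only illustrates the regex and does not reproduce how Kimono applies its `/^()(.*?)()$/` format. The function name `toCriticReviews` is made up.

```javascript
// Anchored version of the pattern: "\." matches a literal dot and
// each [A-Za-z0-9-]+ is one path segment (no "/" allowed inside).
const METACRITIC_URL =
  /^(http:\/\/www\.metacritic\.com\/[A-Za-z0-9-]+\/[A-Za-z0-9-]+\/[A-Za-z0-9-]+)$/;

function toCriticReviews(url) {
  // $1 is the whole matched URL; non-matching URLs come back unchanged.
  return url.replace(METACRITIC_URL, "$1/critic-reviews");
}

console.log(toCriticReviews(
  "http://www.metacritic.com/game/playstation-4/disney-infinity-30-edition"));
// http://www.metacritic.com/game/playstation-4/disney-infinity-30-edition/critic-reviews
```

Because the pattern requires exactly three path segments, a URL that already ends in /critic-reviews is left unchanged.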
http://regexone.com/lesson/kleene_operators walks you through how this works.
You can test your regex at http://www.regextester.com/.
Please see my earlier question:
MongoDB $group and explicit group formation with computed column
This time, though, I need to compare strings, not numbers. The CASE expression must use LIKE:
CASE WHEN source LIKE '%Web%' THEN 'Web'
Then I need to group by source. How do I write this in MongoDB? I'm trying the following, but I'm not sure whether $regex is supported inside $cond. By the way, is there a list of the operators that are valid inside $cond? It looks like $cond isn't very fond of me :)
db.Twitter.aggregate(
{ $project: {
"_id":0,
"Source": {
$cond: [
{ $regex:['$source','/.* Android.*/'] },
'Android',
{ $cond: [
{ $eq: ['$source', 'web'] }, 'Web', 'Others'
] }
]
}
} }
);
There are many other values I need to handle, which means deeper nesting; this example only uses 'Android' and 'Web' for brevity. I have tried both $eq and $regex: $regex gives an invalid-operator error, while $eq doesn't interpret the regex and puts everything under 'Others'. If this is possible with a regex, please let me know how to write it as a case-insensitive match.
Thanks for any help :-)
Unfortunately, it doesn't even seem to be scheduled for implementation yet :(
https://jira.mongodb.org/browse/SERVER-8892
I'm using 2.6 and took a peek at 3.0, but it's just not there.
There is one workaround, though, if you can reduce your problem to a substring at a fixed position: extract that part of the field with $substr and compare it inside multiple nested $cond expressions. It's awkward, but it works.
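Because the question mentions many more values, writing the nested $cond by hand quickly gets unwieldy. Here is a sketch that generates it in JavaScript, assuming each category can be recognized by a prefix at the very start of `source`; `buildPrefixCond` and the prefixes are hypothetical. Wrapping the substring in `$toLower` makes the comparison case-insensitive. The code only builds the pipeline object, so it runs in plain Node.js; in the mongo shell you would pass `pipeline` to `db.Twitter.aggregate`.

```javascript
// Hypothetical helper: builds nested $cond expressions that compare a
// fixed-length, lower-cased prefix of `field` with each rule's prefix.
function buildPrefixCond(field, rules, fallback) {
  // Wrap from the last rule outward, so rules[0] becomes the outermost test
  // and `fallback` ends up in the innermost "else" branch.
  return rules.reduceRight(function (elseBranch, rule) {
    var prefix = rule.prefix.toLowerCase();
    return {
      $cond: [
        { $eq: [{ $toLower: { $substr: [field, 0, prefix.length] } }, prefix] },
        rule.label,
        elseBranch
      ]
    };
  }, fallback);
}

var pipeline = [
  {
    $project: {
      _id: 0,
      Source: buildPrefixCond("$source", [
        { prefix: "Android", label: "Android" },
        { prefix: "web", label: "Web" }
      ], "Others")
    }
  },
  // Count documents per computed label (the GROUP BY step).
  { $group: { _id: "$Source", count: { $sum: 1 } } }
];

console.log(JSON.stringify(pipeline, null, 2));
// mongo shell: db.Twitter.aggregate(pipeline)
```

Newer MongoDB versions deprecate `$substr` in favour of `$substrBytes` and `$substrCP`, but the nested-$cond shape stays the same.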
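Maybe you could try it with MapReduce instead:
var map = function()
{
  // Classify each document's source with case-insensitive JavaScript regexes.
  var reg1 = /android/i;
  var reg2 = /web/i;
  if (reg1.test(this.source)) {
    emit(this._id, 'Android');
  }
  else if (reg2.test(this.source)) {
    emit(this._id, 'Web');
  }
  else {
    emit(this._id, 'Others');
  }
}
var reduce = function (key, values) {
  // Each key is a unique _id, so MongoDB never needs to call reduce here;
  // if it did, reduce must return a value of the same type it was emitted with.
  return values[0];
}
db.Twitter.mapReduce(map, reduce, {out: 'map_reduce_result'});
db.map_reduce_result.find();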
This lets you use JavaScript regular expressions instead of MongoDB's $regex.
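Since the question also asks for case-insensitive matching, here is a sketch of that classification logic in plain JavaScript, using the `i` flag and an ordered rule list so that more values can be added without deeper nesting. The names `SOURCE_RULES` and `classifySource` and the sample source strings are hypothetical.

```javascript
// Ordered rules: the first regex that matches decides the label.
// The i flag makes each test case-insensitive ("android", "ANDROID", ...).
const SOURCE_RULES = [
  { re: /android/i, label: "Android" },
  { re: /web/i, label: "Web" }
];

function classifySource(source) {
  for (const rule of SOURCE_RULES) {
    if (rule.re.test(source)) {
      return rule.label;
    }
  }
  return "Others"; // nothing matched, like the innermost else branch
}

console.log(classifySource("Twitter for Android")); // Android
console.log(classifySource("Twitter Web Client"));  // Web
console.log(classifySource("TweetDeck"));           // Others
```

Inside mapReduce the rules would have to be written in the map function itself (or passed in through the `scope` option), because the map function runs on the server and cannot see shell variables. Emitting the label as the key with a value of 1, and summing the values in reduce, gives a count per label, which is the GROUP BY the question is after.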
My jQuery regular expression for email validation throws a syntax error.
The error is "Unexpected character \". My code is below; can anyone suggest a fix?
function validateEmail(sEmail) {
var filter = /^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$/;
if (filter.test(sEmail)) {
return true;
}
else {
return false;
}
}
You have to escape the @ sign by doubling it (@@), like so:
var filter = /^([\w-\.]+)@@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$/;
Try
var filter = /^([-\w\.]+)@...
Note the - at the start of the class instead of \w-\. : a - between two characters denotes a range, as in [a-z], and a range starting at \w does not make sense.
By the way, what are the \[ and \] after the @ for?
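Here is a sketch of the validator with the hyphen moved to the front of the first character class and a single literal `@`. Reading the pattern, the `\[ ... \]` branch appears to accept IP-address domain literals such as `user@[192.168.0.1]`. The sample addresses are made up for illustration.

```javascript
// "-" placed first in the class is a literal hyphen, not a range operator;
// "." needs no escape inside [...].
const EMAIL_FILTER =
  /^([-\w.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$/;

function validateEmail(sEmail) {
  // RegExp.prototype.test already returns a boolean, so no if/else is needed.
  return EMAIL_FILTER.test(sEmail);
}

console.log(validateEmail("first-last@example.co.uk")); // true
console.log(validateEmail("user@[192.168.0.1]"));       // true
console.log(validateEmail("not-an-email"));             // false
```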
Try this:
(?:.*)@(?:.*)\.(?:.*)
I have some HTML with the following content:
... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...
I would like to parse it and get one match per link, with named groups:
match 1
group["user"]=123
group["section"]=2
match 2
group["user"]=678
group["section"]=5
I can do it if the parameters always appear in the same order (first user, then section), but I don't know how to handle a different order.
Thank you!
In my case I had to parse a URL because HttpUtility.ParseQueryString is not available on WP7 (Windows Phone 7), so I created an extension method like this:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UriExtensions
{
    // Matches "?name=value" or "&name=value"; names and values exclude & and =.
    private static readonly Regex queryStringRegex;
    static UriExtensions()
    {
        queryStringRegex = new Regex(@"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
    }
    public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
    {
        if (uri == null)
            throw new ArgumentNullException("uri");
        var matches = queryStringRegex.Matches(uri.OriginalString);
        for (int i = 0; i < matches.Count; i++)
        {
            var match = matches[i];
            yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
        }
    }
}
Then it's just a matter of using it, for example:
var uri = new Uri(HttpUtility.UrlDecode(@"file.aspx?userId=123&section=2"),UriKind.RelativeOrAbsolute);
var parameters = uri.ParseQueryString().ToDictionary( kvp => kvp.Key, kvp => kvp.Value);
var userId = parameters["userId"];
var section = parameters["section"];
NOTE: I'm returning an IEnumerable rather than a dictionary because the query string might contain duplicate parameter names; with duplicates, building a dictionary throws an exception.
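For comparison, the same regex works in JavaScript, which has supported named groups since ES2018; `String.prototype.matchAll` needs a reasonably recent engine (Node.js 12 or later). This is a sketch; `parseQueryPairs` is a hypothetical name, and like the C# version it returns a list so duplicate names survive.

```javascript
// Port of the C# regex: a "?" or "&", then a name and a value made of
// anything except "&" and "=". The g flag is required by matchAll.
const QUERY_PAIR = /[?&](?<name>[^&=]+)=(?<value>[^&=]+)/g;

function parseQueryPairs(url) {
  // Returns [name, value] pairs in order, keeping duplicate names.
  const pairs = [];
  for (const m of url.matchAll(QUERY_PAIR)) {
    pairs.push([m.groups.name, m.groups.value]);
  }
  return pairs;
}

const pairs = parseQueryPairs("file.aspx?userId=123&section=2");
console.log(pairs); // [ [ 'userId', '123' ], [ 'section', '2' ] ]

// Like ToDictionary, but a repeated name keeps its last value instead of throwing.
const params = Object.fromEntries(pairs);
console.log(params.userId, params.section); // 123 2
```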
Why use regex to split it out?
You could first extract the query string, split it on &, and then build a map by splitting each piece on =.
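A sketch of that split-based approach in JavaScript (the function name is hypothetical). It splits each pair on the first = only, so a value containing = stays intact, and it does not percent-decode values; `decodeURIComponent` could be applied if needed.

```javascript
// Split-based parsing of the query string: no regex involved.
function queryStringToMap(href) {
  const map = new Map();
  const q = href.indexOf("?");
  if (q === -1) return map;                       // no query string at all
  const query = href.slice(q + 1).split("#")[0];  // drop any #fragment
  for (const part of query.split("&")) {
    if (part === "") continue;                    // skip empty pieces like "a=1&&b=2"
    const eq = part.indexOf("=");
    // Split on the first "=" only, so values may themselves contain "=".
    const key = eq === -1 ? part : part.slice(0, eq);
    const value = eq === -1 ? "" : part.slice(eq + 1);
    map.set(key, value);
  }
  return map;
}

const a = queryStringToMap("file.aspx?userId=123&section=2");
const b = queryStringToMap("file.aspx?section=5&userId=678");
console.log(a.get("userId"), a.get("section")); // 123 2
console.log(b.get("userId"), b.get("section")); // 678 5
```

Modern browsers and Node.js also ship `URLSearchParams`, which does this parsing (including decoding) for you.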
You didn't specify what language you are working in, but this should do the trick in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace RegexTest
{
class Program
{
static void Main(string[] args)
{
string subjectString = @"... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...";
Regex regexObj =
new Regex(@"<a href=""file.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&user=(?<user>.+?)""))");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success)
{
string user = matchResults.Groups["user"].Value;
string section = matchResults.Groups["section"].Value;
Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
matchResults = matchResults.NextMatch();
}
Console.ReadKey();
}
}
}
Using a regex to find the key-value pairs first and then doing splits... doesn't seem right.
I'm interested in a complete regex solution.
Anyone?
Check this out:
\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>
You can then get the pairs with something like Groups["key"].Captures[i] and Groups["value"].Captures[i].
Perhaps something like this (I'm rusty on regexes, and wasn't good at them in the first place anyway; untested):
/href="[^?]*([?&](?:userId=(?<user>\d+)|section=(?<section>\d+)))*"/
(By the way, the XHTML is malformed; & should be &amp; in the attributes.)
Another approach is to put the capturing groups inside lookaheads:
Regex r = new Regex(@"<a href=""file\.aspx\?" +
@"(?=[^""<>]*?user=(?<user>\w+))" +
@"(?=[^""<>]*?section=(?<section>\w+))");
If there are only two parameters, there's no reason to prefer this way over the alternation-based approaches suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need another lookahead just like the two existing ones.
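JavaScript (ES2018 and later) also allows named groups inside lookaheads, so the approach can be sketched there. This version adds a hypothetical third parameter `page` to show that it costs just one more lookahead; the sample HTML and the parameter names are assumptions for illustration, and `\b` is an extra guard not present in the C# snippet.

```javascript
// Each lookahead scans the rest of the href (stopping at ", < or >) for one
// parameter; because lookaheads consume nothing, parameter order is irrelevant.
// \b keeps e.g. "page=" from matching inside "homepage=".
const LINK_PARAMS = new RegExp(
  /<a href="file\.aspx\?/.source +
  /(?=[^"<>]*?\buserId=(?<user>\w+))/.source +
  /(?=[^"<>]*?\bsection=(?<section>\w+))/.source +
  /(?=[^"<>]*?\bpage=(?<page>\w+))/.source, // third parameter: one more lookahead
  "g"
);

const html =
  '<a href="file.aspx?userId=123&section=2&page=1">link</a> ... ' +
  '<a href="file.aspx?page=4&section=5&userId=678">link</a>';

const found = [];
for (const m of html.matchAll(LINK_PARAMS)) {
  found.push({ user: m.groups.user, section: m.groups.section, page: m.groups.page });
}
console.log(found);
```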
By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.
You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex
userID=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userID=(?<user>\d+)
This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.
A simple Python implementation that overcomes the ordering problem:
In [1]: import re
In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')
In [3]: t = 'href="file.aspx?section=2&userId=123"'
In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]
In [5]: t = 'href="file.aspx?userId=123&section=2"'
In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]