Dart regex matching and getting some information from it

For practice, I decided to build something like a Backbone router. The user only needs to give the regex string like r'^first/second/third/$' and then hook that to a View.
For example, suppose I have a RegExp like this:
String regexString = r'/api/\w+/\d+/';
RegExp regExp = new RegExp(regexString);
View view = new View(); // a View class I wrote; suppose this view is hooked to that URL
If an HttpRequest points to /api/topic/1/, it matches that regex, and I can then render whatever is hooked to that URL.
The problem is: given the regex above, how do I find out that the values matched by \w+ and \d+ are topic and 1?
Care to give me some pointers anyone? Thank you.

You need to put the parts you want to extract into groups so you can extract them from the match. This is achieved by putting a part of the pattern inside parentheses.
// added parentheses around \w+ and \d+ to get separate groups
String regexString = r'/api/(\w+)/(\d+)/'; // not r'/api/\w+/\d+/' !!!
RegExp regExp = new RegExp(regexString);
var matches = regExp.allMatches("/api/topic/3/");
print("${matches.length}"); // => 1 - 1 instance of pattern found in string
var match = matches.elementAt(0); // => extract the first (and only) match
print("${match.group(0)}"); // => /api/topic/3/ - the whole match
print("${match.group(1)}"); // => topic - first matched group
print("${match.group(2)}"); // => 3 - second matched group
However, the given regex would also match "/api/topic/3/ /api/topic/4/" because it is not anchored, and it would produce 2 matches (matches.length would be 2), one for each path. You might want to use this instead:
String regexString = r'^/api/(\w+)/(\d+)/$';
This ensures that the regex is anchored exactly from beginning to the end of the string, and not just anywhere inside the string.
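The same idea can be turned into a tiny router. The following is only a sketch, written in Python rather than Dart, and the ROUTES table, the view names and the resolve function are all hypothetical names made up for illustration. It shows anchored patterns with capture groups being tried in order, with the captured values handed back alongside the view name:

```python
import re

# Hypothetical route table: each anchored pattern maps to a view name.
ROUTES = [
    (re.compile(r'^/api/(\w+)/(\d+)/$'), 'detail_view'),
    (re.compile(r'^/api/(\w+)/$'), 'list_view'),
]

def resolve(path):
    """Return (view_name, captured_groups) for the first matching route, or None."""
    for pattern, view_name in ROUTES:
        match = pattern.match(path)
        if match:
            # groups() holds the text captured by each pair of parentheses.
            return view_name, match.groups()
    return None

print(resolve('/api/topic/3/'))
print(resolve('/api/topic/'))
print(resolve('/api/topic/3/ /api/topic/4/'))  # anchoring rejects this input
```

Because every pattern is anchored with ^ and $, a string holding two paths does not resolve to any route.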

Related

Flutter cannot parse regex

Flutter cannot parse this working regex, and doesn't return any error or info.
(?<=id=)[^&]+
However, when I add it into my Flutter app:
print("before");
new RegExp(r'(?<=id=)[^&]+');
print("after");
It doesn't do anything and doesn't return any error. The print("after"); never gets executed. It doesn't completely freeze the app, because it runs asynchronously.
Dart compiled for the Web supports lookbehinds, but at the time of writing native Dart (including Flutter) does not support lookbehinds (source).
In your case, you want to match a string after a specific string. All you need is to declare a capturing group in your pattern and then access that submatch:
RegExp regExp = new RegExp(r"id=([^&]+)");
String s = "http://example.com?id=some.thing.com&other=parameter; http://example.com?id=some.thing.com";
Iterable<Match> matches = regExp.allMatches(s);
for (Match match in matches) {
  print(match.group(1));
}
Output:
some.thing.com
some.thing.com
Here, id=([^&]+) matches id= and then the ([^&]+) capturing group #1 matches and captures into Group 1 any one or more chars other than &. Note you may make it safer if you add [?&] before id to only match id and not thisid query param: [?&]id=([^&]+).
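To see what the [?&] prefix buys you, here is a small sketch in Python (not Dart). The extract_ids helper and the second sample URL are made up for illustration:

```python
import re

# [?&] requires the parameter name to start right after ? or &,
# so a parameter such as "thisid" is not picked up.
ID_PARAM = re.compile(r'[?&]id=([^&]+)')

def extract_ids(text):
    """Return every value captured for the id query parameter."""
    # With a single capturing group, findall returns the group text only.
    return ID_PARAM.findall(text)

print(extract_ids('http://example.com?id=some.thing.com&other=parameter'))
print(extract_ids('http://example.com?thisid=nope&id=yes'))
```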
I assume this is https://github.com/dart-lang/sdk/issues/34935
Bring Dart's RegExp support in line with JavaScript: lookbehinds, property escapes, and named groups.

Perform Regex on value returned by Regex

This is probably straightforward but I'm not even sure which phrase I should google to find the answer. Forgive my noobiness.
I've got strings (filenames) that look like this:
site12345678_date20160912_23001_to_23100_of_25871.txt
What this naming convention means is "Records 23001 through 23100 out of 25871 for site 12345678 for September 12th 2016 (20160912)"
What I want to do is extract the date part (those digits between _date and the following _)
The Regex: .*(_date[0-9]{8}).* will return the string _date20160912. But what I'm actually looking for is just 20160912. Obviously, [0-9]{8} on its own doesn't give me what I want in this case, because that could be confused with the site, or potentially record counts.
How can I responsibly accomplish this sort of 'substringing' with a single regular expression?
You just need to shift your parentheses so that the capture group no longer includes '_date'. Then read capture group #1.
In Python, for example, it would look something like:
import re

# Capture only the eight digits that follow "_date"
regex = r'.*_date([0-9]{8}).*'
filename = 'site12345678_date20160912_23001_to_23100_of_25871.txt'
m = re.match(regex, filename)
print(m.group(0))  # the whole string
print(m.group(1))  # the string you are looking for: '20160912'
See it in action here: https://eval.in/641446
The Regex: .*(_date[0-9]{8}).* will return the string _date20160912.
That means you are using the regex in a method that requires a full string match, and you can access Group 1 value. The only thing you need to change in the regex is the capturing group placement:
.*_date([0-9]{8}).*
       ^^^^^^^^^^
See the regex demo.

Javascript Storing Regexp screws the original format

I have an angular app where a user can add a regexp in a form, a value like:
github\.com/([A-Za-z0-9\-\_]+)/([A-Za-z0-9\-\_]+)
When I store this in the localStorage and I inspect the localStorage:
github\\\\.com\\/([A-Za-z0-9\\\\-\\\\_]+)\\/([A-Za-z0-9\\\\-\\\\_]+)
When I retrieve in Javascript elsewhere this value I get:
github\\.com\/([A-Za-z0-9\\-\\_]+)\/([A-Za-z0-9\\-\\_]+)
This is not the original regexp and the match method in Javascript can't work.
NOTE: after submitting the form, I store the object with:
localStorage.myobject = JSON.stringify(myobject);
You can get rid of overescaping here, just use
github[.]com/([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)
and initialize it via a RegExp constructor so as not to have to escape the regex delimiter /. A dot inside [] loses its special meaning and only matches a literal dot, the hyphen at the end of the character class only matches a literal hyphen, and the _ does not have to be escaped at all anywhere in the pattern:
var tst = "github.com/Test1-Text/Test2";
var pattern = "github[.]com/([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)";
console.log(new RegExp(pattern).test(tst));
UPDATE:
When using patterns from external sources, you need to use the constructor notation. Make sure your regex patterns are stored as literal strings (if you had RegExp("C:\\\\Folder"), make sure it is stored as C:\\Folder); when the value is read back in, it will automatically be usable with the RegExp constructor.
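As a rough illustration of the round trip, here is a sketch in Python, where the json module stands in for JSON.stringify and JSON.parse. It assumes the value is serialized exactly once and parsed exactly once, and the sample pattern is only an example. The serialized text contains doubled backslashes, but parsing restores the original pattern string, which can then be compiled:

```python
import json
import re

# The pattern as a user might type it into the form (sample input).
pattern = r'github\.com/([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)'

# Serialize an object holding the pattern once, as done before storing it.
stored = json.dumps({'pattern': pattern})

# Parse it back once, as done after reading it from storage.
restored = json.loads(stored)['pattern']

match = re.search(restored, 'github.com/Test1-Text/Test2')
print(stored)               # backslashes appear doubled in the serialized form
print(restored == pattern)  # the parsed value equals the original pattern
print(match.groups())
```

If the stored value shows more escaping than one serialization step would produce, it is worth checking whether the string is being serialized more than once.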

Actionscript: check if string contains domain or subdomain

I need to know if a string contains a specific domain
I have an array like this
private var validDomain:Array = new Array(
    "http://*.site1.com",
    "http://*.site2.com",
    "http://*.site3.com"
);
private var isValidDomain:Boolean = false;
private var URL:String = "http://mysub.site2.com";
Now I want to check whether my string is valid, so I am thinking of something like this:
for each (var domain_ in validDomain){
    if(SOMEREGEX){
        isValidDomain=true;
    }
}
What do I put in SOMEREGEX?
The problem is that you are mixing two different kinds of pattern.
Your array holds wildcard-based strings, with wildcards like *, whereas a regex test needs a regular expression; you must translate each wildcard-based pattern into a regular expression pattern.
To accomplish this, a quick and dirty solution would be to do some string replacements, in this order:
escaping the characters in the wildcard-based pattern that are "special" to regular expressions, apart from the * wildcard itself (e.g. substituting . with \.)
substituting the * wildcard in the pattern with .*
With this logic in mind, you will transform a wildcard based pattern like:
http://*.mydomain.com
into a regular expression pattern:
http:\/\/.*\.mydomain\.com
which you can use to test your string.
edit: .* is a very crude way to test for a subdomain, to do things neatly you should use a correct pattern like the ones in this thread: Regexp for subdomain
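Here is a minimal sketch of that translation, in Python rather than ActionScript. The names wildcard_to_regex and is_valid_domain are made up for illustration, and anchoring the result with ^ and $ is an extra assumption on my part. Splitting on * and escaping each literal chunk avoids the ordering problem of escaping characters that the substitution itself introduced:

```python
import re

def wildcard_to_regex(wildcard):
    """Translate a *-wildcard pattern into an anchored regular expression."""
    # Escape each literal chunk, then join the chunks with ".*" for every "*".
    parts = [re.escape(chunk) for chunk in wildcard.split('*')]
    return re.compile('^' + '.*'.join(parts) + '$')

VALID_DOMAINS = ['http://*.site1.com', 'http://*.site2.com', 'http://*.site3.com']

def is_valid_domain(url):
    """True if the URL matches at least one wildcard pattern."""
    return any(wildcard_to_regex(w).match(url) for w in VALID_DOMAINS)

print(is_valid_domain('http://mysub.site2.com'))
print(is_valid_domain('http://mysub.site9.com'))
```

This is still the crude .* approach, so it accepts anything in place of the wildcard, not just a well-formed subdomain.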

Backbone.js route using regex - Matching a URL that does not end with a given string

I have to create a route using regex that matches a URL which does not end with a particular word say 'submit'. For example -
/login/submit ==> does not match
/login/abcsubmit ==> does not match
/abc/xyx => Matches
Use this regex:
((?!(.*?)/\w*submit).*)
as explained in http://backbonejs.org/#Router-route
this.route(/^((?!(.*?)\/\w*submit).*)$/, "functionName");
I tried the regex that @Nestenius provided, and it still matched the first two example URLs you gave. The reason is that the regex was not anchored to the start of the string.
You can still use his regex if you add a ^ anchor to the beginning, like so:
^((?!(.*?)/\w*submit).*)
Or you can use this shorter version:
^(?!.*submit).*
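Both reject the first two example URLs. The shorter version matches any string that does not contain "submit" anywhere; the longer one only rejects strings in which "submit" follows a / and an optional run of word characters. Neither is anchored to the end of the string, so a URL with "submit" in the middle, such as /submit/done, is rejected as well.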
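A quick way to check the behaviour is to run the pattern over the example URLs. The sketch below uses Python rather than JavaScript. The second pattern, which puts a $ inside the lookahead so that only URLs ending in "submit" are rejected, is a variant of my own and does not come from the answers above; the /submit/done path is also an invented test case:

```python
import re

# Rejects any path that contains "submit" anywhere.
NO_SUBMIT_ANYWHERE = re.compile(r'^(?!.*submit).*')

# Assumed variant: rejects only paths that END with "submit".
NO_SUBMIT_AT_END = re.compile(r'^(?!.*submit$).*')

paths = ['/login/submit', '/login/abcsubmit', '/abc/xyx', '/submit/done']
for path in paths:
    anywhere = bool(NO_SUBMIT_ANYWHERE.match(path))
    at_end = bool(NO_SUBMIT_AT_END.match(path))
    print(path, anywhere, at_end)
```

The two patterns agree on the three URLs from the question and differ only on /submit/done, which the end-anchored variant accepts.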