Problem with Actionscript Regular Expressions - regex

I have to parse out color information from HTML data. The colors can either be RGB colors or file names to a swatch image.
I used http://www.gskinner.com/RegExr/ to develop and test the patterns. I copied the AS regular expression code verbatim from the tool into Flex Builder. But, when I exec the pattern against the string I get a null.
Here are the patterns and an example of the string (I took the correct HTML tags out so the strings would show correctly):
DIV data:
<div style="background-color:rgb(2,2,2);width:10px;height:10px;">
DIV pattern:
/([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/
IMG data:
<img src="/media/swatches/jerzeesbirch.gif" width="10" height="10" alt="Birch">
IMG pattern:
/[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\.[a-z0-9_-]+/
Here's my Actionscript code:
var divPattern : RegExp = new RegExp("/([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/");
var imgPattern : RegExp = new RegExp("/[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\.[a-z0-9_-]+/");
var divResult : Array = divPattern.exec(object.swatch);
var imgResult : Array = imgPattern.exec(object.swatch);
Both of the arrays are null.
This is my first foray into AS coding, so I think I'm declaring something wrong.
Steve

(I don't know ActionScript but I know Javascript and they should be close enough to solve your problem.)
To construct a RegExp object for e.g. the pattern ^[a-z]+$, you either use
var pattern : RegExp = new RegExp("^[a-z]+$");
or, better,
var pattern : RegExp = /^[a-z]+$/
The code new RegExp("/^[a-z]+$/") is wrong because this expects a slash before the ^ and after the $.
Therefore, your DIV pattern should be written as
var divPattern : RegExp = /([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/;
but, as you know, the ( and ) are special characters for capturing, you need to escape them:
var divPattern : RegExp = /\([0-9]{1,3},[0-9]{1,3},[0-9]{1,3}\)/;
For the IMG pattern, as / delimitates a RegEx, you need to escape it as well:
var imgPattern : RegExp = /[a-z0-9_-]+\/[a-z0-9_-]+\/[a-z0-9_-]+\.[a-z0-9_-]+/
Finally, you could use \d in place of [0-9] and \w in place of [a-zA-Z0-9_].

I don't know enough to tell if your regex patterns are correct, but from the docs on the AS3 RegExp class, it looks like your new RegExp() call needs a second argument to declare flags for case sensitivity etc.
EDIT: Also, as Bart K has pointed out, you don't need the / delimiters when using the new method.
So you can use either:
var divPattern:RegExp = new RegExp("([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})", "");
OR you can also use the alternate syntax with /:
var divPattern:RegExp = /([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/;
... in which case the flag string (if any) is included after the final /

Related

Pattern Validator in Angular Reactive Forms using Regex [duplicate]

I'm doing a small javascript method, which receive a list of point, and I've to read those points to create a Polygon in a google map.
I receive those point on the form:
(lat, long), (lat, long),(lat, long)
So I've done the following regex:
\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)
I've tested it with RegexPal and the exact data I receive:
(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)
and it works, so why when I've this code in my javascript, I receive null in the result?
var polygons="(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)";
var reg = new RegExp("/\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g");
var result = polygons.match(reg);
I've no javascript error when executing(with debug mode of google chrome). This code is hosted in a javascript function which is in a included JS file. This method is called in the OnLoad method.
I've searched a lot, but I can't find why this isn't working. Thank you very much!
Use a regex literal [MDN]:
var reg = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;
You are making two errors when you use RegExp [MDN]:
The "delimiters" / are should not be part of the expression
If you define an expression as string, you have to escape the backslash, because it is the escape character in strings
Furthermore, modifiers are passed as second argument to the function.
So if you wanted to use RegExp (which you don't have to in this case), the equivalent would be:
var reg = new RegExp("\\(\\s*([0-9.-]+)\\s*,\\s([0-9.-]+)\\s*\\)", "g");
(and I think now you see why regex literals are more convenient)
I always find it helpful to copy and past a RegExp expression in the console and see its output. Taking your original expression, we get:
/(s*([0-9.-]+)s*,s([0-9.-]+)s*)/g
which means that the expressions tries to match /, s and g literally and the parens () are still treated as special characters.
Update: .match() returns an array:
["(25.774252, -80.190262)", "(18.466465, -66.118292)", ... ]
which does not seem to be very useful.
You have to use .exec() [MDN] to extract the numbers:
["(25.774252, -80.190262)", "25.774252", "-80.190262"]
This has to be called repeatedly until the whole strings was processed.
Example:
var reg = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;
var result, points = [];
while((result = reg.exec(polygons)) !== null) {
points.push([+result[1], +result[2]]);
}
This creates an array of arrays and the unary plus (+) will convert the strings into numbers:
[
[25.774252, -80.190262],
[18.466465, -66.118292],
...
]
Of course if you want the values as strings and not as numbers, you can just omit the +.

Limit regex scope to match items only in a section of code

This is a general regex question.
Suppose I have the following code:
function (item1, item2, item3) {
var item4 = null
var item5 = null
}
I know that's javascript, but I don't want a javascript-specific answer: I'm curious about a pure regex answer.
Suppose I want to write regex that matches any word that starts with "item", but I only want matches that are between parenthesis.
So my question: is there a way to write a regex query that matches everything that starts with item but is also between parens? Like a way to limit my regex scope to just things within parens?
UPDATE: Just so people know, I am asking because I am working on language support for Atom (the text editor), which (to my knowledge) supports only pure regex to match patterns to add language styling. Because of this, I'm stuck with pure regex, even though I am parsing JS.
Pull out parenthesized things, then look inside them.
str="function (item1, item2, item3) {\
var item4 = null\
var item5 = null\
}";
var results = [].concat(
str.match(/\(.*?\)/g)
.map(function(submatch) {
return submatch.match(/\bitem\w*\b/g);
})
);
document.write(results);
This code first calls String#match to retrieve all the parenthesized portions of the input, so in this case it returns ['(item, item2, item3)'] (an array with one element). Then it calls Array#map on that array, transforming each element into a list of matches for item* (the \b matches word boundaries, and \w matches alphanumerics). Since the result is now a nested array ([['item1', 'item2', item3']]), we use Array#concat to flatten it out.
You could try the below regex,
(?:[^()]*(?=\()|[^()]*$)(*SKIP)(*F)|item\w+
DEMO

Using a Variable in an AS3, Regexp

Using Actionscript 3.0 (Within Flash CS5)
A standard regex to match any digit is:
var myRegexPattern:Regex = /\d/g;
What would the regex look like to incorporate a string variable to match?
(this example is an 'IDEAL' not a 'WORKING' snippet) ie:
var myString:String = "MatchThisText"
var myRegexPatter_WithString:Regex = /\d[myString]/g;
I've seen some workarounds which involve creating multiple regex instances, then combine them by source, with the variable in question, which seems wrong. OR using the flash string to regex creator, but it's just plain sloppy with all the double and triple escape sequences required.
There must be some pain free way that I can't find in the live docs or on google. Does AS3 hold this functionality even? If not, it really should.
Or I am missing a much easier means of simply avoiding this task that I'm simply naive too due to my newness to regex?
I've actually blogged about this, so I'll just point you there: http://tyleregeto.com/using-vars-in-regular-expressions-as3 It talks about the possible solutions, but there is no ideal one like you mention.
EDIT
Here is a copy of the important parts of that blog entry:
Here is a regex to strip the tags from a block of text.
/<("[^"]*"|'[^']*'|[^'">])*>/ig
This nifty expression works like a charm. But I wanted to update it so the developer could limit which tags it stripped to those specified in a array. Pretty straight forward stuff, to use a variable value in a regex you first need to build it as a string and then convert it. Something like the following:
var exp:String = 'start-exp' + someVar + 'more-exp';
var regex:Regexp = new RegExp(exp);
Pretty straight forward. So when approaching this small upgrade, that's what I did. Of course one big problem was pretty clear.
var exp:String = '/<' + tag + '("[^"]*"|'[^']*'|[^'">])*>/';
Guess what, invalid string! Better escape those quotes in the string. Whoops, that will break the regex! I was stumped. So I opened up the language reference to see what I could find. The "source" parameter, (which I've never used before,) caught my eye. It returns a String described as "the pattern portion of the regular expression." It did the trick perfectly. Here is the solution:
var start:Regexp = /])*>/ig;
var complete:RegExp = new RegExp(start.source + tag + end.source);
You can reduce it down to this for convenience:
var complete:RegExp = new RegExp(/])*>/.source + tag, 'ig');
As Tyler correctly points out (and his answer works just fine), you can assemble your regex as a string end then pass this string to the RegExp constructor with the new RegExp("pattern", "flags") syntax.
function assembleRegex(myString) {
var re = new RegExp('\\d' + myString, "i");
return re;
}
Note that when using a string to store a regex pattern, you do need to add some extra backslashes to get it to work right (e.g. to get a \d in the regex, you need to specify \\d in the string). Note also that the string pattern does not use the forward slash delimiters. In other words, the following two statements are equivalent:
var re1 = /\d/ig;
var re2 = new Regexp("\\d", "ig");
Additional note: You may need to process the myString variable to escape any backslashes it might contain (if they are to be interpreted as literal). If this is the case the function becomes:
function assembleRegex(myString) {
myString = myString.replace(/\\/, '\\\\');
var re = new RegExp('\\d' + myString);
return re;
}

GWT - 2.1 RegEx class to parse freetext

I'm struggling with the com.google.gwt.regexp.shared.RegExpclass and simply want to parse the phone numbers from a string and get ALL occurrences of a number but only seems to be able to get the 1st occurrences.. I know there is subtle difference in the regex between java (where it works) and GWT.
String freeText = "Theo Powell<5643321309>, Robert Roberts<9653768972>, Betty Wilson<6268281885>, Brandon Anderson<703203115>";
MatchResult matchResult = RegExp.compile("[\+]?[0-9." "-]{8,}").exec(freeText);
int groupCount = matchResult.getGroupCount(); // result = 1
String s = matchResult.getGroup(0); //result = 5643321309
Thanks in advance.
Ian..
You'll have to loop, applying the pattern again until it returns nothing. For that, you first have to use the "global" flag:
ArrayList<String> matches = new ArrayList<String>();
RegExp pattern = RegExp.compile("[\+]?[0-9. -]{8,}", "g");
for (MatchResult result = pattern.exec(freeText); result != null; result = pattern.exec(freeText)) {
matches.add(result.getGroup(0));
}
If you think it's a bit "magic" or "kludgy" (which it kind of is), I'd suggest reading docs about the JavaScript RegExp object, as the RegExp class in GWT is a direct mapping of this: https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/RegExp/exec (with sample code in JS very similar to the one above).
Change the regex from
[\+]?[0-9." "-]{8,}
to
([\+]?[0-9." "-]{8,})
See Capturing Groups for further details.

RegEx for a price in £

i have: \£\d+\.\d\d
should find: £6.95 £16.95 etc
+ is one or more
\. is the dot
\d is for a digit
am i wrong? :(
JavaScript for Greasemonkey
// ==UserScript==
// #name CurConvertor
// #namespace CurConvertor
// #description noam smadja
// #include http://www.zavvi.com/*
// ==/UserScript==
textNodes = document.evaluate(
"//text()",
document,
null,
XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
null);
var searchRE = /\£[0-9]\+.[0-9][0-9];
var replace = 'pling';
for (var i=0;i<textNodes.snapshotLength;i++) {
var node = textNodes.snapshotItem(i);
node.data = node.data.replace(searchRE, replace);
}
when i change the regex to /Free for example it finds and changes. but i guess i am missing something!
Had this written up for your last question just before it was deleted.
Here are the problems you're having with your GM script.
You're checking absolutely every
text node on the page for some
reason. This isn't causing it to
break but it's unnecessary and slow.
It would be better to look for text
nodes inside .price nodes and .rrp
.strike nodes instead.
When creating new regexp objects in
this way, backslashes must be
escaped, ex:
var searchRE = new
RegExp('\\d\\d','gi');
not
var
searchRE = new RegExp('\d\d','gi');
So you can add the backslashes, or
create your regex like this:
var
searchRE = /\d\d/gi;
Your actual regular expression is
only checking for numbers like
##ANYCHARACTER##, and will ignore £5.00 and £128.24
Your replacement needs to be either
a string or a callback function, not
a regular expression object.
Putting it all together
textNodes = document.evaluate(
"//p[contains(#class,'price')]/text() | //p[contains(#class,'rrp')]/span[contains(#class,'strike')]/text()",
document,
null,
XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
null);
var searchRE = /£(\d+\.\d\d)/gi;
var replace = function(str,p1){return "₪" + ( (p1*5.67).toFixed(2) );}
for (var i=0,l=textNodes.snapshotLength;i<l;i++) {
var node = textNodes.snapshotItem(i);
node.data = node.data.replace(searchRE, replace);
}
Changes:
Xpath now includes only p.price and p.rrp span.strke nodes
Search regular expression created with /regex/ instead of new RegExp
Search variable now includes target currency symbol
Replace variable is now a function that replaces the currency symbol with a new symbol, and multiplies the first matched substring with substring * 5.67
for loop sets a variable to the snapshot length at the beginning of the loop, instead of checking textNodes.snapshotLength at the beginning of every loop.
Hope that helps!
[edit]Some of these points don't apply, as the original question changed a few times, but the final script is relevant, and the points may still be of interest to you for why your script was failing originally.
You are not wrong, but there are a few things to watch out for:
The £ sign is not a standard ASCII character so you may have encoding issue, or you may need to enable a unicode option on your regular expression.
The use of \d is not supported in all regular expression engines. [0-9] or [[:digit:]] are other possibilities.
To get a better answer, say which language you are using, and preferably also post your source code.
£[0-9]+(,[0-9]{3})*\.[0-9]{2}$
this will match anything from £dd.dd to £d[dd]*,ddd.dd. So it can fetch millions and hundreds as well.
The above regexp is not strict in terms of syntaxes. You can have, for example: 1123213123.23
Now, if you want an even strict regexp, and you're 100% sure that the prices will follow the comma and period syntaxes accordingly, then use
£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$
Try your regexps here to see what works for you and what not http://tools.netshiftmedia.com/regexlibrary/
It depends on what flavour of regex you are using - what is the programming language?
some older versions of regex require the + to be escaped - sed and vi for example.
Also some older versions of regex do not recognise \d as matching a digit.
Most modern regex follow the perl syntax and £\d+\.\d\d should do the trick, but it does also depend on how the £ is encoded - if the string you are matching encodes it differently from the regex then it will not match.
Here is an example in Python - the £ character is represented differently in a regular string and a unicode string (prefixed with a u):
>>> "£"
'\xc2\xa3'
>>> u"£"
u'\xa3'
>>> import re
>>> print re.match("£", u"£")
None
>>> print re.match(u"£", "£")
None
>>> print re.match(u"£", u"£")
<_sre.SRE_Match object at 0x7ef34de8>
>>> print re.match("£", "£")
<_sre.SRE_Match object at 0x7ef34e90>
>>>
£ isn't an ascii character, so you need to work out encodings. Depending on the language, you will either need to escape the byte(s) of £ in the regex, or convert all the strings into Unicode before applying the regex.
In Ruby you could just write the following
/£\d+.\d{2}/
Using the braces to specify number of digits after the point makes it slightly clearer