GWT - 2.1 RegEx class to parse freetext - regex

I'm struggling with the com.google.gwt.regexp.shared.RegExp class. I simply want to parse the phone numbers from a string and get ALL occurrences of a number, but I only seem to be able to get the first occurrence. I know there are subtle differences in regex between Java (where it works) and GWT.
String freeText = "Theo Powell<5643321309>, Robert Roberts<9653768972>, Betty Wilson<6268281885>, Brandon Anderson<703203115>";
MatchResult matchResult = RegExp.compile("[\\+]?[0-9. -]{8,}").exec(freeText);
int groupCount = matchResult.getGroupCount(); // result = 1
String s = matchResult.getGroup(0); //result = 5643321309
Thanks in advance.
Ian..

You'll have to loop, applying the pattern again until it returns nothing. For that, you first have to use the "global" flag:
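ArrayList<String> matches = new ArrayList<String>();
// The backslash must be doubled inside a Java string literal.
RegExp pattern = RegExp.compile("[\\+]?[0-9. -]{8,}", "g");
// With the "g" flag, each exec() call resumes after the previous match.
for (MatchResult result = pattern.exec(freeText); result != null; result = pattern.exec(freeText)) {
    matches.add(result.getGroup(0));
}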
If you think it's a bit "magic" or "kludgy" (which it kind of is), I'd suggest reading docs about the JavaScript RegExp object, as the RegExp class in GWT is a direct mapping of this: https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/RegExp/exec (with sample code in JS very similar to the one above).
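For comparison, here is a minimal sketch of the plain-Java version the question says already works, using java.util.regex rather than GWT's RegExp. The class name PhoneNumbers and the method findAll are made up for illustration. Matcher.find() plays the role of the repeated exec() call with the "g" flag: each call resumes after the previous match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GWT: plain java.util.regex on the JVM.
final class PhoneNumbers {
    // Same pattern as in the question; the backslash is doubled for the Java string literal.
    private static final Pattern PHONE = Pattern.compile("[\\+]?[0-9. -]{8,}");

    // Collects every match; each find() call continues after the previous match.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = PHONE.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String freeText = "Theo Powell<5643321309>, Robert Roberts<9653768972>, "
                + "Betty Wilson<6268281885>, Brandon Anderson<703203115>";
        // prints [5643321309, 9653768972, 6268281885, 703203115]
        System.out.println(findAll(freeText));
    }
}
```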

Change the regex from
[\+]?[0-9. -]{8,}
to
([\+]?[0-9. -]{8,})
See Capturing Groups for further details.

Related

regex vs substring

I have a very short xml String passed to my app from another app and I'm only interested in extracting the content between the "level" tags. Which solution is better between these two:
String xmlString =
    "<type>"
    + "<perm>"
    + "<date>99999999</date>"
    + "<level>admin</level>"
    + "</perm>"
    + "</type>";
String level = xmlString.substring(
    xmlString.indexOf("<level>") + "<level>".length(),
    xmlString.indexOf("</level>"));
or
Pattern p1 = Pattern.compile("<level>(\\S+)</level>");
Matcher m = p1.matcher(xmlString);
if (m.find()) {
String level = m.group(1);
}
Have you tried benchmarking this yourself? From what I've read, you generally want to try regex first, and if you can't optimize that, then try substring. However, I'm a little confused about why you aren't using something like XmlObject.Factory to handle your XML parsing: https://xmlbeans.apache.org/docs/2.0.0/reference/org/apache/xmlbeans/XmlObject.Factory.html
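To make the comparison concrete, here is a small sketch that puts both approaches from the question side by side so they can be checked against the same input (or timed, if you want to benchmark). The class and method names are invented for illustration, and both variants assume a single, well-formed <level> element. Neither is a substitute for a real XML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
final class LevelExtractor {
    // Compiled once so repeated calls do not pay the compilation cost again.
    private static final Pattern LEVEL = Pattern.compile("<level>(\\S+)</level>");

    // Substring approach: returns null when either tag is missing or out of order.
    static String bySubstring(String xml) {
        int start = xml.indexOf("<level>");
        int end = xml.indexOf("</level>");
        if (start < 0 || end < start) {
            return null;
        }
        return xml.substring(start + "<level>".length(), end);
    }

    // Regex approach: returns null when the pattern does not match.
    static String byRegex(String xml) {
        Matcher m = LEVEL.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<type><perm><date>99999999</date><level>admin</level></perm></type>";
        System.out.println(bySubstring(xml)); // prints admin
        System.out.println(byRegex(xml));     // prints admin
    }
}
```

Note one behavioural difference: the regex version rejects values containing whitespace (because of \S+), while the substring version accepts anything between the tags.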

Using a Variable in an AS3, Regexp

Using Actionscript 3.0 (Within Flash CS5)
A standard regex to match any digit is:
var myRegexPattern:RegExp = /\d/g;
What would the regex look like to incorporate a string variable to match?
(this example is an 'IDEAL' not a 'WORKING' snippet) ie:
var myString:String = "MatchThisText";
var myRegexPatter_WithString:RegExp = /\d[myString]/g;
I've seen some workarounds which involve creating multiple regex instances, then combine them by source, with the variable in question, which seems wrong. OR using the flash string to regex creator, but it's just plain sloppy with all the double and triple escape sequences required.
There must be some pain free way that I can't find in the live docs or on google. Does AS3 hold this functionality even? If not, it really should.
Or am I missing a much easier way of avoiding this task altogether, one I'm simply naive to because I'm new to regex?
I've actually blogged about this, so I'll just point you there: http://tyleregeto.com/using-vars-in-regular-expressions-as3 It talks about the possible solutions, but there is no ideal one like you mention.
EDIT
Here is a copy of the important parts of that blog entry:
Here is a regex to strip the tags from a block of text.
/<("[^"]*"|'[^']*'|[^'">])*>/ig
This nifty expression works like a charm. But I wanted to update it so the developer could limit which tags it stripped to those specified in an array. Pretty straightforward stuff: to use a variable value in a regex, you first need to build it as a string and then convert it. Something like the following:
var exp:String = 'start-exp' + someVar + 'more-exp';
var regex:RegExp = new RegExp(exp);
Pretty straightforward. So when approaching this small upgrade, that's what I did. Of course one big problem was pretty clear.
var exp:String = '/<' + tag + '("[^"]*"|'[^']*'|[^'">])*>/';
Guess what, invalid string! Better escape those quotes in the string. Whoops, that will break the regex! I was stumped. So I opened up the language reference to see what I could find. The "source" property (which I'd never used before) caught my eye. It returns a String described as "the pattern portion of the regular expression." It did the trick perfectly. Here is the solution:
var start:RegExp = /</;
var end:RegExp = /("[^"]*"|'[^']*'|[^'">])*>/ig;
var complete:RegExp = new RegExp(start.source + tag + end.source);
You can reduce it down to this for convenience:
var complete:RegExp = new RegExp(/</.source + tag + /("[^"]*"|'[^']*'|[^'">])*>/.source, 'ig');
As Tyler correctly points out (and his answer works just fine), you can assemble your regex as a string and then pass this string to the RegExp constructor with the new RegExp("pattern", "flags") syntax.
function assembleRegex(myString) {
var re = new RegExp('\\d' + myString, "i");
return re;
}
Note that when using a string to store a regex pattern, you do need to add some extra backslashes to get it to work right (e.g. to get a \d in the regex, you need to specify \\d in the string). Note also that the string pattern does not use the forward slash delimiters. In other words, the following two statements are equivalent:
var re1 = /\d/ig;
var re2 = new RegExp("\\d", "ig");
Additional note: You may need to process the myString variable to escape any backslashes it might contain (if they are to be interpreted as literal). If this is the case the function becomes:
function assembleRegex(myString) {
    // The g flag is needed so that every backslash is doubled, not just the first one.
    myString = myString.replace(/\\/g, '\\\\');
    var re = new RegExp('\\d' + myString);
    return re;
}
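For readers who know Java better than AS3, the same string-assembly idea can be sketched there too. This is only a comparison, not AS3 code: Pattern.quote is a Java API that escapes the whole variable so every character in it is matched literally, which covers the backslash case above and other metacharacters as well. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical Java analogue of assembleRegex(); not ActionScript.
final class RegexAssembly {
    // Builds "a digit followed by the literal text", case-insensitively.
    static Pattern digitFollowedBy(String literal) {
        // "\\d" in the string literal becomes \d in the pattern;
        // Pattern.quote() makes the variable part match character for character.
        return Pattern.compile("\\d" + Pattern.quote(literal), Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = digitFollowedBy("MatchThisText");
        System.out.println(p.matcher("abc 7matchthistext").find()); // prints true
    }
}
```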

RegEx : Replace parts of dynamic strings

I have a string
IsNull(VSK1_DVal.RuntimeSUM,0),
I need to remove IsNull part, so the result would be
VSK1_DVal.RuntimeSUM,
I'm absolutely new to RegEx, but that wouldn't be a problem if not for one thing:
VSK1 is a dynamic part; it can be any combination of A-Z and 0-9, of any length. How do I replace such strings with RegEx? I use MSSQL 2k5, and I think it uses the general set of RegEx rules.
EDIT: I forgot to say that I'm doing the replacement in the SSMS Query window's Replace box (^H), not building a RegEx query.
br
marius
here's a regex that should work:
[^(]+\(([^,]+),[^)]\)
Then use $1 capture group to extract the part that you need.
I did a sanity check in ruby:
orig = "IsNull(VSK1_DVal.RuntimeSUM,0),"
regex = /[^(]*\(([^,]+),[^)]\)/
result = orig.sub(regex){$1} # result => VSK1_DVal.RuntimeSUM,
It gets trickier if you have a prefix that you want to retain. Like if you have this:
"somestuff = IsNull(VSK1_DVal.RuntimeSUM,0),"
In this case, you need someway to identify the start of the pattern. Maybe you can use '=' to identify the start of the pattern? If so, this should work:
orig = "somestuff = IsNull(VSK1_DVal.RuntimeSUM,0),"
regex = /=\s*\w+\(([^,]+),[^)]\)/
result = orig.sub(regex){$1} # result => somestuff = VSK1_DVal.RuntimeSUM,
But then the case where you don't have an equals sign will fail. Maybe you can use 'IsNull' to identify the start of the pattern? If so, try this (note the '/i' representing case insensitive matching):
orig = "somestuff = isnull(VSK1_DVal.RuntimeSUM,0),"
regex = /IsNull\(([^,]+),[^)]\)/i
result = orig.sub(regex){$1} # result => somestuff = VSK1_DVal.RuntimeSUM,
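The same replacement can be sketched in Java, in case you end up pre-processing the script outside SSMS. This does not apply to the SSMS Replace box itself, and the class and method names are invented for illustration. One deliberate deviation from the Ruby pattern: the second argument is matched with [^)]* rather than a single [^)], so defaults longer than one character are also handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper; assumes the first IsNull argument contains no comma.
final class IsNullStripper {
    // Group 1 captures the first argument; the second argument is discarded.
    private static final Pattern ISNULL =
            Pattern.compile("IsNull\\(([^,]+),[^)]*\\)", Pattern.CASE_INSENSITIVE);

    // Replaces every IsNull(expr, default) with just expr.
    static String strip(String sql) {
        return ISNULL.matcher(sql).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints somestuff = VSK1_DVal.RuntimeSUM,
        System.out.println(strip("somestuff = isnull(VSK1_DVal.RuntimeSUM,0),"));
    }
}
```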
/IsNULL\(([A-Za-z0-9_.]+),0\)/i
Then pick group match number 1.
Here's a very useful site: http://www.regexlib.com/RETester.aspx
They have a tester and a cheatsheet that are very useful for quick testing of this sort.
I tested the solution by Dave and it works fine except it also removes the trailing comma you wanted retained. Minor thing to fix.
Try this:
IsNULL\((.*,)0\)
You say in your question:
"I use MSSQL 2k5, i think it uses general set of RegEx rules."
This is not true unless you enable CLR and compile and install an assembly. You can use its native pattern matching syntax and LIKE for this as below.
WITH T(C) AS
(
SELECT 'IsNull(VSK1_DVal.RuntimeSUM,0),' UNION ALL
SELECT 'IsNull(VSK1_DVal.RuntimeSUM,123465),' UNION ALL
SELECT 'No Match'
)
SELECT SUBSTRING(C,8,1+LEN(C)-8-CHARINDEX(',',REVERSE(C),2))
FROM T
WHERE C LIKE 'IsNull(%,_%),'

Regex For Finding Ctypes with Int32

Hey all,
I am looking for a little regex help...
I am trying to find all occurrences of CType(expression, Int32) and replace them with CInt(expression).
This, however, is proving quite difficult, considering there could be a nested Ctype(expression, Int32) within the regex match. Does anyone have any ideas for how to best go about doing this?
Here is what I have now:
Dim str As String = "CType((original.Width * CType((targetSize / CType(original.Height, Single)), Single)), Int32)"
Dim exp As New Regex("CType\((.+), Int32\)")
str = exp.Replace(str, "CInt($1)")
But this will match the entire string and replace it.
I was thinking of doing a recursive function to find the outer most match, and then work inwards, but that still presents a problem with things like
CType(replaceChars(I), Int32)), Chr(CType(replacementChars(I), Int32)
Any tips would be appreciated.
Input
returnString.Replace(Chr(CType(replaceChars(I), Int32)), Chr(CType(replacementChars(I), Int32)))
Output:
returnString.Replace(Chr(CInt(replaceChars(I))),Chr(CInt(replacementChars(I))))
Edit:
Been working on it a little more and have a recursive function that I'm still working out the kinks in. Recursion + regex. it kinda hurts.
Private Function FindReplaceCInts(ByVal strAs As String) As String
    System.Console.WriteLine(String.Format("Testing : {0}", strAs))
    Dim exp As New Regex("CType\((.+), Int32\)")
    If exp.Match(strAs).Success Then
        For Each match As Match In exp.Matches(strAs)
            If exp.Match(match.Value.Substring(2)).Success Then
                Dim replaceT As String = match.Value.Substring(2)
                Dim Witht As String = FindReplaceCInts(match.Value.Substring(2))
                System.Console.WriteLine(strAs.IndexOf(replaceT))
                strAs.Replace(replaceT, Witht)
            End If
        Next
        strAs = exp.Replace(strAs, "CInt($1)")
    End If
    Return strAs
End Function
Cheers,
What do you guys think of this?
I think it does it quite nicely for a variety of cases that I have tested so far...
Private Function FindReplaceCInts(ByVal strAs As String) As String
    Dim exp As New Regex("CType\((.+), Int32\)")
    If exp.Match(strAs).Success Then
        For Each match As Match In exp.Matches(strAs)
            If exp.Match(match.Value.Substring(2)).Success Then
                Dim replaceT As String = match.Value.Substring(2)
                Dim Witht As String = FindReplaceCInts(match.Value.Substring(2))
                strAs = strAs.Replace(replaceT, Witht)
            End If
        Next
        strAs = exp.Replace(strAs, "CInt($1)")
    End If
    Return strAs
End Function
Try using the regex (?!CType\(.+, )Int32 instead of yours.
You need to use negative look ahead to accomplish your task.
Check regex at this site
I've tried this in VS 2008 (no copy of VS 2010 to try it out), using the Find & Replace dialog:
Regular Expression: CType\({.+}, Int32\)
Replace With: CInt(\1)
It won't fix the nested situations in one pass, but you should be able to continue searching with that pattern and replacing until no other matches are found.
BTW: That dialog also provides a link to this help page explaining the characters used in the VS flavor of regex: http://msdn.microsoft.com/en-us/library/aa293063(VS.71).aspx
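Since nesting is what trips up the regex attempts above, here is a sketch of a different technique: a small parenthesis-matching scanner instead of a regex. It is written in Java rather than VB.NET, and the class and method names are made up. It assumes the exact spellings "CType(" and ", Int32", that no identifier merely ends in "CType", and that parentheses never appear inside string literals in the source being rewritten.

```java
// Hypothetical rewriter: a paren-counting scanner, not a regex.
final class CTypeRewriter {
    private static final String OPEN = "CType(";
    private static final String SUFFIX = ", Int32";

    // Rewrites CType(expr, Int32) to CInt(expr), recursing into nested calls.
    static String rewrite(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith(OPEN, i)) {
                int close = matchingParen(src, i + OPEN.length() - 1);
                if (close >= 0) {
                    String inner = src.substring(i + OPEN.length(), close);
                    if (inner.endsWith(SUFFIX)) {
                        String expr = inner.substring(0, inner.length() - SUFFIX.length());
                        out.append("CInt(").append(rewrite(expr)).append(')');
                    } else {
                        // A CType to some other type: keep it, but still rewrite its contents.
                        out.append(OPEN).append(rewrite(inner)).append(')');
                    }
                    i = close + 1;
                    continue;
                }
            }
            out.append(src.charAt(i));
            i++;
        }
        return out.toString();
    }

    // Returns the index of the ')' that closes the '(' at openIdx, or -1 if unbalanced.
    static int matchingParen(String s, int openIdx) {
        int depth = 0;
        for (int j = openIdx; j < s.length(); j++) {
            char c = s.charAt(j);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return j;
            }
        }
        return -1;
    }
}
```

Applied to the first example string from the question, only the outermost CType(..., Int32) becomes CInt(...); the inner conversions to Single are left untouched.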

Problem with Actionscript Regular Expressions

I have to parse out color information from HTML data. The colors can either be RGB colors or file names to a swatch image.
I used http://www.gskinner.com/RegExr/ to develop and test the patterns. I copied the AS regular expression code verbatim from the tool into Flex Builder. But, when I exec the pattern against the string I get a null.
Here are the patterns and an example of the string (I took the correct HTML tags out so the strings would show correctly):
DIV data:
<div style="background-color:rgb(2,2,2);width:10px;height:10px;">
DIV pattern:
/([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/
IMG data:
<img src="/media/swatches/jerzeesbirch.gif" width="10" height="10" alt="Birch">
IMG pattern:
/[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\.[a-z0-9_-]+/
Here's my Actionscript code:
var divPattern : RegExp = new RegExp("/([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/");
var imgPattern : RegExp = new RegExp("/[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\.[a-z0-9_-]+/");
var divResult : Array = divPattern.exec(object.swatch);
var imgResult : Array = imgPattern.exec(object.swatch);
Both of the arrays are null.
This is my first foray into AS coding, so I think I'm declaring something wrong.
Steve
(I don't know ActionScript but I know Javascript and they should be close enough to solve your problem.)
To construct a RegExp object for e.g. the pattern ^[a-z]+$, you either use
var pattern : RegExp = new RegExp("^[a-z]+$");
or, better,
var pattern : RegExp = /^[a-z]+$/
The code new RegExp("/^[a-z]+$/") is wrong because this expects a slash before the ^ and after the $.
Therefore, your DIV pattern should be written as
var divPattern : RegExp = /([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/;
but since ( and ) are special characters used for capturing, you need to escape them to match literal parentheses:
var divPattern : RegExp = /\([0-9]{1,3},[0-9]{1,3},[0-9]{1,3}\)/;
For the IMG pattern, since / delimits a regex literal, you need to escape it as well:
var imgPattern : RegExp = /[a-z0-9_-]+\/[a-z0-9_-]+\/[a-z0-9_-]+\.[a-z0-9_-]+/
Finally, you could use \d in place of [0-9] and \w in place of [a-zA-Z0-9_].
I don't know enough to tell if your regex patterns are correct, but from the docs on the AS3 RegExp class, it looks like your new RegExp() call needs a second argument to declare flags for case sensitivity etc.
EDIT: Also, as Bart K has pointed out, you don't need the / delimiters when using the new RegExp() constructor.
So you can use either:
var divPattern:RegExp = new RegExp("([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})", "");
OR you can also use the alternate syntax with /:
var divPattern:RegExp = /([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/;
... in which case the flag string (if any) is included after the final /
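The delimiter mistake is easy to reproduce in any language where patterns are built from strings. The sketch below uses Java, which has no regex literals at all, so it corresponds to the new RegExp("...") form; the class, field, and method names are invented for illustration. It shows that slashes placed inside the pattern string are treated as literal characters to match, which is why the original patterns found nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; Java analogue of the string-constructor form.
final class SwatchPatterns {
    // Literal parentheses are escaped; each colour component is captured.
    static final Pattern RGB =
            Pattern.compile("\\((\\d{1,3}),(\\d{1,3}),(\\d{1,3})\\)");

    // The same mistake as in the question: the slashes become part of the pattern.
    static final Pattern RGB_WITH_SLASHES =
            Pattern.compile("/\\((\\d{1,3}),(\\d{1,3}),(\\d{1,3})\\)/");

    // In a string-built pattern '/' needs no escaping, but '.' still does.
    static final Pattern IMG_PATH =
            Pattern.compile("[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\\.[a-z0-9_-]+");

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String div = "<div style=\"background-color:rgb(2,2,2);width:10px;height:10px;\">";
        System.out.println(firstMatch(RGB, div));              // prints (2,2,2)
        System.out.println(firstMatch(RGB_WITH_SLASHES, div)); // prints null
    }
}
```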