regex find content question - regex

I'm trying to use ColdFusion's REFind function to find the content within the angle brackets in this example:
joe smith <joesmith#domain.com>
The resulting text should be
joesmith#domain.com
Using this
<cfset reg = refind(
"/(?<=\<).*?(?=\>)/s","Joe <joe#domain.com>") />
Not having any luck. Any suggestions?
Maybe a syntax issue, it works in an online regex tester I use.

You can't use lookbehind with CF's regex engine (it uses Apache Jakarta ORO).
However, you can use Java's regex engine, which does support lookbehind, and I've created a wrapper CFC that makes this even easier. Available from:
http://www.hybridchill.com/projects/jre-utils.html
(Update: The wrapper CFC mentioned above has evolved into a full project. See cfregex.net for details.)
Also, the /.../s stuff isn't required/relevant here.
So, from your example, but with improved regex:
<cfset jrex = createObject('component','jre-utils').init()/>
<cfset reg = jrex.match( "(?<=<)[^<>]+(?=>)" , "Joe <joe#domain.com>" ) />
A quick note, since I've updated that regex a few times; hopefully it's at its best now...
(?<=<) # positive lookbehind - the match must be preceded by `<`, which is not included in the match.
[^<>]+ # any char except `<` or `>`, the `+` meaning one-or-more greedy.
(?=>) # positive lookahead - only succeed if a `>` follows, which is not included in the match.
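If you'd rather not depend on a wrapper CFC, the same pattern can be run through java.util.regex directly. This is a minimal sketch, assuming your CFML engine allows createObject("java", ...). The variable names are only for illustration, and the doubled ## is there to produce a literal # inside a CFML string:

```cfml
<!--- Compile the pattern with Java's regex engine, which supports lookbehind --->
<cfset pattern = createObject("java", "java.util.regex.Pattern").compile("(?<=<)[^<>]+(?=>)") />
<cfset matcher = pattern.matcher("Joe <joe##domain.com>") />
<!--- find() advances to the next match; group() returns the matched text --->
<cfif matcher.find()>
    <cfset email = matcher.group() />
    <cfoutput>#email#</cfoutput>
</cfif>
```

Calling matcher.find() again in a loop would step through any further bracketed values in the same string.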

I've never been happy with the regular expression matching functions in CF. Hence, I wrote my own:
<cfscript>
function reFindNoSuck(string pattern, string data, numeric startPos = 1){
var sucky = refindNoCase(pattern, data, startPos, true);
var i = 0;
var awesome = [];
if (not isArray(sucky.len) or arrayLen(sucky.len) eq 0){return [];} //handle no match at all
for(i=1; i<= arrayLen(sucky.len); i++){
//a group with pos 0 & length 0 did not take part in the match, so skip it
if (sucky.len[i] gt 0 && sucky.pos[i] gt 0){
//skip any match that is identical to the entire input string
var matchBody = mid( data, sucky.pos[i], sucky.len[i]);
if (matchBody neq arguments.data){
arrayAppend( awesome, matchBody );
}
}
}
return awesome;
}
</cfscript>
Applied to your problem, here is my example:
<cfset origString = "joe smith <joesmith#domain.com>" />
<cfset regex = "<([^>]+)>" />
<cfset matches = reFindNoSuck(regex, origString) />
Dumping the "matches" variable shows that it is an array with 2 items. The first will be <joesmith#domain.com> (because it matches the entire regex) and the second will be joesmith#domain.com (because it matches the 1st group defined in the regular expression -- all subsequent groups would also be captured and included in the array).

/\<([^>]+)\>$/
something like that, didn't test it though, that one's yours ;)

Related

Coldfusion Regex

I have a HUGE string list (I'll limit the example to one line) with the format:
"[First Name] [Last Name] <[email address]>"
I ran a regular expression on the string to delimit this into an array.
<cfset x = REMatch("<(.*?)>",list) />
This works fine, except that it also returns the angle brackets <> around the email address:
x[1] = <[email address]>
For simplicity's sake, because the CF docs are quite ambiguous about regex, I wrote this loop to remove the first and last character of each element:
<cfloop from="1" to="#arrayLen(x)#" index="y">
<cfset a = #RemoveChars(x[y], 1, 1)# />
<cfset a = #left(a,len(a)-1)# />
<cfset x[y] = a />
</cfloop>
This works fine, yay, I have my array now.
However, it's not what I wanted. How can I return the email address WITHOUT the angle brackets in the first place?
(Please note I also tried REReplace, and REFind only returns the index of the occurrence.)
(Also note that there are no [] brackets in the string, that's just saying the value inside isn't what I posted here)
ColdFusion implements a regex flavor (Apache ORO, see this answer for details) that doesn't support lookbehind assertions, which would be useful in this case.
But we can arrive at an approximation:
<cfset x = REMatch("[^<>]+(?=>)",list) />
should work as long as all angle brackets occur in unnested, balanced pairs.
Explanation:
[^<>]+ # Match one or more characters except angle brackets
(?=>) # Make sure the next character is a closing angle bracket
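To see the pattern on something concrete, here is a small sketch with made-up sample data (the names and addresses are hypothetical, not taken from the original list):

```cfml
<!--- Hypothetical sample data in the "First Last <email>" format --->
<cfset list = "Jane Doe <jane@example.com>, John Roe <john@example.com>" />
<!--- Match runs of non-bracket characters that are immediately followed by ">" --->
<cfset x = REMatch("[^<>]+(?=>)", list) />
<cfdump var="#x#" />
```

The dump should show an array holding the two addresses, with no angle brackets around them.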

Find Multiple Occurrences in String using REFind

I'm trying to get ColdFusion's REFindNoCase function to return multiple instances of a matching string but can't seem to get it to work:
<cfset string2test="cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken">
<cfset CookieCheck = REFindNoCase( 'CFTOKEN', string2test, 1, true)>
<cfif arrayLen( CookieCheck['LEN'] ) gt 1>
MULTIPLE CFTOKEN!
</cfif>
Is there a regular expression magic syntax I need to use to make it search for more than 1?
The code above creates a structure containing two arrays (LEN and POS) that describe the pattern match and any subexpressions. Regex subexpressions are the parts of the pattern within parentheses. The 'CFTOKEN' pattern does not contain a subexpression.
I don't think that REFindNoCase will do what you are wanting to accomplish.
For example, if you use '.*?(cftoken)' as the pattern:
<cfset CookieCheck = REFindNoCase('.*?(CFTOKEN)', string2test, 1, true)>
(CFTOKEN) is a subexpression. If you remove the "?", the information for the last occurrence of 'cftoken' is returned.
The first item in each array describes the match for the entire pattern, which runs up to and including the first 'cftoken' (the first 40 characters of the string). The second item identifies the 'cftoken' substring captured within that match.
Because the statement in the example does not include subexpressions, only the first pattern match is returned.
If you need to check to see if something is listed multiple times or you don't need to manipulate the original string, I would recommend using REMatchNoCase(). It returns an array of pattern matches but without the position info.
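For instance, counting the matches could be sketched like this (the tokenMatches variable name is only for illustration):

```cfml
<cfset string2test = "cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken" />
<!--- reMatchNoCase returns an array with one element per match --->
<cfset tokenMatches = reMatchNoCase("cftoken", string2test) />
<cfif arrayLen(tokenMatches) GT 1>
    <cfoutput>Found #arrayLen(tokenMatches)# occurrences of CFTOKEN</cfoutput>
</cfif>
```

With the test string above, the array should contain two elements.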
You could create a custom method to loop over the string, and toss each occurrence into an array (or struct, or whatever you want). Here's an example of how I might approach it:
<cfscript>
public array function reFindMatches(required string regex, required string str) {
var start = 1;
var result = [];
var matches = [];
var match = '';
do {
matches = ReFind(arguments.regex, arguments.str, start, true);
if ( matches.pos[1] ) {
match = matches.len[1] ? Mid(arguments.str, matches.pos[1], matches.len[1]) : '';
ArrayAppend(result, match);
start = matches.pos[1] + Max(matches.len[1], 1); // always advance at least one character so a zero-length match cannot loop forever
}
} while(matches.pos[1]);
return result;
}
testString = 'cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken';
regex = '(?i)(\bcftoken\b)';
check = reFindMatches(regex=regex, str=testString);
WriteDump(var=check);
</cfscript>
The sample regex I've included begins with (?i) which indicates that the search is case insensitive. So, it's not necessary to call ReFindNoCase ... you can simply pass in whatever regex you wish to use.
The code above should output an array with two elements containing the word cftoken.
If you need to count how many instances there are, use rematch (or rematchNoCase).
If all you need is to identify if there's more than one, you can do this:
<cfset FirstInstance = refindNoCase( 'cftoken' , string2test ) />
<cfif FirstInstance AND refindNoCase( 'cftoken' , string2test , FirstInstance+7 ) >
... more than one instance ...
</cfif>
Which is probably more efficient than rematch, using sub-expressions or looping multiple times.
Depending on the data concerned, it might be even more efficient to do something like:
<cfif string2test.indexOf('cftoken') NEQ string2test.lastIndexOf('cftoken') >
(i.e. if you know that any additional instances would always be near the end of the string, whilst the initial ones are not. Note that indexOf and lastIndexOf are case-sensitive, unlike refindNoCase.)

Regex to find length in description

<cfset RegexToFindLength = "Length:.*?(\d*\.?\d+)\s*(""|")"/>
<cfset Description = "blah blah blah 2.5"""/>
<cfset size = #reMatch(RegexToFindLength, Description)# />
<cfdump var="#size#">
Error Message: ColdFusion was looking at the following text:
)
I'm looking to extract Length: 2.5" from the product's description.
I have tested the above regex in regexpal and it works, but when I try using it in a .cfm page, I get errors.
Can someone explain how this would be set up in CF?
You have a few issues here.
1) You don't escape your double quotes, so you end up closing your regex string and confusing it.
Personally, when I have to use double quotes in a string, I tend to use single quotes to define the string if I can.
<cfset RegexToFindLength = 'Length:.*?(\d*\.?\d+)\s*(""|")'/>
2) Your Description variable doesn't have the string you're searching for, so there will be no match. I changed this to the following to make it work (note the single quotes for defining the string):
<cfset Description = 'Length:.:2.5""'/>
3) (maybe not an issue) Size is not being set to a number. rematch returns an array of strings. You'll want to check the length of the string inside the array positions or check the length of the array itself - I don't know what exactly it is that you want to do.
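Putting those points together, a minimal sketch might look like this. The regex is a simplified variant that matches a single trailing double quote, and the sample description is made up so that it actually contains the Length: text:

```cfml
<!--- Single-quoted strings let the double quote appear without escaping --->
<cfset RegexToFindLength = 'Length:\s*(\d*\.?\d+)\s*"' />
<cfset Description = 'blah blah Length: 2.5" blah blah' />
<!--- reMatch returns an array of matched strings, empty when nothing matches --->
<cfset size = reMatch(RegexToFindLength, Description) />
<cfif arrayLen(size)>
    <cfoutput>#size[1]#</cfoutput>
</cfif>
```

Checking arrayLen(size) first avoids an error on size[1] when the description has no length in it.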

Regular Expression Excluding a Start String

I am using a regular expression like
<cfset a = ReFind("DESCRIBE\+[^>]*>", myResult.Header, 1, true) />
How should I write it so that DESCRIBE+ is not included when calculating the LEN and POS values?
DESCRIBE+ is 9 characters; can't you just add or subtract this number and do the math?
<cfset a = ReFind("DESCRIBE\+([^>]*>)", myResult.Header, 1, true) />
<cfif ArrayLen(a.pos) GT 1><!--- match found! --->
<cfset afterDescribePosition = a.pos[2]>
<cfset afterDescribeLength = a.len[2]>
</cfif>
ReFind (when the fourth param is set to true, as you have done) will return a structure with two keys (pos and len). Each of these is an array. If you don't have any capture groups (parentheses) within your regex, then both of these arrays will be just one value long, representing the full regex match. If you have capture groups defined (as I do in my example), then the subsequent values in each array will correspond with the respective capture group. In my example, there is only one capture group, so each array will be of length 2 (assuming there is a match). The values at the second position will therefore relate to the first capture group.
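As a rough illustration of how those arrays can be used, the sketch below pulls the captured text out with mid(). The header value is a made-up stand-in for myResult.Header:

```cfml
<!--- Hypothetical header text standing in for myResult.Header --->
<cfset header = "x DESCRIBE+foo> y" />
<cfset a = ReFind("DESCRIBE\+([^>]*>)", header, 1, true) />
<!--- pos[1] is 0 when there is no match at all --->
<cfif a.pos[1] GT 0>
    <!--- pos[2] and len[2] describe the first capture group, i.e. the text after DESCRIBE+ --->
    <cfset afterDescribe = mid(header, a.pos[2], a.len[2]) />
    <cfoutput>#afterDescribe#</cfoutput>
</cfif>
```

Here the captured text is foo>, which starts right after the nine characters of DESCRIBE+.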
ColdFusion's built-in regex engine doesn't support lookbehind, but if you run the pattern through an engine that does (such as Java's), you can use (?<=DESCRIBE\+)[^>]*>

How do I match a string that does not contain X with ColdFusion regular expressions?

I asked this question earlier, but it got a negative vote, so I'm rewording it.
I have:
<cfset myExpression = "X">
#REFind(myExpression,myString)#
I need to change myExpression so that it returns a value other than zero if there is NOT an X in myString, and a 0 if there is an X in myString.
<cfset string = "abc" />
<cfoutput>#refind( "^[^X]+$" , string )#</cfoutput> <!--- outputs 1 --->
<cfset string = "abcX" />
<cfoutput>#refind( "^[^X]+$" , string )#</cfoutput> <!--- outputs 0 --->
I am building a validation table
Well, the first thing to check is that you're not re-inventing the wheel - the isValid function can validate a variety of types (creditcard,email,zipcode,etc).
It also provides a way to match against a regex pattern, like this:
<cfif isValid('regex',String,RegexPattern) >
Something to be aware of: the documentation for isValid claims that it uses JavaScript regex, which (if true) is different to the default Apache ORO regex that CF uses for everything else.
For the direct regex version of what you were doing (which does use Apache ORO), you would use:
<cfif refind(RegexPattern,String) >
It's not clear what you're on about with your returnValue bit, though if you're returning a boolean from a function, ditch the cfif and just do one of these instead:
<cfreturn isValid('regex',String,RegexPattern) />
<cfreturn refind(RegexPattern,String) />
If your expression is always a character or set of characters, then you want:
<cfset myExpression ="[^X]">