Coldfusion Regex - regex

I have a HUGE string list (I'll limit the example to one line) with the format:
"[First Name] [Last Name] <[email address]>"
I ran a regular expression on the string to delimit this into an array.
<cfset x = REMatch("<(.*?)>",list) />
This works fine, except that it also returns the angular brackets <> around the email address
x[1] = <[email address]>
Just for simplicity sake because the cfdocs are quite regex ambiguous, I wrote this loop to remove the first and last character of each index..
<cfloop from="1" to="#arrayLen(x)#" index="y">
<cfset a = #RemoveChars(x[y], 1, 1)# />
<cfset a = #left(a,len(a)-1)# />
<cfset x[y] = a />
</cfloop>
This works fine, yay, I have my array now.
However, it's not what I wanted. How can I return the email address WITHOUT the angular brackets included in the first place?
(Please note I also tried REReplace and REFind only returns the index of the occurrance)
(Also note that there are no [] brackets in the string, that's just saying the value inside isn't what I posted here)

Coldfusion implements a regex flavor (Apache ORO, see this answer for details) that doesn't support lookbehind assertions which would be useful in this case.
But we can arrive at an approximation:
<cfset x = REMatch("[^<>]+(?=>)",list) />
should work as long as all angle brackets occur in unnested, balanced pairs.
Explanation:
[^<>]+ # Match one or more characters except angle brackets
(?=>) # Make sure the next character is a closing angle bracket

Related

Getting string between two characters - Coldfusion

I'm struggling a bit with ColdFusion (not the language I ever write in).
I am trying to do a regex to get a part of a string.
So for example, if my string is: D_CECILA23_CEC23423
I want the part that is between the 2 underscores.
This is the code I have so far, and it works for anything that is alpha characters, but when a number is thrown into the mix, it just breaks.
<cfset myStr = "D_CELCI_LISA">
<cfset myStr2 = reReplace(myStr, "([\w\d\%]+)(\_)([/ A-Z]+)(\_)([\w\d\?]+)", "\3", "all") >
<cfoutput>
myStr: #myStr#<br />
myStr2: #myStr2#<br />
</cfoutput>
Which gives me:
myStr: D_CELCI_LISA
myStr2: CELCI
If it really is as simple as getting the text between the first and second underscore character, you don't need a regex. This'll do it:
myStr2 = listGetAt(myStr, 2, "_");
That said, this should do for the regex in that context: ^.*_([^_]+)_.*$, eg:
myStr2 = reReplace(myStr, "^.*_([^_]+)_.*$", "\1", "all");
#user2429578 ListLast() and ListFirst() for the last or first element of a list.

Regex to find length in description

<cfset RegexToFindLength = "Length:.*?(\d*\.?\d+)\s*(""|")"/>
<cfset Description = "blah blah blah 2.5"""/>
<cfset size = #reMatch(RegexToFindLength, Description)# />
<cfdump var="#size#">
Error Message: ColdFusion was looking at the following text:
)
looking to extract Length: 2.5" from the products description.
I have tested the above regex expression in regexpal and it works. But when i try using it in a cfm page, i get errors.
Can someone explain to me how this would be setup in CF?
You have a few issues here.
1) You don't escape your double quotes, so you end up closing your regex string and confusing it.
Personally, when I have to use double quotes in a string, I tend to use single quotes to define the string if I can.
<cfset RegexToFindLength = 'Length:.*?(\d*\.?\d+)\s*(""|")'/>
2) Your Description variable doesn't have the string you're searching for, so there will be no match. I changed this to the following to make it work (note the single quotes for defining the string):
<cfset Description = 'Length:.:2.5""'/>
3) (maybe not an issue) Size is not being set to a number. rematch returns an array of strings. You'll want to check the length of the string inside the array positions or check the length of the array itself - I don't know what exactly it is that you want to do.

Regular Expression Excluding a Start String

I am using a regular expression like
<cfset a = ReFind("DESCRIBE\+[^>]*>", myResult.Header, 1, true) />
If I need that this Regular Expression should not include DESCRIBE+ in calculating the LEN and Position values. How should I write it?
DESCRIBE+ is 9 characters, cant you just add / subtract this number and do math.
<cfset a = ReFind("DESCRIBE\+([^>]*>)", myResult.Header, 1, true) />
<cfif ArrayLen(a.pos) GT 1><!--- match found! --->
<cfset afterDescribePosition = a.pos[2]>
<cfset afterDescribeLength = a.len[2]>
</cfif>
ReFind (when the fourth param is set to true, as you have done) will return a structure with two values (pos and len). Each of these is an array. If you don't have any capture groups ( parenthesis) within your regex, then both of these arrays will be just one value long - representing the full regex match. If you have capture groups defined (as I do in my example), then the subsequent values in each array will correspond with the respective capture group. In my example, there is only one capture group, so each array will be of length 2 (assuming there is a match). The values at the second position will therefore relate to the first capture group.
ReFind
If ColdFusion supports look behind, then you can use (?<=DESCRIBE\+)[^>]*>

How do I match a string that does not contain X with ColdFusion regular expressions?

I asked this question earlier, but it got a negative vote, so I'm rewording it.
I have:
<cfset myExpression = "X">
#REFind(myExpression,myString)#
I need to change myExpression so that it returns a value other than zero if there is NOT an X in myString, and a 0 if there is an X in myString.
<cfset string = "abc" />
<cfoutput>#refind( "^[^X]+$" , string )#</cfoutput> // 1
<cfset string = "abcX" />
<cfoutput>#refind( "^[^X]+$" , string )#</cfoutput> // 0
I am building a validation table
Well, the first thing to check is that you're not re-inventing the wheel - the isValid function can validate a variety of types (creditcard,email,zipcode,etc).
It also provides a way to match against a regex pattern, like this:
<cfif isValid('regex',String,RegexPattern) >
Something to be aware of: the documentation for isValid claims that it uses JavaScript regex, which (if true) is different to the default Apache ORO regex that CF uses for everything else.
For the direct regex version of what you were doing (which does use Apache ORO), you would use:
<cfif refind(RegexPattern,String) >
It's not clear what you're on about with your returnValue bit, though if you're returning a boolean from a function, ditch the cfif and just do one of these instead:
<cfreturn isValid('regex',String,RegexPattern) />
<cfreturn refind(RegexPattern,String) />
if your expression is always a character or set of characters then you want
<cfset myExpression ="[^X]">

regex find content question

Trying to use regex refind tag to find the content within the brackets in this example using coldfusion
joe smith <joesmith#domain.com>
The resulting text should be
joesmith#domain.com
Using this
<cfset reg = refind(
"/(?<=\<).*?(?=\>)/s","Joe <joe#domain.com>") />
Not having any luck. Any suggestions?
Maybe a syntax issue, it works in an online regex tester I use.
You can't use lookbehind with CF's regex engine (uses Apache Jakarta ORO).
However, you can use Java's regex though, which does support them, and I've created a wrapper CFC that makes this even easier. Available from:
http://www.hybridchill.com/projects/jre-utils.html
(Update: The wrapper CFC mentioned above has evolved into a full project. See cfregex.net for details.)
Also, the /.../s stuff isn't required/relevant here.
So, from your example, but with improved regex:
<cfset jrex = createObject('component','jre-utils').init()/>
<cfset reg = jrex.match( "(?<=<)[^<>]+(?=>)" , "Joe <joe#domain.com>" ) />
A quick note, since I've updated that regex a few times; hopefully it's at its best now...
(?<=<) # positive lookbehind - start matching at `<` but don't capture it.
[^<>]+ # any char except `<` or `>`, the `+` meaning one-or-more greedy.
(?=>) # positive lookahead - only succeed if there's a `>` but don't capture it.
I've never been happy with the regular expression matching functions in CF. Hence, I wrote my own:
<cfscript>
function reFindNoSuck(string pattern, string data, numeric startPos = 1){
var sucky = refindNoCase(pattern, data, startPos, true);
var i = 0;
var awesome = [];
if (not isArray(sucky.len) or arrayLen(sucky.len) eq 0){return [];} //handle no match at all
for(i=1; i<= arrayLen(sucky.len); i++){
//if there's a match with pos 0 & length 0, that means the mime type was not specified
if (sucky.len[i] gt 0 && sucky.pos[i] gt 0){
//don't include the group that matches the entire pattern
var matchBody = mid( data, sucky.pos[i], sucky.len[i]);
if (matchBody neq arguments.data){
arrayAppend( awesome, matchBody );
}
}
}
return awesome;
}
</cfscript>
Applied to your problem, here is my example:
<cfset origString = "joe smith <joesmith#domain.com>" />
<cfset regex = "<([^>]+)>" />
<cfset matches = reFindNoSuck(regex, origString) />
Dumping the "matches" variable shows that it is an array with 2 items. The first will be <joesmith#domain.com> (because it matches the entire regex) and the second will be joesmith#domain.com (because it matches the 1st group defined in the regular expression -- all subsequent groups would also be captured and included in the array).
/\<([^>]+)\>$/
something like that, didn't test it though, that one's yours ;)