Regular Expression Excluding a Start String - regex

I am using a regular expression like
<cfset a = ReFind("DESCRIBE\+[^>]*>", myResult.Header, 1, true) />
I need the LEN and POS values to exclude the DESCRIBE+ prefix. How should I write the regular expression?

DESCRIBE+ is 9 characters. Can't you just add this number to the position and subtract it from the length?

<cfset a = ReFind("DESCRIBE\+([^>]*>)", myResult.Header, 1, true) />
<cfif ArrayLen(a.pos) GT 1><!--- match found! --->
<cfset afterDescribePosition = a.pos[2]>
<cfset afterDescribeLength = a.len[2]>
</cfif>
ReFind (when the fourth param is set to true, as you have done) will return a structure with two values (pos and len). Each of these is an array. If you don't have any capture groups (parentheses) within your regex, then both of these arrays will be just one value long, representing the full regex match. If you have capture groups defined (as I do in my example), then the subsequent values in each array will correspond to the respective capture groups. In my example, there is only one capture group, so each array will be of length 2 (assuming there is a match). The values at the second position will therefore relate to the first capture group.
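The same idea can be sketched outside ColdFusion. This is a minimal Python illustration, not CFML: the sample header string and the helper name group_span are made up for the example. It reads the position and length of the first capture group instead of the whole match. Note that Python positions are 0-based, whereas ReFind positions are 1-based.

```python
import re

# Hypothetical sample input; the real myResult.Header is not shown in the question.
header = "foo DESCRIBE+abc> bar"
pattern = r"DESCRIBE\+([^>]*>)"

def group_span(pattern, text, group=1):
    """Return (position, length) of a capture group, or None if nothing matches."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.start(group), m.end(group) - m.start(group)

whole = group_span(pattern, header, group=0)           # includes "DESCRIBE+"
after_describe = group_span(pattern, header, group=1)  # excludes "DESCRIBE+"
```

Here the group's position is 9 characters further along than the whole match and its length is 9 shorter, which is the same arithmetic the other answer suggests doing by hand.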

If ColdFusion supports look behind, then you can use (?<=DESCRIBE\+)[^>]*>

Related

CF Regex REFind() substring without quotes

My CF backend has to read through a CFM file as if it was a TEXT file to extract the names and values of different parameters, the data looks like this :
request.config.MY_PARAM_1 = 'ABCDEFGHI';
request.config.MY_PARAM_2 = "BlaBlaBla";
request.config.MY_PARAM_3 = TRUE;
request.config.MY_PARAM_4 = 'true';
request.config.MY_PARAM_5 = "1337";
request.config.MY_PARAM_6 = 1337;
As you can see, I can have STRINGS which can be SINGLE or DOUBLE quoted.
I also have BOOLEANS and NUMBERS, which are usually unquoted but can also be quoted (single or double).
I am "parsing" the file and extracting the values, I want to find a pattern that would return the matches like this :
request.config.MY_PARAM_2 = "BlaBlaBla";
I am VERY close to succeeding, but unfortunately the following expression cannot get rid of the closing quote.
<cfset match = REFind("^request\.config\.(\S+) = ['|""]?(.*)['|""]?;$", str, 1, "Yes")>
<cfset paramVal = Mid( str, match.pos[3], match.len[3] ) >
<cfdump var=#paramVal# >
For example, it returns BlaBlaBla". It has successfully omitted the opening quote, but not the closing one. What am I doing wrong?
From your comments, it sounds like you're saying that you want to parse two ARBITRARY lines. This will do it:
^(?:[^\n]*\n){1}request\.config\.(\w+)\s*=\s*(['"]?)(\w+)\2;(?:[^\n]*\n){4}request\.config\.(\w+)\s*=\s*(['"]?)(\w+)\5;
In your code, just change the two numbers in the quantifiers: {1} and {4} as they specify how many lines to skip at the top and in the middle. For line 1, for instance you would have {0} in the first quantifier.
The data you want is in Groups 1, 3, 4 and 6; Groups 2 and 5 only capture the optional quote character.
I am sure you will have no trouble building the regex in code by concatenating the pieces:
method Parse(x, y)
Build the regex by concatenating:
^(?:[^\n]*\n){
with x-1, with
}request\.config\.(\w+)\s*=\s*(['"]?)(\w+)\2;(?:[^\n]*\n){
with y-x, with
}request\.config\.(\w+)\s*=\s*(['"]?)(\w+)\5;
Then match and retrieve Groups 1, 3, 4 and 6.
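The concatenation described above can be sketched as follows. This is a Python illustration rather than CFML, and the names build_pattern, parse and sample are hypothetical. It assumes x and y are 1-based line numbers with x < y, and that values consist of word characters only, as in the regex above.

```python
import re

# One "request.config.NAME = value;" line: name, optional quote, value.
LINE = r"""request\.config\.(\w+)\s*=\s*(['"]?)(\w+)"""

def build_pattern(x, y):
    """Build the two-line regex for 1-based line numbers x < y."""
    return (
        r"^(?:[^\n]*\n){" + str(x - 1) + "}"    # skip the lines before line x
        + LINE + r"\2;"                         # \2 repeats the opening quote, if any
        + r"(?:[^\n]*\n){" + str(y - x) + "}"   # skip ahead to the start of line y
        + LINE + r"\5;"                         # \5 repeats the second opening quote
    )

def parse(data, x, y):
    """Return (name_x, value_x, name_y, value_y), or None if the lines do not match."""
    m = re.search(build_pattern(x, y), data)
    if m is None:
        return None
    return m.group(1), m.group(3), m.group(4), m.group(6)

sample = (
    "request.config.MY_PARAM_1 = 'ABCDEFGHI';\n"
    'request.config.MY_PARAM_2 = "BlaBlaBla";\n'
    "request.config.MY_PARAM_3 = TRUE;\n"
    "request.config.MY_PARAM_4 = 'true';\n"
    'request.config.MY_PARAM_5 = "1337";\n'
    "request.config.MY_PARAM_6 = 1337;\n"
)
result = parse(sample, 2, 6)
```

Because the closing quote is a backreference to the opening one, the value group never swallows the quote, which is what the greedy (.*) in the question's pattern does.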

Coldfusion Regex

I have a HUGE string list (I'll limit the example to one line) with the format:
"[First Name] [Last Name] <[email address]>"
I ran a regular expression on the string to delimit this into an array.
<cfset x = REMatch("<(.*?)>",list) />
This works fine, except that it also returns the angle brackets <> around the email address
x[1] = <[email address]>
For simplicity's sake, because the cfdocs are quite ambiguous about regex, I wrote this loop to remove the first and last character of each element.
<cfloop from="1" to="#arrayLen(x)#" index="y">
<cfset a = #RemoveChars(x[y], 1, 1)# />
<cfset a = #left(a,len(a)-1)# />
<cfset x[y] = a />
</cfloop>
This works fine, yay, I have my array now.
However, it's not what I wanted. How can I return the email address WITHOUT the angle brackets included in the first place?
(Please note I also tried REReplace, and REFind only returns the index of the occurrence)
(Also note that there are no [] brackets in the string, that's just saying the value inside isn't what I posted here)
ColdFusion implements a regex flavor (Apache ORO) that doesn't support lookbehind assertions, which would be useful in this case.
But we can arrive at an approximation:
<cfset x = REMatch("[^<>]+(?=>)",list) />
should work as long as all angle brackets occur in unnested, balanced pairs.
Explanation:
[^<>]+ # Match one or more characters except angle brackets
(?=>) # Make sure the next character is a closing angle bracket
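To see the lookahead-only pattern at work, here is a small Python sketch (not CFML). The contact list is made-up placeholder data.

```python
import re

# Made-up sample list; the names and addresses are placeholders.
contacts = "Ann Lee <ann@example.com>, Bob Ray <bob@example.com>"

# One or more non-bracket characters that are immediately followed by ">".
emails = re.findall(r"[^<>]+(?=>)", contacts)
```

Text outside the brackets, such as "Ann Lee ", is rejected because it is followed by "<" rather than ">". A stray ">" elsewhere in the input would produce a false match, which is why the pattern relies on the brackets being balanced.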

Find Multiple Occurrences in String using REFind

I'm trying to get ColdFusion's REFindNoCase function to return multiple instances of a matching string but can't seem to get it to work:
<cfset string2test="cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken">
<cfset CookieCheck = REFindNoCase( 'CFTOKEN', string2test, 1, true)>
<cfif arrayLen( CookieCheck['LEN'] ) gt 1>
MULTIPLE CFTOKEN!
</cfif>
Is there a regular expression magic syntax I need to use to make it search for more than 1?
The syntax of the code above will create a structure with arrays (LEN, POS) for the pattern match and its subexpressions. RegEx subexpressions are within parentheses in the pattern. The 'CFTOKEN' pattern does not contain a subexpression.
I don't think that REFindNoCase will do what you are wanting to accomplish.
For example, if you use '.*?(cftoken)' as the pattern:
<cfset CookieCheck = REFindNoCase('.*?(CFTOKEN)', string2test, 1, true)>
(CFTOKEN) is a subexpression. If you remove the "?", the information for the last occurrence of 'cftoken' is returned.
The values in the first array items will match the entire pattern up to the first 'cftoken' (the first 40 characters of the string). The second set of values will identify the 'cftoken' string that was found in first match (first 40 chars).
Because the statement in the example does not include subexpressions, only the first pattern match is returned.
If you need to check to see if something is listed multiple times or you don't need to manipulate the original string, I would recommend using REMatchNoCase(). It returns an array of pattern matches but without the position info.
You could create a custom method to loop over the string, and toss each occurrence into an array (or struct, or whatever you want). Here's an example of how I might approach it:
<cfscript>
public array function reFindMatches(required string regex, required string str) {
var start = 1;
var result = [];
var matches = [];
var match = '';
do {
matches = ReFind(arguments.regex, arguments.str, start, true);
if ( matches.pos[1] ) {
match = matches.len[1] ? Mid(arguments.str, matches.pos[1], matches.len[1]) : '';
ArrayAppend(result, match);
start = matches.pos[1] + Max(matches.len[1], 1); // advance at least one character so a zero-length match cannot loop forever
}
} while(matches.pos[1]);
return result;
}
testString = 'cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken';
regex = '(?i)(\bcftoken\b)';
check = reFindMatches(regex=regex, str=testString);
WriteDump(var=check);
</cfscript>
The sample regex I've included begins with (?i) which indicates that the search is case insensitive. So, it's not necessary to call ReFindNoCase ... you can simply pass in whatever regex you wish to use.
The code above should output an array with two elements containing the word cftoken.
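If the positions are needed as well as the matched text, the same loop can record them. The following is a Python sketch of that approach, not CFML; the function name find_all_with_positions is hypothetical, and positions are 0-based here rather than 1-based as in ColdFusion.

```python
import re

def find_all_with_positions(pattern, text):
    """Collect (position, length, matched text) for every non-overlapping match."""
    compiled = re.compile(pattern)
    results = []
    start = 0
    while start <= len(text):
        m = compiled.search(text, start)
        if m is None:
            break
        results.append((m.start(), m.end() - m.start(), m.group(0)))
        # Continue after this match; step at least one character so an
        # empty match cannot keep the loop at the same position forever.
        start = max(m.end(), m.start() + 1)
    return results

test_string = "cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken"
found = find_all_with_positions(r"(?i)\bcftoken\b", test_string)
```

Each search resumes where the previous match ended, which is the same strategy as passing an updated start argument to ReFind.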
If you need to count how many instances there are, use rematch (or rematchNoCase).
If all you need is to identify if there's more than one, you can do this:
<cfset FirstInstance = refindNoCase( 'cftoken' , string2test ) />
<cfif FirstInstance AND refindNoCase( 'cftoken' , string2test , FirstInstance+7 ) >
... more than one instance ...
</cfif>
Which is probably more efficient than rematch, using sub-expressions or looping multiple times.
Depending on the data concerned, it might be even more efficient to do something like:
<cfif string2test.indexOf('cftoken') NEQ string2test.lastIndexOf('cftoken') >
(i.e. if you know that any additional instances would always be near the end of the string, whilst initial ones are not.)
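The first-index versus last-index comparison can be sketched in Python (not CFML) as below; the function name occurs_more_than_once is made up. One difference worth noting: Java's indexOf and lastIndexOf are case-sensitive, unlike refindNoCase, so this sketch lowercases both strings first to stay case-insensitive.

```python
def occurs_more_than_once(needle, haystack):
    """Case-insensitive check for at least two occurrences of needle."""
    needle = needle.lower()
    haystack = haystack.lower()
    first = haystack.find(needle)
    # With exactly one occurrence, find and rfind return the same index.
    return first != -1 and first != haystack.rfind(needle)

string2test = "cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken"
has_duplicates = occurs_more_than_once("CFTOKEN", string2test)
```

This compares positions only, so overlapping occurrences would also count as "more than one".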

Replace Extra Zero Coldfusion Cfset

I am very new to Coldfusion and not sure what the format should be to use this function correctly.
I want to convert 0000411111 to 0411111, i.e. get rid of the first three zeros.
<cfset origValue = "#query.column#">
<cfset newValue = ReReplace(origValue, "0+", "", "all")>
<cfoutput>#newValue#</cfoutput>
This removes all zeros. Is there any way to keep just one zero? Just curious.
Thanks in advance for your assistance.
If the string will always be 7 characters you can use
<cfset newValue = numberFormat(000411111,'0000000')>
If you don't know the length and always want to remove leading 0's and leave one at the beginning you can do
<cfset newValue = '0' & int(000411111)>
If you always want to remove the first three characters, you can use the right() function:
<cfset newValue = right(query.column, len(query.column)-3)>
This will return all the characters from the right side of the string without the leading three characters.
You could do it 2 different ways:
<cfset newvalue = right(origvalue, len(origvalue)-3)>
This method returns the string without the leftmost 3 characters,
or
<cfset newvalue = mid(origvalue, 4, len(origvalue)-3)>
This method starts at position 4 and grabs the rest of the string.
I think the numberFormat() answer is the best one, but other people have been suggesting using mid() and right() which I think - whilst those approaches work - are more cumbersome than you need to make it. If you simply wish to remove the first three chars of the string, there's a removeChars() function. It's unclear from your question though whether this actually achieves what you want: if it's only when the number is left-padded with too many zeros you want to do this, then the numberFormat() approach is best. If it's any three characters, then this approach is better.
newValue = removeChars(origValue, 1, 3);
The regex string you are looking for really is for matching 2 or more 0's at the start of the string, and replacing them with simply a single 0.
This gives the regex ^0+0
^ matches the start of the string, 0+ matches one or more zeros, and the final 0 matches one further zero, so the pattern needs at least two leading zeros. This means that if there is only one leading zero, nothing is replaced. Finally, you only need to do this once, as you are only replacing the zeros at the start of the string. This gives the CF code
newValue = ReReplace(origValue, "^0+0", "0", "one")
This should replace multiple leading zeros with a single one, while not adding zeros where there weren't any to begin with.
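The behaviour of ^0+0 on the edge cases can be checked with a short Python sketch (not CFML; the function name collapse_leading_zeros is made up for the example).

```python
import re

def collapse_leading_zeros(value):
    """Replace two or more leading zeros with a single zero."""
    # count=1 mirrors the "one" scope passed to ReReplace.
    return re.sub(r"^0+0", "0", value, count=1)

padded = collapse_leading_zeros("0000411111")
single = collapse_leading_zeros("0411111")
unpadded = collapse_leading_zeros("411111")
```

A value with one leading zero, or none, comes back unchanged, and zeros later in the string are never touched because the pattern is anchored with ^.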
As a final note, a good place to play around with regex is http://gskinner.com/RegExr/

regex find content question

Trying to use regex refind tag to find the content within the brackets in this example using coldfusion
joe smith <joesmith@domain.com>
The resulting text should be
joesmith@domain.com
Using this
<cfset reg = refind(
"/(?<=\<).*?(?=\>)/s","Joe <joe@domain.com>") />
Not having any luck. Any suggestions?
Maybe a syntax issue, it works in an online regex tester I use.
You can't use lookbehind with CF's regex engine (it uses Apache Jakarta ORO).
However, you can use Java's regex engine, which does support lookbehind, and I've created a wrapper CFC that makes this even easier. It is available from:
http://www.hybridchill.com/projects/jre-utils.html
(Update: The wrapper CFC mentioned above has evolved into a full project. See cfregex.net for details.)
Also, the /.../s stuff isn't required/relevant here.
So, from your example, but with improved regex:
<cfset jrex = createObject('component','jre-utils').init()/>
<cfset reg = jrex.match( "(?<=<)[^<>]+(?=>)" , "Joe <joe@domain.com>" ) />
A quick note, since I've updated that regex a few times; hopefully it's at its best now...
(?<=<) # positive lookbehind - start matching at `<` but don't capture it.
[^<>]+ # any char except `<` or `>`, the `+` meaning one-or-more greedy.
(?=>) # positive lookahead - only succeed if there's a `>` but don't capture it.
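The pattern itself can be tried in any engine that supports lookbehind. The sketch below uses Python's re module purely as an illustration; it is neither the wrapper CFC nor CFML, and the function name between_angle_brackets is hypothetical.

```python
import re

def between_angle_brackets(text):
    """Return the first run of text enclosed in <...>, or None if there is none."""
    m = re.search(r"(?<=<)[^<>]+(?=>)", text)
    return m.group(0) if m else None

address = between_angle_brackets("Joe <joe@domain.com>")
```

Because both brackets are matched by zero-width assertions, the whole match is already the bare address and no capture group is needed.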
I've never been happy with the regular expression matching functions in CF. Hence, I wrote my own:
<cfscript>
function reFindNoSuck(string pattern, string data, numeric startPos = 1){
var sucky = refindNoCase(pattern, data, startPos, true);
var i = 0;
var awesome = [];
if (not isArray(sucky.len) or arrayLen(sucky.len) eq 0){return [];} //handle no match at all
for(i=1; i<= arrayLen(sucky.len); i++){
//a group with pos 0 and len 0 did not take part in the match, so skip it
if (sucky.len[i] gt 0 && sucky.pos[i] gt 0){
//skip a match that is identical to the entire input string
var matchBody = mid( data, sucky.pos[i], sucky.len[i]);
if (matchBody neq arguments.data){
arrayAppend( awesome, matchBody );
}
}
}
return awesome;
}
</cfscript>
Applied to your problem, here is my example:
<cfset origString = "joe smith <joesmith@domain.com>" />
<cfset regex = "<([^>]+)>" />
<cfset matches = reFindNoSuck(regex, origString) />
Dumping the "matches" variable shows that it is an array with 2 items. The first will be <joesmith@domain.com> (because it matches the entire regex) and the second will be joesmith@domain.com (because it matches the 1st group defined in the regular expression; all subsequent groups would also be captured and included in the array).
/\<([^>]+)\>$/
something like that, didn't test it though, that one's yours ;)