Find Multiple Occurrences in String using REFind

I'm trying to get ColdFusion's REFindNoCase function to return multiple instances of a matching string but can't seem to get it to work:
<cfset string2test="cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken">
<cfset CookieCheck = REFindNoCase( 'CFTOKEN', string2test, 1, true)>
<cfif arrayLen( CookieCheck['LEN'] ) gt 1>
MULTIPLE CFTOKEN!
</cfif>
Is there a regular expression magic syntax I need to use to make it search for more than 1?

With the fourth argument set to true, the code above returns a structure containing two arrays (LEN and POS) that describe the pattern match and any subexpressions. Regex subexpressions are the parts of the pattern enclosed in parentheses. The 'CFTOKEN' pattern does not contain a subexpression.
I don't think that REFindNoCase will do what you want to accomplish.
For example, if you use '.*?(cftoken)' as the pattern:
<cfset CookieCheck = REFindNoCase('.*?(CFTOKEN)', string2test, 1, true)>
(CFTOKEN) is a subexpression. The "?" makes ".*" match as little as possible; if you remove the "?", the information for the last occurrence of 'cftoken' is returned instead.
The values in the first array items describe the match for the entire pattern, which runs up to and including the first 'cftoken' (the first 40 characters of the string). The second set of values identifies the 'cftoken' string that was found within that first match.
Because the statement in the example does not include subexpressions, only the first pattern match is returned.
If you need to check to see if something is listed multiple times or you don't need to manipulate the original string, I would recommend using REMatchNoCase(). It returns an array of pattern matches but without the position info.
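As a minimal sketch of the REMatchNoCase() suggestion, reusing the test string from the question (the variable name tokenMatches is only illustrative):

```cfml
<cfset string2test = "cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken">
<!--- REMatchNoCase returns an array holding every matching substring --->
<cfset tokenMatches = REMatchNoCase("cftoken", string2test)>
<cfif arrayLen(tokenMatches) gt 1>
MULTIPLE CFTOKEN!
</cfif>
```

The array length is the number of occurrences, so the same check also works for "exactly N" or "at least N" tests.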

You could create a custom method to loop over the string, and toss each occurrence into an array (or struct, or whatever you want). Here's an example of how I might approach it:
<cfscript>
public array function reFindMatches(required string regex, required string str) {
var start = 1;
var result = [];
var matches = [];
var match = '';
do {
matches = ReFind(arguments.regex, arguments.str, start, true);
if ( matches.pos[1] ) {
match = matches.len[1] ? Mid(arguments.str, matches.pos[1], matches.len[1]) : '';
ArrayAppend(result, match);
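// advance past this match; step at least one character so a zero-length match cannot loop forever
start = matches.pos[1] + Max(matches.len[1], 1);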
}
} while(matches.pos[1]);
return result;
}
testString = 'cfid skldfjskdl cfid sdlkfjslfjs cftoken dslkfjdslfjks cftoken';
regex = '(?i)(\bcftoken\b)';
check = reFindMatches(regex=regex, str=testString);
WriteDump(var=check);
</cfscript>
The sample regex I've included begins with (?i) which indicates that the search is case insensitive. So, it's not necessary to call ReFindNoCase ... you can simply pass in whatever regex you wish to use.
The code above should output an array with two elements containing the word cftoken.

If you need to count how many instances there are, use rematch (or rematchNoCase).
If all you need is to identify if there's more than one, you can do this:
<cfset FirstInstance = refindNoCase( 'cftoken' , string2test ) />
<cfif FirstInstance AND refindNoCase( 'cftoken' , string2test , FirstInstance+7 ) >
... more than one instance ...
</cfif>
Which is probably more efficient than rematch, using sub-expressions or looping multiple times.
Depending on the data concerned, it might be even more efficient to do something like:
<cfif string2test.indexOf('cftoken') NEQ string2test.lastIndexOf('cftoken') >
(i.e. if you know that any additional instances would always be near the end of the string, whilst the initial ones are not. Note that these Java string methods are case-sensitive, unlike refindNoCase.)

Related

Finding just one or more comma in a string with ColdFusion

How can I detect whether a string value consists only of one or more commas and nothing else? A correct value looks something like ABC,BVC,BNM, but sometimes I get a value like , or ,,, or ,, and this is not allowed. How can I detect that a string contains only commas, so that I can show a warning to the user and stop the process?
Thank you

You can use listToArray() and arrayToList() to remove the empty items from the list and can then compare the sanitized version with the original like this:
<cfset originalInput = trim( ",,," )>
<cfset sanitizedInput = arrayToList( listToArray( originalInput, ",", false ), "," )>
<!--- Compare both --->
<cfif originalInput NEQ sanitizedInput>
<!--- Throw error --->
</cfif>

Depends on how much your input may vary, but as you currently describe it:
Something as simple as <cfif MyVar contains ",,"> would work.
If one comma (and nothing else) is a possibility, then
<cfif MyVar contains ",," OR Len(MyVar) lt 2>

Assuming any non-commas are either letters or numbers, you can use a regular expression:
patternAlphaNumeric = "[0-9a-zA-Z]";
testString = ",,,";
if (reFind(",", testString) > 0 && reFind(patternAlphaNumeric, testString) == 0) {
// the string contains commas but no letters or digits
} else {
// the string contains other characters
}

If you are only concerned about detecting commas (or one or more of any character), just use ListLen().
It's a native ColdFusion function.
It ignores empty list items by default.
Its default delimiter is a comma.
So, if your_string consists only of one or more commas, then ListLen( your_string ) will always return 0.
Heads up, it also returns 0 for an empty string, so if you don't want your code to pop for empty strings, be sure to account for that.
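A minimal sketch of that check, including the guard for empty strings (the variable name userInput is only illustrative):

```cfml
<cfset userInput = trim(",,,")>
<!--- ListLen skips empty items, so a comma-only string has zero items --->
<cfif len(userInput) AND listLen(userInput) EQ 0>
<!--- warn the user and stop processing --->
</cfif>
```

The len() test keeps an empty string from being treated as a comma-only value.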

<cfset local.myString = "string-goes-here">
<cfset local.myNewString = ReReplace(trim(local.myString),",","","ALL")>
<cfif not len(local.myNewString)>
<!--- warning to the user and stop the process --->
</cfif>

CF Regex REFind() substring without quotes

My CF backend has to read through a CFM file as if it was a TEXT file to extract the names and values of different parameters, the data looks like this :
request.config.MY_PARAM_1 = 'ABCDEFGHI';
request.config.MY_PARAM_2 = "BlaBlaBla";
request.config.MY_PARAM_3 = TRUE;
request.config.MY_PARAM_4 = 'true';
request.config.MY_PARAM_5 = "1337";
request.config.MY_PARAM_6 = 1337;
As you can see, I can have STRINGS which can be SINGLE or DOUBLE quoted.
I also have BOOLEANS and NUMBERS, which are usually unquoted but can also be quoted (single or double).
I am "parsing" the file and extracting the values, and I want to find a pattern that would match lines like this:
request.config.MY_PARAM_2 = "BlaBlaBla";
I am VERY close to succeeding, but unfortunately the following expression cannot get rid of the closing quote.
<cfset match = REFind("^request\.config\.(\S+) = ['|""]?(.*)['|""]?;$", str, 1, "Yes")>
<cfset paramVal = Mid( str, match.pos[3], match.len[3] ) >
<cfdump var=#paramVal# >
For example, it returns BlaBlaBla", it has successfully omitted the opening quote, but not the last one, what am I doing wrong?

From your comments, it sounds like you're saying that you want to parse two ARBITRARY lines. This will do it:
^(?:[^\n]*\n){1}request\.config\.(\w+)\s*=\s*(['"]?)(\w+)\2;(?:[^\n]*\n){4}request\.config\.(\w+)\s*=\s*(['"]?)(\w+)\5;
In your code, just change the two numbers in the quantifiers {1} and {4}, as they specify how many lines to skip at the top and in the middle. To start at line 1, for instance, you would use {0} in the first quantifier.
The data you want is in Groups 1, 3, 4 and 6. Groups 2 and 5 capture the optional quote character, which the backreferences \2 and \5 then require again before the semicolon.
I am sure you will have no trouble building the regex in code by concatenating the pieces:
method Parse(x,y)
Build the regex by concatenating
^(?:[^\n]*\n){
With
x-1
With
}request\.config\.(\w+)\s*=\s*(['"]?)(\w+)\2;(?:[^\n]*\n){
With
y-x
With
}request\.config\.(\w+)\s*=\s*(['"]?)(\w+)\5;
Then match and retrieve Groups 1, 3, 4 and 6
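For the single-line case in the question, the same backreference idea can be sketched in CFML as follows. This is an illustration rather than tested production code: the sample value of str is taken from the question, and the lazy (.*?) is an assumption chosen so that values containing spaces still match.

```cfml
<cfset str = 'request.config.MY_PARAM_2 = "BlaBlaBla";'>
<!--- Group 2 captures the optional opening quote; \2 requires the same character before the semicolon --->
<cfset match = REFind("^request\.config\.(\w+)\s*=\s*(['""]?)(.*?)\2;$", str, 1, true)>
<cfif match.pos[1]>
<!--- index 1 is the whole match, so group 1 is at index 2 and group 3 at index 4 --->
<cfset paramName = Mid(str, match.pos[2], match.len[2])>
<cfset paramVal = Mid(str, match.pos[4], match.len[4])>
<cfoutput>#paramName# = #paramVal#</cfoutput>
</cfif>
```

Because the closing quote is matched by \2 outside the value group, it should no longer end up in paramVal, and unquoted values such as TRUE or 1337 still match with an empty Group 2.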

Regex to find length in description

<cfset RegexToFindLength = "Length:.*?(\d*\.?\d+)\s*(""|")"/>
<cfset Description = "blah blah blah 2.5"""/>
<cfset size = #reMatch(RegexToFindLength, Description)# />
<cfdump var="#size#">
Error Message: ColdFusion was looking at the following text:
)
I'm looking to extract Length: 2.5" from the product's description.
I have tested the above regex in regexpal and it works, but when I try using it in a .cfm page, I get errors.
Can someone explain to me how this would be set up in CF?

You have a few issues here.
1) You don't escape your double quotes, so you end up closing your regex string and confusing it.
Personally, when I have to use double quotes in a string, I tend to use single quotes to define the string if I can.
<cfset RegexToFindLength = 'Length:.*?(\d*\.?\d+)\s*(""|")'/>
2) Your Description variable doesn't have the string you're searching for, so there will be no match. I changed this to the following to make it work (note the single quotes for defining the string):
<cfset Description = 'Length:.:2.5""'/>
3) (maybe not an issue) Size is not being set to a number. rematch returns an array of strings. You'll want to check the length of the string inside the array positions or check the length of the array itself - I don't know what exactly it is that you want to do.

Regular Expression Excluding a Start String

I am using a regular expression like
<cfset a = ReFind("DESCRIBE\+[^>]*>", myResult.Header, 1, true) />
I need this regular expression to exclude DESCRIBE+ when calculating the LEN and POS values. How should I write it?

DESCRIBE+ is 9 characters, so can't you just add that number to the position and subtract it from the length?

<cfset a = ReFind("DESCRIBE\+([^>]*>)", myResult.Header, 1, true) />
<cfif ArrayLen(a.pos) GT 1><!--- match found! --->
<cfset afterDescribePosition = a.pos[2]>
<cfset afterDescribeLength = a.len[2]>
</cfif>
ReFind (when the fourth param is set to true, as you have done) will return a structure with two values (pos and len). Each of these is an array. If you don't have any capture groups (parentheses) within your regex, then both of these arrays will be just one value long, representing the full regex match. If you have capture groups defined (as I do in my example), then the subsequent values in each array will correspond to the respective capture groups. In my example, there is only one capture group, so each array will be of length 2 (assuming there is a match). The values at the second position will therefore relate to the first capture group.
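To turn those pos and len values into the matched text, Mid() can be applied to the original string. A minimal sketch, where the sample header value is made up for illustration:

```cfml
<cfset header = "DESCRIBE+abc>">
<cfset a = ReFind("DESCRIBE\+([^>]*>)", header, 1, true)>
<!--- a.pos[1] is 0 when nothing matched --->
<cfif a.pos[1]>
<!--- index 2 holds the first capture group, i.e. everything after DESCRIBE+ --->
<cfset afterDescribe = Mid(header, a.pos[2], a.len[2])>
</cfif>
```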

If ColdFusion supports look behind, then you can use (?<=DESCRIBE\+)[^>]*>

regex find content question

Trying to use regex refind tag to find the content within the brackets in this example using coldfusion
joe smith <joesmith@domain.com>
The resulting text should be
joesmith@domain.com
Using this
<cfset reg = refind(
"/(?<=\<).*?(?=\>)/s","Joe <joe@domain.com>") />
Not having any luck. Any suggestions?
Maybe a syntax issue, it works in an online regex tester I use.

You can't use lookbehind with CF's regex engine (it uses Apache Jakarta ORO).
However, you can use Java's regex engine, which does support it, and I've created a wrapper CFC that makes this even easier. Available from:
http://www.hybridchill.com/projects/jre-utils.html
(Update: The wrapper CFC mentioned above has evolved into a full project. See cfregex.net for details.)
Also, the /.../s stuff isn't required/relevant here.
So, from your example, but with improved regex:
<cfset jrex = createObject('component','jre-utils').init()/>
<cfset reg = jrex.match( "(?<=<)[^<>]+(?=>)" , "Joe <joe@domain.com>" ) />
A quick note, since I've updated that regex a few times; hopefully it's at its best now...
(?<=<) # positive lookbehind - start matching at `<` but don't capture it.
[^<>]+ # any char except `<` or `>`, the `+` meaning one-or-more greedy.
(?=>) # positive lookahead - only succeed if there's a `>` but don't capture it.
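If you would rather not install a wrapper, the same pattern can be run through java.util.regex directly from CFML. This is a minimal sketch, not the wrapper's implementation; the variable names and the sample address are only illustrative:

```cfml
<cfscript>
// java.util.regex supports lookbehind, unlike the built-in REFind engine
pattern = createObject("java", "java.util.regex.Pattern").compile("(?<=<)[^<>]+(?=>)");
matcher = pattern.matcher("Joe <joe@domain.com>");
// find() advances to the next match; group() returns the matched text
address = matcher.find() ? matcher.group() : "";
writeOutput(address);
</cfscript>
```

Calling matcher.find() repeatedly in a loop would return each further match in turn, which also covers the multiple-occurrence case.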

I've never been happy with the regular expression matching functions in CF. Hence, I wrote my own:
<cfscript>
function reFindNoSuck(string pattern, string data, numeric startPos = 1){
var sucky = refindNoCase(pattern, data, startPos, true);
var i = 0;
var awesome = [];
if (not isArray(sucky.len) or arrayLen(sucky.len) eq 0){return [];} //handle no match at all
for(i=1; i<= arrayLen(sucky.len); i++){
//a subexpression with pos 0 & length 0 did not take part in the match, so skip it
if (sucky.len[i] gt 0 && sucky.pos[i] gt 0){
//skip a match that is identical to the whole input string
var matchBody = mid( data, sucky.pos[i], sucky.len[i]);
if (matchBody neq arguments.data){
arrayAppend( awesome, matchBody );
}
}
}
return awesome;
}
</cfscript>
Applied to your problem, here is my example:
<cfset origString = "joe smith <joesmith@domain.com>" />
<cfset regex = "<([^>]+)>" />
<cfset matches = reFindNoSuck(regex, origString) />
Dumping the "matches" variable shows that it is an array with 2 items. The first will be <joesmith@domain.com> (because it matches the entire regex) and the second will be joesmith@domain.com (because it matches the 1st group defined in the regular expression; all subsequent groups would also be captured and included in the array).

/\<([^>]+)\>$/
something like that, didn't test it though, that one's yours ;)
