Using a regex to extract a substring - regex

I have the following string:
<cfset foo="Students: Am Goron, rika Mrks, Apl Rirez, Ktsana Tanam Course Location: Training Center - Hillsboro, OR - Hillsboro OR 97124 Course Date/Time: February 03, 2017">
I want to use a regex to extract only the list of students which is:
Am Goron, rika Mrks, Apl Rirez, Ktsana Tanam
If I use replace, I have to use many replace to make it happen. I think it would work with one regex, but I am not good with regular expressions. Can anybody help and guide?

Please pay no attention to the insult someone made in the comments. That's not what SO is for.
Anywho, there are a number of ColdFusion string functions that make your job easier. Here's what I did. This is assuming certain parts of your string will always be the same.
May not be super efficient, but it will help detail step by step what we're doing, and gives you precise control.
<cfset StringVar = "Students: Am Goron, rika Mrks, Apl Rirez, Ktsana Tanam Course Location: Training Center - Hillsboro, OR - Hillsboro OR 97124 Course Date/Time: February 03, 2017">
<!---Set total length of string --->
<cfset LengthIndent = len(StringVar)>
<!---Trim off the Students: part--->
<cfset StringVar = Right(StringVar,LengthIndent-9)>
<!---Trim up to the Course Location: part--->
<cfset StringVar = SpanExcluding(StringVar, ":")>
<!---Set total length of REMAINING string --->
<cfset LengthIndent = len(StringVar)>
<!---Trim off the Course Location: part--->
<cfset StringVar = LEFT(StringVar,LengthIndent-15)>
<!---Outputting this will give you ONLY names of students; trim() drops the leftover leading and trailing spaces--->
<cfoutput>#trim(StringVar)#</cfoutput>
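The hard-coded offsets above (9 and 15) depend on the exact label text. As a rough sketch, the same slice can be taken by looking up the label positions instead. The variable names (s, startLabel, endLabel, firstChar) are only illustrative, and this assumes both labels occur once, in that order:

```cfml
<cfscript>
s = "Students: Am Goron, rika Mrks, Apl Rirez, Ktsana Tanam Course Location: Training Center - Hillsboro, OR - Hillsboro OR 97124 Course Date/Time: February 03, 2017";
startLabel = "Students:";
endLabel = "Course Location:";

// findNoCase returns 0 when the substring is not present
startPos = findNoCase(startLabel, s);
endPos = findNoCase(endLabel, s);

students = "";
if (startPos gt 0 and endPos gt startPos) {
    // first character after the "Students:" label
    firstChar = startPos + len(startLabel);
    // take everything up to (not including) the "Course Location:" label
    students = trim(mid(s, firstChar, endPos - firstChar));
}
writeOutput(students);
</cfscript>
```

If either label is missing, students stays an empty string rather than holding a wrongly sliced value.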

Regexes are not my strong suit either, but there are online tutorials and test sites like RegExr v2.1 you can use for practicing. With a bit of reading I came up with this:
<cfset list = reReplaceNoCase(text, "^Students:(.*?)Course Location:.*$", "\1", "all")>
Breaking it down, it searches for a string that:
^Students: - starts with "Students:"
(.*?) - followed by zero or more characters, matched lazily, as a capturing group
Course Location: - followed by the course location
.*$ - ending with zero or more characters
Then use a backreference, i.e. \1, to replace the entire match with just the captured group, i.e. the student list.
If you prefer non-regex options, you could also cheat (a little) and insert an extra colon, i.e. ":", before "Course Location:". That would allow you to treat the string as a list delimited by colons, and extract the second element with list functions:
<cfset list = listGetAt( replace(text, "Course Location:", ":Course Location:"), 2, ":")>
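A variation on the regex approach, sketched under the assumption that "Students:" and "Course Location:" always appear in that order: reFindNoCase with its fourth argument set to true returns the position and length of each capturing group, so the list can be cut out with mid(). The variable names here (text, m, students) are only illustrative.

```cfml
<cfscript>
text = "Students: Am Goron, rika Mrks, Apl Rirez, Ktsana Tanam Course Location: Training Center - Hillsboro, OR - Hillsboro OR 97124 Course Date/Time: February 03, 2017";

// 4th argument = true: return a struct with pos and len arrays
m = reFindNoCase("^Students:(.*?)Course Location:", text, 1, true);

students = "";
// pos[1]/len[1] describe the whole match, pos[2]/len[2] the first group
if (arrayLen(m.pos) gte 2 and m.pos[2] gt 0) {
    students = trim(mid(text, m.pos[2], m.len[2]));
}
writeOutput(students);
</cfscript>
```

One difference from the reReplaceNoCase version: when the pattern does not match, reReplaceNoCase returns the original string unchanged, whereas this leaves students empty.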

Related

How to remove words or chars from a string using coldfusion

I am new to coldfusion and my goal is to remove part of a string according to certain words.
For example:
<cfset myVar = "One of the myths associated with the Great Wall of China is that it is the only man-made structure"/>
How can I remove the words "One of the myths associated with the" so that the string becomes "Great Wall of China is that it is the only man-made structure"?
I used the following function:
RemoveChars(string, start, count)
But I need to create a function, maybe with a regex or native ColdFusion functions.
I see this question already has an accepted answer, but I thought I'd add another answer :)
You can do it by finding where the word 'Great' is in the string. With modern CFML you can do it like so:
<cfscript>
myVar = "One of the myths associated with the Great Wall of China is that it is the only man-made structure";
// where is the word 'Great'?
a = myVar.FindNoCase("Great");
substring = myVar.removeChars(1, a-1);
writeDump(substring);
</cfscript>
Using mid would give you a bit more flexibility if you want to cut chars off both ends.
<cfscript>
myVar = "One of the myths associated with the Great Wall of China is that it is the only man-made structure";
// where is the word 'Great'?
a = myVar.FindNoCase("Great");
// get the substring
substring = myVar.mid(a, myVar.len());
writeDump(substring);
</cfscript>
In older versions of CF that would be written as:
<cfscript>
myVar = "One of the myths associated with the Great Wall of China is that it is the only man-made structure";
// where is the word 'Great'
a = FindNoCase("Great", myVar);
// get the substring
substring = mid(myVar, a, len(myVar));
writeDump(substring);
</cfscript>
You could also use a regular expression to achieve the same result; you'll have to decide which is more appropriate in your use case:
<cfscript>
myVar = "One of the myths associated with the Great Wall of China is that it is the only man-made structure";
// strip all chars before 'Great'
substring = myVar.reReplaceNoCase(".+(Great)", "\1");
writeDump(substring);
</cfscript>
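One caveat with the find-based versions above: findNoCase returns 0 when the word is not present, and the offsets calculated from it then no longer make sense. A minimal sketch of a guard follows; textFrom is a hypothetical helper name, not a built-in function:

```cfml
<cfscript>
// Hypothetical helper: returns str starting at the first occurrence of marker,
// or str unchanged when marker does not occur.
function textFrom(required string str, required string marker) {
    var p = findNoCase(arguments.marker, arguments.str);
    if (p eq 0) {
        return arguments.str;
    }
    // keep everything from position p to the end of the string
    return mid(arguments.str, p, len(arguments.str) - p + 1);
}

myVar = "One of the myths associated with the Great Wall of China is that it is the only man-made structure";
writeOutput(textFrom(myVar, "Great"));
</cfscript>
```

Whether a missing marker should return the original string, an empty string, or throw is a design choice; returning the input unchanged is just one option.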
You could see the sentence as a list separated by spaces. So if you want to cut off your sentence to start with "Great Wall of China", you could try
<cfloop list="#myVar#" index="word" delimiters=" ">
<cfif word neq "Great">
<cfset myVar = listRest(myVar, " ")>
<cfelse>
<cfbreak>
</cfif>
</cfloop>
<cfoutput>#myVar#</cfoutput>
There may be a quicker way to do this. Here's a function at cfLib.org that can alter a list in a similar way: LINK.

Coldfusion Regex

I have a HUGE string list (I'll limit the example to one line) with the format:
"[First Name] [Last Name] <[email address]>"
I ran a regular expression on the string to delimit this into an array.
<cfset x = REMatch("<(.*?)>",list) />
This works fine, except that it also returns the angle brackets <> around the email address:
x[1] = <[email address]>
Just for simplicity's sake, because the cfdocs are quite ambiguous about regex, I wrote this loop to remove the first and last character of each element:
<cfloop from="1" to="#arrayLen(x)#" index="y">
<cfset a = RemoveChars(x[y], 1, 1) />
<cfset a = left(a, len(a)-1) />
<cfset x[y] = a />
</cfloop>
This works fine, yay, I have my array now.
However, it's not what I wanted. How can I return the email address WITHOUT the angle brackets included in the first place?
(Please note I also tried REReplace, and REFind only returns the index of the occurrence)
(Also note that there are no [] brackets in the string, that's just saying the value inside isn't what I posted here)
ColdFusion implements a regex flavor (Apache ORO, see this answer for details) that doesn't support lookbehind assertions, which would be useful in this case.
But we can arrive at an approximation:
<cfset x = REMatch("[^<>]+(?=>)",list) />
should work as long as all angle brackets occur in unnested, balanced pairs.
Explanation:
[^<>]+ # Match one or more characters except angle brackets
(?=>) # Make sure the next character is a closing angle bracket
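As a quick illustration of the lookahead pattern on more than one entry, here is a small sketch. The names and addresses are made-up sample data:

```cfml
<cfscript>
// Made-up sample data in the "[First Name] [Last Name] <[email address]>" format
list = "Jane Doe <jane@example.com>, John Roe <john@example.org>";

// Each match is a run of non-bracket characters that is directly followed by ">"
emails = reMatch("[^<>]+(?=>)", list);
writeDump(emails);
</cfscript>
```

The names before each "<" are not returned because they are followed by "<" rather than ">", so the lookahead fails for them.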

Getting string between two characters - Coldfusion

I'm struggling a bit with ColdFusion (not the language I ever write in).
I am trying to do a regex to get a part of a string.
So for example, if my string is: D_CECILA23_CEC23423
I want the part that is between the 2 underscores.
This is the code I have so far, and it works for anything that is alpha characters, but when a number is thrown into the mix, it just breaks.
<cfset myStr = "D_CELCI_LISA">
<cfset myStr2 = reReplace(myStr, "([\w\d\%]+)(\_)([/ A-Z]+)(\_)([\w\d\?]+)", "\3", "all") >
<cfoutput>
myStr: #myStr#<br />
myStr2: #myStr2#<br />
</cfoutput>
Which gives me:
myStr: D_CELCI_LISA
myStr2: CELCI
If it really is as simple as getting the text between the first and second underscore character, you don't need a regex. This'll do it:
myStr2 = listGetAt(myStr, 2, "_");
That said, this should do for the regex in that context: ^.*_([^_]+)_.*$, eg:
myStr2 = reReplace(myStr, "^.*_([^_]+)_.*$", "\1", "all");
@user2429578: use ListLast() and ListFirst() for the last or first element of a list.
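For reference, the pattern in the question stops working once digits appear because its middle group, [/ A-Z]+, only allows uppercase letters, slashes and spaces. The sketch below compares the list approach, the regex given above, and a variant anchored on the first underscore. The second sample string is made up to show how the patterns differ when there are more than two underscores:

```cfml
<cfscript>
// First value is from the question; the second is a made-up string
// with three underscores.
samples = ["D_CECILA23_CEC23423", "A_B1_C2_D3"];

for (i = 1; i lte arrayLen(samples); i++) {
    s = samples[i];
    // second element of an underscore-delimited list
    viaList = listGetAt(s, 2, "_");
    // greedy leading .* : the group ends up between the LAST two underscores
    viaGreedy = reReplace(s, "^.*_([^_]+)_.*$", "\1");
    // [^_]* cannot cross an underscore, so the group is between the FIRST two
    viaAnchored = reReplace(s, "^[^_]*_([^_]+)_.*$", "\1");
    writeOutput(s & ": " & viaList & " / " & viaGreedy & " / " & viaAnchored & "<br />");
}
</cfscript>
```

For the string from the question all three give CECILA23. For the made-up string the list call and the anchored pattern give B1, while the greedy pattern gives C2.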

Regex to find length in description

<cfset RegexToFindLength = "Length:.*?(\d*\.?\d+)\s*(""|")"/>
<cfset Description = "blah blah blah 2.5"""/>
<cfset size = #reMatch(RegexToFindLength, Description)# />
<cfdump var="#size#">
Error Message: ColdFusion was looking at the following text:
)
I'm looking to extract Length: 2.5" from the product's description.
I have tested the above regex in regexpal and it works. But when I try using it in a cfm page, I get errors.
Can someone explain to me how this would be setup in CF?
You have a few issues here.
1) You don't escape your double quotes, so you end up closing your regex string and confusing it.
Personally, when I have to use double quotes in a string, I tend to use single quotes to define the string if I can.
<cfset RegexToFindLength = 'Length:.*?(\d*\.?\d+)\s*(""|")'/>
2) Your Description variable doesn't have the string you're searching for, so there will be no match. I changed this to the following to make it work (note the single quotes for defining the string):
<cfset Description = 'Length:.:2.5""'/>
3) (maybe not an issue) Size is not being set to a number. rematch returns an array of strings. You'll want to check the length of the string inside the array positions or check the length of the array itself - I don't know what exactly it is that you want to do.
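If the goal is just the number, a rough sketch along these lines may help. The description text is made up, and the pattern is a simplified version of the one above that expects a single double quote after the number:

```cfml
<cfscript>
// Made-up description; single quotes let the double quote stay unescaped
description = 'Widget, steel. Length: 2.5" Width: 1"';
pattern = 'Length:.*?(\d*\.?\d+)\s*"';

// 4th argument = true: return pos and len arrays for the match and its groups
m = reFind(pattern, description, 1, true);

lengthValue = "";
// pos[2]/len[2] describe the first capturing group, i.e. the number
if (arrayLen(m.pos) gte 2 and m.pos[2] gt 0) {
    lengthValue = mid(description, m.pos[2], m.len[2]);
}
writeOutput(lengthValue);
</cfscript>
```

With this sample text lengthValue is 2.5. It is still a string, so wrap it in val() if a numeric value is needed.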

regex find content question

I'm trying to use the reFind function to find the content within the angle brackets in this example, using ColdFusion:
joe smith <joesmith@domain.com>
The resulting text should be
joesmith@domain.com
Using this
<cfset reg = refind(
"/(?<=\<).*?(?=\>)/s","Joe <joe@domain.com>") />
Not having any luck. Any suggestions?
Maybe a syntax issue, it works in an online regex tester I use.
You can't use lookbehind with CF's regex engine (uses Apache Jakarta ORO).
However, you can use Java's regex engine, which does support them, and I've created a wrapper CFC that makes this even easier. Available from:
http://www.hybridchill.com/projects/jre-utils.html
(Update: The wrapper CFC mentioned above has evolved into a full project. See cfregex.net for details.)
Also, the /.../s stuff isn't required/relevant here.
So, from your example, but with improved regex:
<cfset jrex = createObject('component','jre-utils').init()/>
<cfset reg = jrex.match( "(?<=<)[^<>]+(?=>)" , "Joe <joe@domain.com>" ) />
A quick note, since I've updated that regex a few times; hopefully it's at its best now...
(?<=<) # positive lookbehind - only match directly after a `<`, without including it in the match.
[^<>]+ # any char except `<` or `>`, the `+` meaning one-or-more greedy.
(?=>) # positive lookahead - only succeed if a `>` follows, without including it in the match.
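If you would rather not pull in a wrapper, the Java regex classes can also be called directly from CFML. This is only a minimal sketch using java.util.regex.Pattern and Matcher; the variable names are illustrative:

```cfml
<cfscript>
input = "Joe <joe@domain.com>";

// compile() is a static method, so no init() call is needed on the Java class
pattern = createObject("java", "java.util.regex.Pattern").compile("(?<=<)[^<>]+(?=>)");
matcher = pattern.matcher(input);

emails = [];
// find() advances to the next match; group() returns the matched text
while (matcher.find()) {
    arrayAppend(emails, matcher.group());
}
writeDump(emails);
</cfscript>
```

Because the lookbehind and lookahead are not part of the match, each array element is the bare address without the angle brackets.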
I've never been happy with the regular expression matching functions in CF. Hence, I wrote my own:
<cfscript>
function reFindNoSuck(string pattern, string data, numeric startPos = 1){
    //4th argument = true: get the pos/len arrays for the match and its groups
    var sucky = refindNoCase(pattern, data, startPos, true);
    var i = 0;
    var awesome = [];
    if (not isArray(sucky.len) or arrayLen(sucky.len) eq 0){return [];} //handle no match at all
    for(i=1; i<= arrayLen(sucky.len); i++){
        //a pos of 0 and len of 0 means this entry did not match anything
        if (sucky.len[i] gt 0 && sucky.pos[i] gt 0){
            var matchBody = mid( data, sucky.pos[i], sucky.len[i]);
            //skip an entry that is identical to the entire input string
            if (matchBody neq arguments.data){
                arrayAppend( awesome, matchBody );
            }
        }
    }
    return awesome;
}
</cfscript>
Applied to your problem, here is my example:
<cfset origString = "joe smith <joesmith@domain.com>" />
<cfset regex = "<([^>]+)>" />
<cfset matches = reFindNoSuck(regex, origString) />
Dumping the "matches" variable shows that it is an array with 2 items. The first will be <joesmith@domain.com> (because it matches the entire regex) and the second will be joesmith@domain.com (because it matches the 1st group defined in the regular expression -- all subsequent groups would also be captured and included in the array).
/\<([^>]+)\>$/
something like that, didn't test it though, that one's yours ;)