ColdFusion: how to extract a substring using regex

I have a string that contains double-quoted substrings (delimited by the " character). The data I want is between the double quotes.
How can I write a regex to extract "the first data i want" and "the second data i want" from this:
'some string with "the first data i want" and "the second data i want"'
I tried the following code:
<cfset mydata = 'some string with "the first data i want" and "the second data i want"'/>
<cfset arrData = ListToArray(mydata ,'"') />

Presumably you could do something trivial like this:
<cfset matches = REMatch('"([^"]*)"', mydata) />
<cfdump var="#matches#" label="Example REMatch" />
Unfortunately, this also includes the double quotes in each match. ColdFusion's built-in regular expression engine is quite dated: it supports lookaheads but not lookbehinds, so lookarounds alone cannot keep the opening quote out of the match.
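If the quotes are unwanted, one option is to strip them from each match after the fact. A minimal sketch, reusing mydata from the question (the loop variable i is my own name):

```cfml
<cfset mydata = 'some string with "the first data i want" and "the second data i want"' />
<!--- REMatch returns an array of full matches, quotes included --->
<cfset matches = REMatch('"[^"]*"', mydata) />
<cfloop from="1" to="#ArrayLen(matches)#" index="i">
    <!--- remove the surrounding quotes; a match cannot contain any others --->
    <cfset matches[i] = Replace(matches[i], '"', '', 'all') />
</cfloop>
<cfdump var="#matches#" label="REMatch with quotes stripped" />
```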
The double quotes can easily be stripped afterwards, but if you really want to use lookaheads and lookbehinds, you can fall back on Java's own regex library (java.util.regex):
<cfset matches = [] />
<cfset pattern = CreateObject("java","java.util.regex.Pattern").Compile('(?<=")[^"]*(?=")') />
<cfset matcher = pattern.Matcher(mydata) />
<cfloop condition="matcher.Find()">
<cfset ArrayAppend(matches, matcher.Group()) />
</cfloop>
<cfdump var="#matches#" label="Example of Java Regex" />
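One caveat with the lookaround pattern: the lookarounds do not consume the quote characters, so the closing quote of one substring can also act as the opening quote of the next candidate. For the sample string I would therefore expect the text between the two quoted substrings (" and ") to be returned as a match as well. A capture group avoids that. This is only a sketch under the same assumptions as the code above; JavaCast is used so that the group(int) overload is selected:

```cfml
<cfset mydata = 'some string with "the first data i want" and "the second data i want"' />
<cfset matches = [] />
<!--- the quotes are consumed by the match; group 1 holds only the text between them --->
<cfset pattern = CreateObject("java", "java.util.regex.Pattern").compile('"([^"]*)"') />
<cfset matcher = pattern.matcher(mydata) />
<cfloop condition="matcher.find()">
    <cfset ArrayAppend(matches, matcher.group(JavaCast("int", 1))) />
</cfloop>
<cfdump var="#matches#" label="Java regex with a capture group" />
```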

Related

Removing <br> with REReplace in coldfusion

I have some HTML line break tags in a text file that I would like to remove or replace with chr(10) using the ColdFusion REReplace function. I am trying
<CFSET newtext = REreplace(text, "<BR>", chr(10), "ALL")>
but it doesn't seem to work. What am I doing wrong?
Can you do a plain <cfset newtext = replaceNoCase(text, '<br>', chr(10), 'ALL')>? REReplace is case-sensitive, so the pattern "<BR>" will not match lowercase <br> tags, which may be the problem. Since it doesn't look like you need a complex matcher, a plain replace will probably work better for you.
I would recommend using a regex here in case there are XHTML tags like <br/> or <br />:
<cfset newtext = REReplaceNoCase(text, "<br[^>]*>", chr(10), "all") />
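To illustrate with made-up input, here is a sketch of a slightly narrower variant of that pattern. It accepts only optional whitespace and an optional slash after br, whereas <br[^>]*> also accepts attributes (for example <br class="x">); which one fits depends on the input.

```cfml
<cfset text = "one<br>two<BR/>three<br />four" />
<!--- \s*/? allows optional whitespace and an optional self-closing slash --->
<cfset newtext = REReplaceNoCase(text, "<br\s*/?>", chr(10), "all") />
<!--- the four words are now separated by line feeds, so this outputs 4 --->
<cfoutput>#ListLen(newtext, chr(10))#</cfoutput>
```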

ColdFusion - Regex to match SRC with single quotes

I have a simple regex line to extract the src="" value from an image tag:
<cfset variables.attrSrc = REMatch("(?i)src\s*=\s*""[^""]+", variables.myImageTag) />
<!--- REMatch("(?i)src\s*=\s*""[^""]+" --->
However, while this works great, it doesn't appear to work with src='' attributes that use single quotes instead of double quotes.
Ideally, I'd like it to work with both single quotes and double.
Any thoughts?
Thanks,
Michael.
(?i)src\s*=\s*(""[^""]+""|'[^']+')
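Dropped into the original REMatch call, that pattern might look like the sketch below. The sample tag is made up. As in the question's pattern, each literal double quote is doubled because the pattern sits inside a double-quoted CFML string. Note that, unlike the question's pattern, this one also includes the closing quote in the match.

```cfml
<cfset myImageTag = "<img alt='' src='photo.jpg'>" />
<cfset attrSrc = REMatch("(?i)src\s*=\s*(""[^""]+""|'[^']+')", myImageTag) />
<!--- attrSrc[1] holds the whole attribute: src='photo.jpg' --->
<cfdump var="#attrSrc#" label="src attribute" />
```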

Calling a function within ReReplace function

Is there a way to write in coldfusion something like:
< cfset ReReplace(value,"&#\d+;","#decodeHtmlEntity(\1)#", "all") >
Thanks a lot
The short answer is "No".
CF doesn't handle the regular expression execution natively. It hands off to a Java library (Oro, IIRC) to handle that. This means that any CF functions you call get executed before the regex.
There is a workaround, although it's not nearly as elegant as being able to pass functions would be. Use reFind() to discover all the instances of what you are looking for, and replace them one by one. If you do the replacements last-to-first (e.g. if there are 3 instances, do the 3rd, then the 2nd, then the 1st), the starting position of each remaining match stays the same, so you can find all the matches up front instead of calling reFind() inside the loop.
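A rough sketch of the find-then-replace-one-by-one idea follows. It is a simplification, not the exact procedure described above: it collects the matches with REMatch() rather than tracking reFind() positions, and it uses the built-in Chr() as a stand-in for the question's decodeHtmlEntity(), which I assume is user-defined. The sample input is made up.

```cfml
<cfset value = "Caf&##233; &##60;b&##62;" />
<!--- collect every numeric entity first; ## is an escaped # inside a CFML string --->
<cfset entities = REMatch("&##\d+;", value) />
<cfloop array="#entities#" index="entity">
    <!--- drop the leading &# and the trailing ; to get the character code --->
    <cfset code = Mid(entity, 3, Len(entity) - 3) />
    <!--- the function call now runs once per match, with a real value --->
    <cfset value = Replace(value, entity, Chr(code), "all") />
</cfloop>
<cfoutput>#HTMLEditFormat(value)#</cfoutput>
```

Because each replacement is a plain string replace over the whole value, input in which a decoded character forms a new entity (for example &#38; followed by #233;) could be decoded twice. The position-based, last-to-first approach described above avoids that.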
HTH.
I don't think this will work if you want to pass the matched value as an argument to decodeHTMLEntity.
Updated:
<cfset myVar = ReReplace("ABC123DEF","(\d+)",addOne('\1'), "all") >
<cffunction name="addOne" access="public" output="false" returntype="string">
<cfargument name="arg1" required="true" type="string" />
<cfreturn arg1 + 1>
</cffunction>
<cfdump var="#myvar#">
The code above is meant to find 123 in the text and add one to it, but it will not work: arg1 receives the literal string \1, which is not a numeric value.
Have you tried simply using URLDecode(value)?
Or if you specifically only want to decode the numeric html codes, then
<cfset myVar = ReReplace(value,"(&##\d+;)",urlDecode('\1'), "all") >
will do what you need.
To explain what it is doing:
I've replaced the PHP decodeHTMLEntity function with the CFML version.
If you want to use backreferences, you need to specify the capture groups in the regex pattern.
You need to double up the #'s to escape them; otherwise CF will look for a closing # that it will never find.

ColdFusion RegEx replace HTML characters like &nbsp;

I currently have code to remove html from a string:
<cfset arguments.textToFormat = REReplaceNoCase(arguments.textToFormat,"<[^>]*>","","ALL") />
However, this does not remove HTML characters like &nbsp;
What regex could I use to REReplace these characters?
Thanks
For removing &nbsp; and other similar strings:
&[^;]+?;
HTH
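Combined with the tag-stripping call from the question, that pattern might be used as in the sketch below. The sample string is made up, and the arguments scope is dropped so the snippet runs outside a function.

```cfml
<cfset textToFormat = "<p>Fish&nbsp;&amp;&nbsp;chips</p>" />
<!--- first strip the tags, as in the question --->
<cfset textToFormat = REReplaceNoCase(textToFormat, "<[^>]*>", "", "all") />
<!--- then strip anything shaped like an entity: an ampersand up to the next semicolon --->
<cfset textToFormat = REReplace(textToFormat, "&[^;]+?;", "", "all") />
<!--- outputs: Fishchips --->
<cfoutput>#textToFormat#</cfoutput>
```

Two things to watch for: a bare ampersand followed later by a semicolon (as in "A & B; C") matches as well, and replacing entities with a space rather than an empty string may read better when they separate words.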

Coldfusion RegEx to replace characters

I have the following code:
<cfset arguments.textToFormat = Replace(arguments.textToFormat, Chr(10), '<br />', "ALL") />
It replaces all instances of Chr(10) with a <br /> tag.
What I'd like to do afterwards, however, is this: if there are more than two <br /> tags, replace all the extra ones with an empty string (i.e. remove them).
I could do this via code, but I'm sure a regex replace would be faster. Unfortunately I haven't a clue how to construct the regex.
Any help would be great - thanks.
There may be a more elegant regex, but this should do it:
rereplace( myText, '(<br />){2,}', '<br />', 'all' )
That should find all instances of 2 or more <br /> tags, and replace the whole set with a single tag.
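If the goal is instead to keep up to two tags in a row and drop only the extras, which is how I read "more than two" in the question, a variant could look like this sketch (sample text made up):

```cfml
<cfset myText = "a<br /><br /><br /><br />b<br />c" />
<!--- runs of three or more tags collapse to exactly two; shorter runs are untouched --->
<cfset collapsed = REReplace(myText, "(<br />){3,}", "<br /><br />", "all") />
<!--- collapsed is now: a<br /><br />b<br />c --->
<cfoutput>#HTMLEditFormat(collapsed)#</cfoutput>
```

Like the pattern above, this matches only tags written exactly as <br /> with nothing between them, which holds for the output of the Replace() call in the question.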