I'm looking to learn how to write a regex in ColdFusion that will scan through a large block of HTML text and create a list of items.
The items I want are contained in elements like the following:
<span class="findme">The Goods</span>
Thanks for any tips to get this going.
You don't say what version of CF. Since v8 you can use REMatch to get an array
results = REMatch('(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>', text)
Use ArrayToList to turn that into a list.
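A minimal sketch of that approach, assuming the sample markup from the question; the variable names (matches, items) are illustrative only. Because REMatch returns each whole match, tags included, the sketch strips the span tags with REReplace before building the list.

```cfml
<!--- Sample input taken from the question --->
<cfset text = '<span class="findme">The Goods</span><span class="findme">More Goods</span>'>

<!--- REMatch (CF8+) returns an array of whole matches, including the tags --->
<cfset matches = REMatch('(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>', text)>

<!--- Strip the opening and closing span tags from each match --->
<cfset items = ArrayNew(1)>
<cfloop array="#matches#" index="m">
    <cfset ArrayAppend(items, REReplace(m, "(?i)</?span[^>]*>", "", "all"))>
</cfloop>

<!--- Prints: The Goods,More Goods --->
<cfoutput>#ArrayToList(items)#</cfoutput>
```

Note that the default list delimiter is a comma, so if the extracted text can itself contain commas it may be safer to keep working with the array.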
For older versions, use REFindNoCase to locate each match and Mid() to extract the substrings.
EDIT: To answer your follow-up comment the process of using REFind to return all matches is quite involved because the function only returns the FIRST match. This means you actually have to call REFind many times passing a new startpos each time. Ben Forta has written a UDF which does exactly this and will save you some time.
<!---
Returns all the matches of a regular expression within a string.
NOTE: Updated to allow subexpression selection (rather than whole match)
@param regex Regular expression. (Required)
@param text String to search. (Required)
@param subexnum Sub-expression to extract (Optional)
@return Returns a structure.
@author Ben Forta (ben@forta.com)
@version 1, July 15, 2005
--->
<cffunction name="reFindAll" output="true" returnType="struct">
<cfargument name="regex" type="string" required="yes">
<cfargument name="text" type="string" required="yes">
<cfargument name="subexnum" type="numeric" default="1">
<!--- Define local variables --->
<cfset var results=structNew()>
<cfset var pos=1>
<cfset var subex="">
<cfset var done=false>
<!--- Initialize results structure --->
<cfset results.len=arraynew(1)>
<cfset results.pos=arraynew(1)>
<!--- Loop through text --->
<cfloop condition="not done">
<!--- Perform search --->
<cfset subex=reFind(arguments.regex, arguments.text, pos, true)>
<!--- Anything matched? --->
<cfif subex.len[1] is 0>
<!--- Nothing found, outta here --->
<cfset done=true>
<cfelse>
<!--- Got one, add to arrays --->
<cfset arrayappend(results.len, subex.len[arguments.subexnum])>
<cfset arrayappend(results.pos, subex.pos[arguments.subexnum])>
<!--- Reposition start point --->
<cfset pos=subex.pos[1]+subex.len[1]>
</cfif>
</cfloop>
<!--- If no matches, add 0 to both arrays --->
<cfif arraylen(results.len) is 0>
<cfset arrayappend(results.len, 0)>
<cfset arrayappend(results.pos, 0)>
</cfif>
<!--- and return results --->
<cfreturn results>
</cffunction>
This gives you the start (pos) and length (len) of each match, so to get each substring, use another loop:
<cfset text = '<span class="findme">The Goods</span><span class="findme">More Goods</span>' />
<cfset pattern = '(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>' />
<cfset results = reFindAll(pattern, text, 2) />
<cfloop index="i" from="1" to="#ArrayLen(results.pos)#">
<cfoutput>match #i#: #Mid(text, results.pos[i], results.len[i])#<br></cfoutput>
</cfloop>
EDIT: Updated reFindAll with subexnum argument. Setting this to 2 will capture the first subexpression. The default value 1 captures the entire match.
Try looking into the possibility of making your HTML work with a regular DOM parser and querying it via XPath instead of hammering this through a regex-based abomination.
To make the HTML input usable, pass it through jTidy (see http://jtidy.riaforge.org/).
Once you have well-formed XML/XHTML, build an XML document from it:
<cfset dom = XmlParse(scrubbedHtml, true)>
Then query the XML document using XPath:
<cfset result = XmlSearch(dom, "//span[@class='findme']")>
Done.
EDIT: ColdFusion's XmlSearch() doesn't have great XML namespace support. If you end up producing XHTML instead of the more recommendable XML, use the following XPath (note the colon): "//:span[@class='findme']" or "//*:span[@class='findme']". See here and here for more info.
See the jTidy API documentation for a complete overview what jTidy can do.
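As a rough sketch of the last two steps, assuming the markup has already been cleaned into well-formed XML (the sample string below is hypothetical and stands in for jTidy's output), the text of each matching node can be read from its XmlText property:

```cfml
<!--- Hypothetical, already well-formed input standing in for scrubbed HTML --->
<cfset scrubbedHtml = '<div><span class="findme">The Goods</span><span class="other">Skip</span><span class="findme">More Goods</span></div>'>

<!--- Parse the markup and select only the spans with class="findme" --->
<cfset dom = XmlParse(scrubbedHtml)>
<cfset nodes = XmlSearch(dom, "//span[@class='findme']")>

<!--- Collect the text content of each matching node --->
<cfset items = ArrayNew(1)>
<cfloop array="#nodes#" index="node">
    <cfset ArrayAppend(items, node.XmlText)>
</cfloop>

<!--- Prints: The Goods,More Goods --->
<cfoutput>#ArrayToList(items)#</cfoutput>
```

The sample has no namespace declaration, so the plain //span path applies; namespaced XHTML would need the colon form mentioned in the edit above.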
Related
I have a set of list values in ColdFusion variable, and I need to replace all the list values into desired text.
For Example:
<cfset headerColumnList = "FirstName,LastName,Email,FrequentGuestID,IP Address,Time Stamp Email Marketing">
<cfset a="test1">
<cfset b="test2">
<cfset c="test3">
<cfset d="test4">
<cfset e="test5">
<cfset f="test6">
<cfloop index = "ListElement" list= "#headerColumnList#" delimiters = ",">
<cfoutput>
#replaceList("#ListElement#","FirstName,LastName,Email,FrequentGuestID,IP Address,Time Stamp Email Marketing","#a#,#b#,#c#,#d#,#e#,#f#",",")#
</cfoutput>
</cfloop>
Output:
test1
test2
test3
test4
test5
Time Stamp test3 Marketing
In the above scenario, the value "Time Stamp Email Marketing" is supposed to be replaced with "test6", but instead only the word "Email" inside it is replaced; the phrase is not treated as a whole. Can anyone tell me how to replace the list phrases as whole values, or suggest an alternative?
You can use the ListQualify function here to get the exact result you want. Wrap each list value in a qualifier, loop over the qualified list, and replace against it with your own data; a qualified Email entry no longer matches inside the qualified "Time Stamp Email Marketing" entry. There is no need to change the order of the list values.
<cfset quoted = listQualify(headerColumnList,"''")>
<cfloop index = "ListElement" list= "#quoted#" delimiters = ",">
#replaceList(ListElement,quoted,"#a#,#b#,#c#,#d#,#e#,#f#")#
<br/>
</cfloop>
The code is working as written. You are seeing this because your check for "Email" in the replaceList() function is firing before the check for "Time Stamp Email Marketing". Notice the word "Email" in that string.
I don't know what your actual use case is but you can change the order of your code for this specific example to make it work like you want.
<cfset headerColumnList = "FirstName,LastName,Email,FrequentGuestID,IP Address,Time Stamp Email Marketing">
<cfset a="test1">
<cfset b="test2">
<cfset c="test3">
<cfset d="test4">
<cfset e="test5">
<cfset f="test6">
<cfloop index = "ListElement" list= "#headerColumnList#" delimiters = ",">
<cfoutput>
#replaceList("#ListElement#","FirstName,LastName,FrequentGuestID,IP Address,Time Stamp Email Marketing,Email","#a#,#b#,#d#,#e#,#f#,#c#",",")#
</cfoutput>
</cfloop>
This gives the desired output. Notice how I reordered the conditions within the replaceList() function.
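If the goal is simply to map each header to the replacement in the same position, another option is to avoid substring replacement altogether. The sketch below looks up each header with ListFind, which only matches complete list elements; the replacements list is an assumed name introduced for illustration.

```cfml
<cfset headerColumnList = "FirstName,LastName,Email,FrequentGuestID,IP Address,Time Stamp Email Marketing">
<!--- Replacement values, in the same order as the headers (illustrative name) --->
<cfset replacements = "test1,test2,test3,test4,test5,test6">

<cfloop index="ListElement" list="#headerColumnList#" delimiters=",">
    <!--- ListFind compares whole elements, so "Email" cannot match inside a longer phrase --->
    <cfset idx = ListFind(headerColumnList, ListElement)>
    <cfoutput>#ListGetAt(replacements, idx)#<br></cfoutput>
</cfloop>
```

This prints test1 through test6, one per line, regardless of the order in which the headers appear.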
I have a word document with tables laid out to look like a form. I have placeholders like %firstName%, %lastName%, %birthdate%...etc.
When I use the replace() function, %firstName%, %lastName%, %birthdate% and all of the other placeholder fields are replaced on the first and second pages. After the second page, nothing is replaced. The placeholder names on the 3rd and 4th pages are the same as on the 1st and 2nd pages. I even copied and pasted the placeholder names, and I've made sure there are no added spaces. Curious to know if anyone else has had this happen and can tell me what was done to fix it.
<cfset docPath = GetDirectoryFromPath(GetCurrentTemplatePath()) & "UserTemplate.rtf" />
<cflock name="UserTemp" type="exclusive" timeout="30">
<cfset rtf = FileRead(docPath) />
<cfquery name = "qUserFormData">
SELECT * FROM vUserFormData WHERE UserID = 3
</cfquery>
<cfset rtf = Replace(rtf,"%firstName%",#firstName#)/>
<cfset rtf = Replace(rtf,"%lastName%",#lastName#) />
<cfset rtf = Replace(rtf,"%birthday%",#birthday#) />
</cflock>
<cfheader name="content-disposition" value="filename=UserTemplate.doc" />
<cfcontent type="application/msword"><cfoutput>#rtf#</cfoutput>
There is a fourth (optional) parameter to the replace() function: scope.
Scope:
one: replaces the first occurrence (default)
all: replaces all occurrences
Notice that "one" is the default and that only replaces the first occurrence. Try adding that fourth parameter like this:
<cfset rtf = Replace(rtf,"%firstName%",firstName,"all") />
<cfset rtf = Replace(rtf,"%lastName%",lastName,"all") />
<cfset rtf = Replace(rtf,"%birthday%",birthday,"all") />
(The hash signs # are not necessary in this bit of code.)
Also be aware that the replace() method you are using is case sensitive.
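If the placeholders in the template might differ in case, ReplaceNoCase takes the same arguments and ignores case. A small sketch, using a made-up template string and values rather than the real RTF file:

```cfml
<!--- Hypothetical template text with inconsistently cased placeholders --->
<cfset rtf = "Dear %FirstName% %lastName%, welcome %firstname%.">

<!--- ReplaceNoCase matches regardless of case; "all" replaces every occurrence --->
<cfset rtf = ReplaceNoCase(rtf, "%firstName%", "Ada", "all")>
<cfset rtf = ReplaceNoCase(rtf, "%lastName%", "Lovelace", "all")>

<!--- Prints: Dear Ada Lovelace, welcome Ada. --->
<cfoutput>#rtf#</cfoutput>
```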
I am trying to check whether data exists in my form. If it does not exist, I want to assign it 0. How can I do this?
<cfif not isDefined("FORM.Age")>
cfset FORM.Age = "0"
<cfif>
Generally the best practice is considered to be to avoid isDefined. This is because isDefined will search all scopes until it finds a matching variable. So it's more efficient to use structKeyExists, eg:
<cfif NOT structKeyExists(form, "age")>
<cfset form.age = 0>
</cfif>
Also, another way to achieve this is to use cfparam, and specify 0 as the default:
<cfparam name="form.age" default="0">
You're almost there:
<cfif not isDefined("FORM.Age")>
<cfset Form.Age = 0>
</cfif>
Technically what you have is fine once you enclose the cfset in angle brackets (< and >). Assuming that omission is just a typo, could it be you are trying to use it with a text field?
Text fields always exist on submission. The value may be an empty string, but the field itself still exists, so IsDefined will always return true. If that is the case, you need to examine the field length or value instead. Then do something if it is empty according to your criteria. For example:
<!--- value is an empty string --->
<cfif NOT len(FORM.age)>
do something
</cfif>
... OR
<!--- value is an empty string or white space only --->
<cfif NOT len(trim(FORM.age))>
do something
</cfif>
... OR
<!--- convert non-numeric values to zero (0) --->
<cfset FORM.Age = val(FORM.Age)>
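To illustrate the last option, here is a short sketch of how Val() treats a few sample inputs (the values are made up). It reads a number from the start of the string and returns 0 when there is none:

```cfml
<cfoutput>
<!--- Empty string: prints 0 --->
#val("")#<br>
<!--- No leading number: prints 0 --->
#val("abc")#<br>
<!--- Plain number: prints 42 --->
#val("42")#<br>
<!--- Leading number followed by text: prints 42 --->
#val("42abc")#<br>
</cfoutput>
```

Because a value such as "42abc" becomes 42 rather than 0, Val() is a convenient default but not a substitute for validating the input.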
There are actually two things you want to ensure. First, make sure this page was arrived at by submitting the proper form. Next, ensure you have a numeric value for the form.age variable. Here is an example of how you might want to code this:
<cfif StructKeyExists(form, "age") and cgi.http_referer is what it should be>
<cfif IsNumeric(form.age) and form.age gt 0>
<cfset AgeSubmitted = int(form.age)>
<cfelse>
<cfset AgeSubmitted = 0>
</cfif>
...more code to process form
<cfelse>
...code for when page was not arrived at properly
</cfif>
I'm passing a structure to a function for required-field validation, but I first check whether or not my structure is empty.
If all elements in my structure are empty (empty strings), I don't pass the structure on for validation.
I used StructIsEmpty to check my structure. The problem is that when my structure's elements contain only empty strings, StructIsEmpty returns NO. Unfortunately I'm still on CF8.
How can I get StructIsEmpty to return YES when all of the structure's elements contain only empty strings?
<cfset st_MyStruct=StructNew()>
<cfset st_MyStruct["InstType"]="#Trim(arr[112])#">
<cfset st_MyStruct["InstId"]="#Trim(arr[113])#">
<cfset st_MyStruct["PLN"]="#Trim(arr[115])#">
<cfset st_MyStruct["PFN"]="#Trim(arr[116])#">
<cfset st_MyStruct["Referal"]="#Trim(arr[118])#">
<cfif StructIsEmpty(st_MyStruct) NEQ "NO">
<CFINVOKE component="cfcomponents.ValidateFields" method="CheckReqFields"
st_MyStruct="#st_MyStruct#"
Inst="#arguments.Inst#" >
</cfif>
Like Dan said, this struct is not empty: it has keys, even though their values are blank. If you want to check whether all of your struct's values are blank, you can do something like this and then check whether the structFieldsAreEmpty variable is true; if it is, your structure has only blank values. If your struct returns more than one entry you would need to modify this code.
<cfset st_MyStruct = {}>
<cfset st_MyStruct["InstType"] = ''>
<cfset st_MyStruct["InstId"] = ''>
<cfset st_MyStruct["PLN"] = ''>
<cfset st_MyStruct["PFN"] = ''>
<cfset st_MyStruct["Referal"] = ''>
<cfset structFieldsAreEmpty = checkStructValuesEmpty(st_MyStruct) />
<cffunction name="checkStructValuesEmpty" access="private" returntype="boolean" output="false">
<cfargument name="myStruct" type="struct" required="true">
<!--- Keep the loop variable local to the function --->
<cfset var i = "">
<cfloop collection="#arguments.myStruct#" index="i">
<cfif len(trim(arguments.myStruct[i]))>
<cfreturn false>
</cfif>
</cfloop>
<cfreturn true>
</cffunction>
If you want to do this in a single line, you could serialize the struct to JSON and search that using regex for any non-empty string values, like so:
structIsEmptyStrings = refind(':("[^"]+"|\d+|true|false)', serializeJSON(st_MyStruct)) == 0;
That regex is looking for any values that are either not empty strings (""), numeric, or a boolean value (true or false). Keep in mind that this will not be accurate if any values in the struct are any types other than string, numeric, or boolean (nested arrays or structs will not be checked). Also, if any string values are only spaces, this will consider the struct to be not empty (which might not be what you're looking for).
I've just encountered CF's unwanted "feature" which involves stripping leading zeroes from the values returned to an autosuggest input. I was thinking I could prepend some character to the values and strip them out after the return, but have hit a snag. I'm modifying an existing function, which looks like this:
<cffunction name="lookupTailNumber" access="remote" returntype="Array" >
<cfargument name="search" type="any" required="false" default="">
<!--- Define variables --->
<cfset var data="">
<cfset var result=ArrayNew(1)>
<!--- Do search --->
<cfquery name="data">
SELECT DISTINCT SERIAL_NUMBER AS list
FROM aircraft_status
WHERE SERIAL_NUMBER LIKE '%#trim(ARGUMENTS.search)#%'
ORDER BY list
</cfquery>
<!--- Build result array --->
<cfloop query="data">
<cfset ArrayAppend(result, list)>
</cfloop>
<!--- And return it --->
<cfreturn result>
</cffunction>
which returns a response which looks like this:
[3001.0,1.00002E8,1.00002001E8,1.00002002E8,1.00002003E8,1.00002004E8]
or in JSON format:
0: 3001
1: 100002000
2: 100002001
3: 100002002
4: 100002003
where all the results have had leading zeroes stripped away. I've tried modifying the query to prepend a character to each value:
<cfquery name="data">
SELECT DISTINCT (concat(' ', SERIAL_NUMBER)) AS list
FROM aircraft_status
WHERE SERIAL_NUMBER LIKE '%#trim(ARGUMENTS.search)#%'
ORDER BY list
</cfquery>
which returns this:
[" 0000003001"," 0100002000"," 0100002001"," 0100002002"," 0100002003"," 0100002004"]
so you'd think all was well, right? Problem: when returned, none of the values show up in the autosuggest field!!! I've also tried prepending different characters, including numbers, with no luck. Looking at the elements in yui-ac-bd div > ul, none are populated or displayed.
The input is declared like so:
<cfinput style = "width:300px;"
class = ""
type="text"
name="txtvalueFilter"
maxlength="15"
id="txtvalueFilter"
autosuggest="cfc:mycfcpath({cfautosuggestvalue})"
/>
Thoughts?
Try appending a space, so the built-in JSON serializer will treat it as a string instead of an int in JSON.
Also, make sure you have installed the latest hotfixes for your version of CF.
I wonder if you need the "Build result array" step at all. What happens if you return data.list? Or maybe use ListToArray(valueList(data.list)) instead?