ColdFusion: XmlParse does not preserve Carriage Return

Test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<node>line1
line2
line3</node>
CF code:
<cfset xmlfile = ExpandPath("test.xml")>
<cffile action="read" file="#xmlfile#" variable="xmlstring">
<cffile action="write" file="test1.xml" output="#xmlstring#">
<cfset xmldoc = XmlParse(xmlstring)>
<cfset xmltext = ToString(xmldoc)>
<cffile action="write" file="test2.xml" output="#xmltext#">
The input file test.xml file is in CRLF format, UTF-8 encoded, 77 bytes.
The first output file (test1.xml) is in CRLF format, ANSI encoded, 76 bytes.
The second output file (test2.xml) is in UNIX format, ANSI encoded, 71 bytes.
The content of the XML node in the input file is line1 Chr(13)Chr(10) line2 Chr(13)Chr(10) line3 (spaces added for readability).
The content of the XML node in the first output file is the same as above.
The content of the XML node in the second output file is line1 Chr(10) line2 Chr(10) line3.
Any ideas why the carriage return character Chr(13) was not preserved after the XmlParse/ToString sequence?
UPDATE:
The problem lies only with XmlParse. It's not about ToString or cffile.
Here is a more relevant example - you can test for yourselves:
<cfsavecontent variable="xmlvar">
<nodes>
<node>
line1
line2
line3
</node>
</nodes>
</cfsavecontent>
<cfset vtext = "#xmlvar#">
<cfset vtext = Replace(vtext,Chr(10),'LF','All')>
<cfset vtext = Replace(vtext,Chr(13),'CR','All')>
<cfdump var = "#vtext#">
<!--- outputs CRLF<nodes>CRLF <node>CRLFline1CRLFline2CRLFline3CRLF </node>CRLF</nodes>CRLF --->
<cfset xmldoc = XmlParse(xmlvar)>
<cfset vtext = "#xmldoc.nodes.node.XmlText#">
<cfset vtext = Replace(vtext,Chr(10),'LF','All')>
<cfset vtext = Replace(vtext,Chr(13),'CR','All')>
<cfdump var = "#vtext#">
<!--- outputs LFline1LFline2LFline3LF --->

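XML parsers normalize CR/LF to LF, as the XML specification requires. To keep the CR/LF line endings, write each carriage return as a character reference (&#13;), which the parser does not normalize. See below:
<cfsavecontent variable="xmlvar">
<nodes>
<node>
line1&#13;
line2&#13;
line3&#13;
</node>
</nodes>
</cfsavecontent>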
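A quick way to check the character-reference approach is a minimal sketch like the one below. It builds a tiny document by hand (the sample string and variable names are illustrative, not from the question) and relies on the XML rule that a carriage return written as &#13; is not subject to line-ending normalization. Note that # has to be doubled inside a cfset string.

```cfml
<!--- Hypothetical sample: CR written as a character reference, followed by a literal LF --->
<cfset xmlvar = "<node>line1&##13;" & Chr(10) & "line2</node>">
<cfset xmldoc = XmlParse(xmlvar)>
<cfset vtext = xmldoc.node.XmlText>
<!--- Make the control characters visible --->
<cfset vtext = Replace(vtext, Chr(13), "CR", "all")>
<cfset vtext = Replace(vtext, Chr(10), "LF", "all")>
<cfoutput>#vtext#</cfoutput>
```

If the parser follows the specification, this should print line1CRLFline2, showing that the carriage return survived the parse.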

Have you tried using the charset='utf-8' attribute in the cffile tag?

I cannot reproduce anything that you are talking about with ColdFusion 9.0.1 on Mac OSX. White space is being preserved just as it goes in. I tried both of your examples above and they worked (mostly) as expected. I actually did not see any CRs in the replace()s, I only saw LFs. But it maintained them all.

Related

Coldfusion Replace() not working on all pages for MS-Word Document

I have a word document with tables laid out to look like a form. I have placeholders like %firstName%, %lastName%, %birthdate%...etc.
When I use the replace() function, %firstName%, %lastName%, %birthdate% and all of the other placeholder fields are replaced on the first and second pages. After the second page, nothing is replaced. The placeholder names on the 3rd and 4th pages are the same as on the 1st and 2nd pages. I even copied and pasted the placeholder names, and I've made sure there are no added spaces. I'm curious to know if anyone else has had this happen and can tell me what was done to fix it.
<cfset docPath = GetDirectoryFromPath(GetCurrentTemplatePath()) & "UserTemplate.rtf" />
<cflock name="UserTemp" type="exclusive" timeout="30">
<cfset rtf = FileRead(docPath) />
<cfquery name = "qUserFormData">
SELECT * FROM vUserFormData WHERE UserID = 3
</cfquery>
<cfset rtf = Replace(rtf,"%firstName%",#firstName#)/>
<cfset rtf = Replace(rtf,"%lastName%",#lastName#) />
<cfset rtf = Replace(rtf,"%birthday%",#birthday#) />
</cflock>
<cfheader name="content-disposition" value="filename=UserTemplate.doc" />
<cfcontent type="application/msword"><cfoutput>#rtf#</cfoutput>
There is an optional fourth parameter to the replace() function: scope.
Scope:
one: replaces the first occurrence (default)
all: replaces all occurrences
Notice that "one" is the default and that only replaces the first occurrence. Try adding that fourth parameter like this:
<cfset rtf = Replace(rtf,"%firstName%",firstName,"all") />
<cfset rtf = Replace(rtf,"%lastName%",lastName,"all") />
<cfset rtf = Replace(rtf,"%birthday%",birthday,"all") />
(The hash signs # are not necessary in this bit of code.)
Also be aware that the replace() method you are using is case sensitive.
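If the placeholders might differ in case between pages, ReplaceNoCase() takes the same four arguments and ignores case. The following is only a sketch with a made-up template string and value, not the asker's RTF file:

```cfml
<!--- Hypothetical template with the same placeholder in two different casings --->
<cfset template = "Dear %FirstName% %lastName%, welcome %firstname%!">
<!--- "all" replaces every occurrence; ReplaceNoCase ignores case --->
<cfset result = ReplaceNoCase(template, "%firstName%", "Ana", "all")>
<cfoutput>#result#</cfoutput>
```

This prints "Dear Ana %lastName%, welcome Ana!": both casings of the first-name placeholder are replaced, and %lastName% is left for its own call.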

Strip html to end up with linebreak-delimited list

I want to create database location records in MySQL. I have the following html string from a select box:
<cfset x='
<option value="1188">Aka Aka</option><option value="346">Ararimu</option><option value="293">Awhitu</option><option value="2851">Bombay</option><option value="865">Buckland</option>
'>
Rather than manually enter the records in the database, I'd like to strip out the html tags and end up with the following:
Aka Aka
Ararimu
Awhitu
Bombay
Buckland
Then I could do a simple loop based on line breaks and enter the data programmatically. I can probably handle that part, but what I need to know is the simplest way to strip out the html to end up with the line-break-delimited list.
Here you go:
<cfset x='
<option value="1188">Aka Aka</option><option value="346">Ararimu</option><option value="293">Awhitu</option><option value="2851">Bombay</option><option value="865">Buckland</option>
'>
<cfset y = ListToArray(x, "</option>", "false", "true") />
<cfset z = ArrayNew(1) />
<cfloop array="#y#" index="name">
<cfif Trim(ListLast(name, ">")) is not "">
<cfset temp = ArrayAppend(z, ListLast(name, ">")) />
</cfif>
</cfloop>
<cfdump var="#z#" />
You now have the names in the 'z' array. You can convert it to a list with line breaks as delimiters if you really want to.
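Converting the array to a line-break-delimited list could look roughly like this. The array literal stands in for the 'z' array built above, and the variable name names is just for illustration:

```cfml
<!--- Stand-in for the 'z' array produced by the loop above --->
<cfset z = ["Aka Aka", "Ararimu", "Awhitu"]>
<!--- Join the elements with a line feed as the delimiter --->
<cfset names = ArrayToList(z, Chr(10))>
<!--- Loop over the list again, splitting on the same delimiter --->
<cfloop list="#names#" delimiters="#Chr(10)#" index="name">
<cfoutput>#name#<br></cfoutput>
</cfloop>
```

In practice you could also loop over the array directly and skip the list step, since the database insert only needs one name at a time.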

Simple regex help in coldfusion

I have a string that I wish to remove some characters based on underscores in the string. For instance.
I wish to change
2_MASTER BEDROOM_CFM
to
MASTER BEDROOM
OR
2734923ie_BEDROOM 2_CFM
to
BEDROOM 2
Any recommendations on how to do this with ColdFusion?
ColdFusion has the GetToken() function, which makes it very easy to manipulate a string with a delimiter (virtually any delimiter). Assuming each string you want to parse consists of two space-separated parts, this will output MASTER BEDROOM:
<cfset String1 = '2_MASTER BEDROOM_CFM'>
<cfset FirstWord = ListFirst(String1,' ')>
<cfset FirstWord = GetToken(FirstWord,2,'_')>
<cfset SecondWord = ListLast(String1,' ')>
<cfset SecondWord = GetToken(SecondWord,1,'_')>
<cfoutput>
#FirstWord# #SecondWord#
</cfoutput>
Could also simplify it down to just
<cfset String1 = '2_MASTER BEDROOM_CFM'>
<cfoutput>
#GetToken(ListFirst(String1,' '),2,'_')# #GetToken(ListLast(String1,' '),1,'_')#
</cfoutput>
EDIT As Leigh points out in the comments you could also just use
getToken("2_MASTER BEDROOM_CFM", 2, "_")
This treats your string as a list with elements 2, MASTER BEDROOM, and CFM
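Applied to both sample strings from the question, the single-call version can be sketched as follows (the samples array is only there for the demonstration):

```cfml
<!--- The two example strings from the question --->
<cfset samples = ["2_MASTER BEDROOM_CFM", "2734923ie_BEDROOM 2_CFM"]>
<cfloop array="#samples#" index="s">
<!--- Token 2 of an underscore-delimited list is the room name --->
<cfoutput>#GetToken(s, 2, "_")#<br></cfoutput>
</cfloop>
```

This prints MASTER BEDROOM and BEDROOM 2. It assumes the room name itself never contains an underscore.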
So the string starts with some numbers and/or characters, then an underscore. Then some text, finally an underscore followed by CFM? Here's a regex that catches that:
^[a-z0-9]+_(.*)_CFM$
And here's some code that works for me:
<cfoutput>
<cfset String1 = '2_MASTER BEDROOM_CFM'>
<cfset yourString = reReplaceNoCase(String1, "^[a-z0-9]+_(.*)_CFM$", "\1")>
#yourString#<br>
<cfset String2 = "2734923ie_BEDROOM 2_CFM">
<cfset yourString = reReplaceNoCase(String2, "^[a-z0-9]+_(.*)_CFM$", "\1")>
#yourString#<br>
</cfoutput>

Comma is invalid when using dataformat

Here is my code to output a query to a spreadsheet.
<cfscript>
// Use an absolute path for the files.
theDir=GetDirectoryFromPath(GetCurrentTemplatePath());
theFile=theDir & "getTestInv.xls";
// Create an empty ColdFusion spreadsheet object.
theSheet = SpreadsheetNew("invoicesData");
// Populate the object with a query.
SpreadsheetAddRows(theSheet,getTestInv);
</cfscript>
<cfset format = StructNew()>
<cfset format.dataformat = "#,###0.00">
<cfset SpreadsheetFormatColumn(theSheet,format,10)>
<cfspreadsheet action="write" filename="#theFile#" name="theSheet" sheetname="getTestInv" overwrite=true>
The error I am getting is:
Invalid CFML construct found on line 125 at column 32.
ColdFusion was looking at the following text:
,
The CFML compiler was processing:
An expression beginning with /", on line 125, column 30.This message is usually caused by a problem in the expressions structure.
A cfset tag beginning on line 125, column 4.
A cfset tag beginning on line 125, column 4.
125: <cfset format.dataformat = "#,###0.00">
For some reason, it doesn't like the comma, even though it is valid according to the documentation. If I take the comma out, it works, but I need it for the thousands grouping.
Anyone encountered this?
In ColdFusion, # is a reserved character. To use a literal #, you have to escape it by doubling it:
<cfset format.dataformat = "##,######0.00">
It's silly that they neither mentioned this in the documentation nor followed ColdFusion's own formatting rules, which use 9s instead of #s.
Here is my full working standalone test code:
<cfset myQuery = QueryNew('number')>
<cfset newRow = QueryAddRow(MyQuery, 2)>
<cfset temp = QuerySetCell(myQuery, "number", "349348394", 1)>
<cfset temp = QuerySetCell(myQuery, "number", "10000000", 2)>
<cfscript>
// Use an absolute path for the files.
theDir=GetDirectoryFromPath(GetCurrentTemplatePath());
theFile=theDir & "getTestInv.xls";
// Create an empty ColdFusion spreadsheet object.
theSheet = SpreadsheetNew("invoicesData");
// Populate the object with a query.
SpreadsheetAddRows(theSheet,myQuery,1,1);
</cfscript>
<cfset format = StructNew()>
<cfset format.dataformat = "##,######0.00">
<cfset SpreadsheetFormatColumn(theSheet,format,1)>
<cfspreadsheet action="write" filename="#theFile#" name="theSheet" sheetname="theSheet" overwrite=true>
It should be like this (each doubled ## becomes one literal #, so the mask evaluates to #,##0.00):
<cfset format = StructNew()>
<cfset format.dataformat = "##,####0.00">
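To see what a doubled-hash mask actually evaluates to before handing it to SpreadsheetFormatColumn, a small check like this can help (the variable name mask is illustrative):

```cfml
<!--- Each ## in the source is one literal # in the resulting string --->
<cfset mask = "##,####0.00">
<cfoutput>#mask# has #Len(mask)# characters</cfoutput>
```

This prints "#,##0.00 has 8 characters", confirming that the string stored in the variable contains single hash signs.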

ColdFusion , REGEX - Given TEXT, find all items contained in SPANs

I'm looking to learn how to create a REGEX in Coldfusion that will scan through a large item of html text and create a list of items.
The items I want are contained between the following
<span class="findme">The Goods</span>
Thanks for any tips to get this going.
You don't say which version of CF you are using. Since v8 you can use REMatch to get an array of matches:
results = REMatch('(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>', text)
Use ArrayToList to turn that into a list.
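One thing to keep in mind is that REMatch returns each whole match, including the surrounding span tags, rather than the captured group. A rough sketch of getting just the inner text is shown below; the sample text and the tag-stripping REReplace step are additions for illustration, and the stripping would also remove any tags nested inside the span.

```cfml
<cfset text = '<span class="findme">The Goods</span><span class="findme">More Goods</span>'>
<cfset matches = REMatch('(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>', text)>
<cfset items = []>
<cfloop array="#matches#" index="m">
<!--- Strip every tag from the matched fragment, leaving the inner text --->
<cfset ArrayAppend(items, REReplace(m, "<[^>]+>", "", "all"))>
</cfloop>
<cfoutput>#ArrayToList(items)#</cfoutput>
```

For the sample text this prints "The Goods,More Goods".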
For older versions, use REFindNoCase and Mid() to extract the substrings.
EDIT: To answer your follow-up comment, using REFind to return all matches is quite involved because the function only returns the FIRST match. This means you have to call REFind repeatedly, passing a new start position each time. Ben Forta has written a UDF which does exactly this and will save you some time.
<!---
Returns all the matches of a regular expression within a string.
NOTE: Updated to allow subexpression selection (rather than whole match)
@param regex Regular expression. (Required)
@param text String to search. (Required)
@param subexnum Sub-expression to extract (Optional)
@return Returns a structure.
@author Ben Forta (ben@forta.com)
@version 1, July 15, 2005
--->
<cffunction name="reFindAll" output="true" returnType="struct">
<cfargument name="regex" type="string" required="yes">
<cfargument name="text" type="string" required="yes">
<cfargument name="subexnum" type="numeric" default="1">
<!--- Define local variables --->
<cfset var results=structNew()>
<cfset var pos=1>
<cfset var subex="">
<cfset var done=false>
<!--- Initialize results structure --->
<cfset results.len=arraynew(1)>
<cfset results.pos=arraynew(1)>
<!--- Loop through text --->
<cfloop condition="not done">
<!--- Perform search --->
<cfset subex=reFind(arguments.regex, arguments.text, pos, true)>
<!--- Anything matched? --->
<cfif subex.len[1] is 0>
<!--- Nothing found, outta here --->
<cfset done=true>
<cfelse>
<!--- Got one, add to arrays --->
<cfset arrayappend(results.len, subex.len[arguments.subexnum])>
<cfset arrayappend(results.pos, subex.pos[arguments.subexnum])>
<!--- Reposition start point --->
<cfset pos=subex.pos[1]+subex.len[1]>
</cfif>
</cfloop>
<!--- If no matches, add 0 to both arrays --->
<cfif arraylen(results.len) is 0>
<cfset arrayappend(results.len, 0)>
<cfset arrayappend(results.pos, 0)>
</cfif>
<!--- and return results --->
<cfreturn results>
</cffunction>
This gives you the start position (pos) and length (len) of each match, so to get each substring, use another loop:
<cfset text = '<span class="findme">The Goods</span><span class="findme">More Goods</span>' />
<cfset pattern = '(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>' />
<cfset results = reFindAll(pattern, text, 2) />
<cfloop index="i" from="1" to="#ArrayLen(results.pos)#">
<cfoutput>match #i#: #Mid(text, results.pos[i], results.len[i])#<br></cfoutput>
</cfloop>
EDIT: Updated reFindAll with subexnum argument. Setting this to 2 will capture the first subexpression. The default value 1 captures the entire match.
Look into making your HTML work with a regular DOM parser and querying it via XPath instead of hammering it through a regex-based abomination.
To make the HTML input usable, pass it through jTidy (see http://jtidy.riaforge.org/).
Once you have well-formed XML/XHTML, build an XML document from it:
<cfset dom = XmlParse(scrubbedHtml, true)>
Then query the XML document using XPath:
<cfset result = XmlSearch(dom, "//span[@class='findme']")>
Done.
EDIT: ColdFusion's XmlSearch() doesn't have great XML namespace support. If you end up producing XHTML instead of the more recommendable XML, use the following XPath (note the colon) "//:span[@class='findme']" or "//*:span[@class='findme']". See here and here for more info.
See the jTidy API documentation for a complete overview what jTidy can do.
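As a self-contained illustration of the XPath step, here is a sketch that skips jTidy and starts from a hand-written, already well-formed fragment (the xhtml string is made up for the example and has no namespace):

```cfml
<!--- Hypothetical well-formed input; real HTML would first go through jTidy --->
<cfset xhtml = '<div><span class="findme">The Goods</span><span class="other">Skip</span><span class="findme">More Goods</span></div>'>
<cfset dom = XmlParse(xhtml)>
<!--- XmlSearch returns an array of matching XML elements --->
<cfset nodes = XmlSearch(dom, "//span[@class='findme']")>
<cfloop array="#nodes#" index="node">
<cfoutput>#node.XmlText#<br></cfoutput>
</cfloop>
```

This prints The Goods and More Goods, and skips the span with the other class.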