How to get rid of weird characters in my RSS feed? - coldfusion

I've created a utf8 encoded RSS feed which presents news data drawn from a database. I've set all aspects of my database to utf8 and also saved the text which I have put into the database as utf8 by pasting it into Notepad and saving as utf8. So everything should be encoded in utf8 when the RSS feed is presented to the browser. However, I am still getting the weird question mark characters for pound signs :(
Here is my RSS feed code (CFML):
<cfsilent>
<!--- Get News --->
<cfinvoke component="com.news" method="getAll" dsn="#Request.App.dsn#" returnvariable="news" />
</cfsilent>
<!--- If we have news items --->
<cfif news.RecordCount GT 0>
<!--- Serve RSS content-type --->
<cfcontent type="application/rss+xml">
<!--- Output feed --->
<cfcontent reset="true"><?xml version="1.0" encoding="utf-8"?>
<cfoutput>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>News RSS Feed</title>
<link>#Application.siteRoot#</link>
<description>Welcome to the News RSS Feed</description>
<lastBuildDate>Wed, 19 Nov 2008 09:05:00 GMT</lastBuildDate>
<language>en-uk</language>
<atom:link href="#Application.siteRoot#news/rss/index.cfm" rel="self" type="application/rss+xml" />
<cfloop query="news">
<!--- Make data xml compliant --->
<cfscript>
news.headline = replace(news.headline, "<", "&lt;", "ALL");
news.body = replace(news.body, "<", "&lt;", "ALL");
news.date = dateformat(news.date, "ddd, dd mmm yyyy");
news.time = timeformat(news.time, "HH:mm:ss") & " GMT";
</cfscript>
<item>
<title>#news.headline#</title>
<link>#Application.siteRoot#news/index.cfm?id=#news.id#</link>
<guid>#Application.siteRoot#news/index.cfm?id=#news.id#</guid>
<pubDate>#news.date# #news.time#</pubDate>
<description>#news.body#</description>
</item>
</cfloop>
</channel>
</rss>
</cfoutput>
<cfelse>
<!--- If we have no news items, relocate to news page --->
<cflocation url="../news/index.cfm" addtoken="no">
</cfif>
Has anyone any suggestions? I've done loads of research but can't find any answers :(
Thanks in advance,
Chromis

Get rid of your escaping code and use XMLFormat instead:
<item>
<title>#XMLFormat(news.headline)#</title>
<link>#Application.siteRoot#news/index.cfm?id=#XMLFormat(news.id)#</link>
<guid>#Application.siteRoot#news/index.cfm?id=#XMLFormat(news.id)#</guid>
<pubDate>#XMLFormat(news.date)# #XMLFormat(news.time)#</pubDate>
<description>#XMLFormat(news.body)#</description>
</item>
See the XMLFormat LiveDocs page.

This worked for me: combine the two cfcontent tags into one and append charset=utf-8.
<cfcontent type="text/xml; charset=utf-8" reset="yes" />

Your escaping function is too simple. You need to change & to &amp; first.
If you use named entities (e.g. &pound;), that is the cause of the error.
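A minimal sketch of that ordering, assuming you want to keep a hand-rolled escape rather than XMLFormat. The function name escapeForXml is hypothetical and not from the original post. The ampersand is replaced first so the ampersands introduced by later replacements are not escaped twice. The pound sign is optionally written as a numeric character reference, because XML only predefines the named entities amp, lt, gt, quot and apos.

```cfml
<cfscript>
// Hypothetical helper (an assumption, not part of the original feed code).
function escapeForXml(required string text) {
    // Escape the ampersand first, otherwise the entities added below
    // would have their own ampersands escaped again.
    var result = replace(arguments.text, "&", "&amp;", "all");
    result = replace(result, "<", "&lt;", "all");
    result = replace(result, ">", "&gt;", "all");
    // Optional: write the pound sign (code point 163) as a numeric reference.
    // This must happen after the ampersand step for the same reason.
    result = replace(result, chr(163), "&##163;", "all");
    return result;
}

sample = "Fish & chips < " & chr(163) & "5";
writeOutput(escapeForXml(sample));
</cfscript>
```

For the sample string this writes Fish &amp;amp; chips &amp;lt; &amp;#163;5 as seen in a browser, that is, the raw output contains the escaped ampersand, the escaped less-than sign and the numeric reference for the pound sign.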

Sanitize every input when it is entered into the database; that should simplify displaying the data afterwards.

If you are on Adobe ColdFusion 9 or above, consider using CFFEED with the "escapeChars" attribute to create your RSS (CF8 also supports CFFEED, but not that attribute).
http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7675.html


Problem with anchor links using resolveurl

I'm using <cfhttp> to pull in content from another site (coldfusion) with resolveurl="true" so all the links work. The problem is that resolveurl also turns the anchor links (href="#search") into absolute links, which breaks them. Is there a way to make resolveurl="true" skip anchor links?
For starters, let's use the tutorial code from Adobe.com posted in the comments. You'll want to do something similar.
<cfhttp url="https://www.adobe.com"
method="get" result="httpResp" timeout="120">
<cfhttpparam type="header" name="Content-Type" value="application/json" />
</cfhttp>
<cfscript>
// Find all the URLs in a web page retrieved via cfhttp
// The search is case sensitive
result = REMatch("https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?", httpResp.Filecontent);
</cfscript>
<!--- Now, loop through those URLs --->
<cfoutput>
<cfloop array="#result#" item="item" index="index">
<cfif LEFT(item, 1) is "##">
<!---Your logic if it's just an anchor--->
<cfelse>
<!---Your logic if it's a full link--->
</cfif>
<br/>
</cfloop>
</cfoutput>
If it tries to return a full URL before the anchor as you say, (I've been getting inconsistent results with resolveurl="true") hit it with this to only grab the bit you want.
<cfoutput>
<cfloop array="#result#" item="item" index="index">
#ListLast(item, "##")#
</cfloop>
</cfoutput>
What this code does is grab all the URLs, and parse them for anchors.
You'll have to decide what to do next inside your loop. Maybe preserve the values and add them to a new array, so you can save it somewhere with the links fixed?
It's impossible to say more without knowing what you want to do with the links.
There does not appear to be a way to prevent CF from resolving the hashes. In our usage of it the current result is actually beneficial since when we present content from another site we usually want the user to be sent there.
Here is a way to replace link href values with just the anchor, if one is present, using regular expressions. I'm sure there are combinations of issues that could occur here with really malformed HTML.
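<cfsavecontent variable="testcontent">
<!--- Sample links; the URLs are illustrative --->
<strong>test</strong>
<a href="http://www.google.com">go to google</a>
<a href="http://current.domain/page.cfm#section">go to section</a>
</cfsavecontent>
<cfset domain = replace("current.domain", ".", "\.", "all") />
<cfset match = "(href\s*=\s*(""|'))\s*(http://#domain#[^##'""]+)(##[^##'""]+)\s*(""|')" />
<!--- \1 = href and opening quote, \4 = the anchor, \5 = the closing quote --->
<cfset result = reReplaceNoCase(testcontent, match, "\1\4\5", "all") />
<cfoutput><pre>#encodeForHTML(result)#</pre></cfoutput>
Output
<strong>test</strong>
<a href="http://www.google.com">go to google</a>
<a href="#section">go to section</a>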
Another option, if you are displaying the content in a normal page with js/jquery available, is to run through each link on display and update it to just the anchor. This is less likely to fail on malformed HTML. Let me know if you have any interest in that approach.

HTTP/1.1 500 Complex object types cannot be converted to simple values

I'm following a tutorial by Ben Nadel and I am receiving the following error in browser Network/XHR.
Complex object types cannot be converted to simple values
I think the problem is in the CFLoop tag but I'm not sure exactly what I should modify to resolve the error.
<!---
Get the content as a byte array (by converting it to binary,
we can echo back the appropriate length as well as use it in
the binary response stream).
--->
<cfset binResponse = ToBinary(ToBase64( objRequest.FileContent )) />
<!--- Echo back the response code. --->
<cfheader statuscode="#Val( objRequest.StatusCode )#" statustext="#ListRest( objRequest.StatusCode, ' ' )#" />
<!--- Echo back response length. --->
<cfheader name="content-length" value="#ArrayLen( binResponse )#" />
<!--- Echo back all response headers. --->
<cfloop item="strKey" collection="#objRequest.ResponseHeader#">
<!--- Check to see if this header is a simple value. --->
<cfif IsSimpleValue( objRequest.ResponseHeader[ strKey ] )>
<!--- Echo back header value. --->
<cfheader name="#strKey#" value="#objRequest.ResponseHeader[ strKey ]#" />
</cfif>
</cfloop>
<!---
Echo back content with the appropriate mime type. By using
the Variable attribute, we will make sure that the content
stream is reset and ONLY the given response will be returned.
--->
<cfcontent type="#objRequest.MimeType#" variable="#binResponse#" />
I commented out each line of code until I found that the problem line was the cfheader inside the loop (shown below). It turned out I don't even need the whole cfloop block. The author of the tutorial may have another use for this block, but for what I do, it works fine without it. Many thanks to those who tried to help.
<cfheader name="#strKey#" value="#objRequest.ResponseHeader[ strKey ]#" />

grabbing JSON data using coldfusion

I have a URL which, when run in the browser, displays JSON data. Since I am new to ColdFusion, I am wondering what would be a good way to
grab the data from the web browser. Later on I will be storing the individual JSON data in a MySQL database, but I need to figure out step 1,
which is grabbing the data.
Please advise.
Thanks
You'll want to do a cfhttp request to load the external content.
Then you can use deserializeJSON to convert the JSON object into the appropriate cfml struct.
See the example Adobe gives in the deserializeJSON documentation.
Here is a quick example:
<!--- Set the URL address. --->
<cfset urlAddress="http://ip.jsontest.com/">
<!--- Generate http request from cf --->
<cfhttp url="#urlAddress#" method="GET" resolveurl="Yes" throwOnError="Yes"/>
<!--- handle the response from the server --->
<cfoutput>
This is just a string:<br />
#CFHTTP.FileContent#<br />
</cfoutput>
<cfset cfData=DeserializeJSON(CFHTTP.FileContent)>
This is object:<br />
<cfdump var="#cfData#">
Now you can do something like this:<br />
<cfoutput>#cfData.ip#</cfoutput>
You can run this code at http://cflive.net/
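If the remote service might return something other than JSON (an error page, for instance), a slightly more defensive variant could look like the sketch below. It assumes the same test URL as above. The variable names httpResult and data and the 10 second timeout are illustrative choices, not part of the original answer.

```cfml
<!--- Store the response in a named variable instead of the default CFHTTP scope --->
<cfhttp url="http://ip.jsontest.com/" method="GET" result="httpResult" timeout="10" />

<!--- Only deserialize when the body really is JSON --->
<cfif isJSON(httpResult.fileContent)>
    <cfset data = deserializeJSON(httpResult.fileContent) />
    <!--- Check the shape before reading a key; "ip" is what this test service returns --->
    <cfif isStruct(data) AND structKeyExists(data, "ip")>
        <cfoutput>IP: #encodeForHTML(data.ip)#</cfoutput>
    </cfif>
<cfelse>
    <cfoutput>The response was not valid JSON (status: #encodeForHTML(httpResult.statusCode)#).</cfoutput>
</cfif>
```

Checking with isJSON first avoids an exception from deserializeJSON when the body is not JSON, and the structKeyExists check keeps the page from failing if the expected key is missing.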

Replace single quotes with double quotes in tags only! Using ColdFusion regex

I only see PHP solutions to this problem.
Basically I need to go from:
<TEXTFORMAT LEADING='2'><P ALIGN='LEFT'><FONT FACE='Verdana' style='font-size:10' COLOR='#0B333C'>My name's Mark</FONT></P></TEXTFORMAT>
to this:
<TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" style="font-size:10" COLOR="#0B333C">My name's Mark</FONT></P></TEXTFORMAT>
Using ReReplaceNoCase but ... yup you guessed it .. I suck at regular expressions! :)
Rather than use a regex, you can do what you need in this case by letting CF do the work for you, via XML parsing libraries:
<cfsavecontent variable = "origStr">
<cfoutput>
<TEXTFORMAT LEADING='2'><P ALIGN='LEFT'><FONT FACE='Verdana' style='font-size:10' COLOR='##0B333C'>My name's Mark</FONT></P></TEXTFORMAT>
</cfoutput>
</cfsavecontent>
<cfset xmlString = ToString(xmlParse(origStr))>
<cfdump var="#xmlString#">
Which will get back:
<?xml version="1.0" encoding="UTF-8"?> <TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT COLOR="#0B333C" FACE="Verdana" style="font-size:10">My name's Mark</FONT></P></TEXTFORMAT>
If that leading <?xml...> annoys you, you can cut that part off:
<cfdump var="#Right(xmlString, Len(xmlString) - 40)#">
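Cutting a fixed number of characters depends on the exact length of the declaration. As an alternative sketch, a regular expression can strip the declaration whatever its length. The sample XML string and the variable name withoutDeclaration are assumptions made for illustration.

```cfml
<cfscript>
// Illustrative input; in the answer above this would be ToString(xmlParse(origStr)).
xmlString = toString(xmlParse("<root attr='1'>text</root>"));

// Remove a leading XML declaration, if present, plus any whitespace after it.
withoutDeclaration = reReplace(xmlString, "^<\?xml[^>]*\?>\s*", "");

writeDump(withoutDeclaration);
</cfscript>
```

If the string has no declaration, the pattern does not match and the string is returned unchanged.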

White Space / Coldfusion

What would be the correct way to stop the white space that ColdFusion outputs?
I know there is cfcontent and cfsetting enableCFoutputOnly. What is the correct way to do that?
In addition to <cfsilent>, <cfsetting enablecfoutputonly="yes"> and <cfprocessingdirective suppressWhiteSpace = "true">, there is <cfcontent reset="true" />. You can use it to delete whitespace at the beginning of your document.
HTML5 document would then start like this:
<cfcontent type="text/html; charset=utf-8" reset="true" /><!doctype html>
XML document:
<cfcontent reset="yes" type="text/xml; charset=utf-8" /><CFOUTPUT>#VariableHoldingXmlDocAsString#</CFOUTPUT>
This way you won't get the "Content is not allowed in prolog"-error for XML docs.
If you are getting unwanted whitespace from a function, use the output attribute to suppress any output and return your result as a string, for example:
<cffunction name="getMyName" access="public" returntype="string" output="no">
<cfreturn "Seybsen" />
</cffunction>
You can modify the ColdFusion output by getting access to the ColdFusion output buffer. James Brown recently demo'd this at our user group meeting (Central Florida Web Developers User Group).
<cfscript>
out = getPageContext().getOut().getString();
newOutput = REreplace(out, 'regex', '', 'all');
</cfscript>
A great place to do this would be in Application.cfc onRequestEnd(). Your result could be a single line of HTML which is then sent to the browser. Work with your web server to GZip and you'll cut bandwidth a great deal.
In terms of tags, there is cfsilent
In the administrator there is a setting to 'Enable whitespace management'
Further reading on cfsilent and cfcontent reset.
If neither <cfsilent> nor <cfsetting enablecfoutputonly="yes"> can satisfy you, then you are probably over-engineering this issue.
If you are asking solely for aesthetic reasons, my recommendation is: ignore the whitespace, it does not do any harm.
Alternatively, you can ensure your entire page is stored within a variable and all this processing is done within cfsilent tags, e.g.
<cfsilent>
<!-- some coldfusion -->
<cfsavecontent variable="pageContent">
<html>
<!-- some content -->
</html>
</cfsavecontent>
<!-- reformat pageContent if required -->
</cfsilent><cfoutput>#pageContent#</cfoutput>
Finally, you can perform any additional processing after you've generated the pagecontent but before you output it e.g. a regular expression to remove additional whitespace or some code tidying.
Here's a tip if you use CFC.
If you're not expecting your method to generate any output, use output="false" in <cffunction> and <cfcomponent> (not needed if you're using CF9 script style). This will eliminate a lot of unwanted whitespace.
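A minimal sketch of a tag-based component with both attributes set. The component file name Greeter.cfc and the method getGreeting are hypothetical, chosen only to illustrate the tip.

```cfml
<!--- Greeter.cfc (hypothetical example component) --->
<cfcomponent output="false">

    <!--- output="false" keeps the line breaks and indentation in this method out of the response --->
    <cffunction name="getGreeting" access="public" returntype="string" output="false">
        <cfargument name="name" type="string" required="true" />
        <cfreturn "Hello, " & arguments.name />
    </cffunction>

</cfcomponent>
```

The attribute on <cfcomponent> covers code that runs when the component is created, while the one on <cffunction> covers the body of that method, so setting both closes both sources of stray whitespace.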
If you have access to the server and want to implement it on every page request, search for and install trimflt.jar. It's a Java servlet filter that will remove all whitespace and line breaks before sending the response off. Drop the jar in the /WEB-INF/lib dir of CF and edit the web.xml file to add the filter. It's also configurable to remove comments, exclude files or extensions, and preserve specific strings. Been running it for a few years without a problem. A set it and forget it solution.
I've found that even using every possible way to eliminate whitespace, your code may still have some unwanted spaces or line breaks. If you're still experiencing this you may need to sacrifice well formatted code for desired output.
For example, instead of:
<cfprocessingdirective suppressWhiteSpace = "true">
<cfquery ...>
...
...
...
</cfquery>
<cfoutput>
Welcome to the site #query.userName#
</cfoutput>
</cfprocessingdirective>
You may need to code:
<cfprocessingdirective suppressWhiteSpace = "true"><cfquery ...>
...
...
...
</cfquery><cfoutput>Welcome to the site #query.UserName#</cfoutput></cfprocessingdirective>
This isn't CF adding whitespace, but you adding whitespace when formatting your CF.
HTH