encodeForHtml() vs htmlEditFormat() - coldfusion

encodeForHtml() (new in CF10) vs htmlEditFormat(), how are they different?

I think it is the same as the encodeForHTML function in Java's OWASP ESAPI, and more secure against XSS attacks when putting content into HTML.
<cfsavecontent variable="htmlcontent">
<html>
<head>
<script>function hello() {alert('hello')}</script>
</head>
<body>
Book Mark & Anchor<br/>
<div class="xyz">Div contains & here.</div>
<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>
<IMG SRC=&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041>
</body>
</html></cfsavecontent>
<cfoutput>#htmleditformat(htmlcontent)#</cfoutput>
<br />
<cfoutput>#encodeforhtml(htmlcontent)#</cfoutput>
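The two IMG vectors in that test content hide a javascript: URL behind numeric character references. The first uses hex references and the second uses zero-padded decimal ones, and none of them end in a semicolon. Browsers still decode such references inside attribute values, which is why an encoder has to escape the & itself instead of passing it through. As a rough illustration, here is a Python example rather than CFML, since it only needs the standard library. Python's `html.unescape` follows the same HTML5 decoding rules, so it shows what the browser ends up with:

```python
import html

# IMG SRC values copied from the test content: numeric character references,
# hex in the first, zero-padded decimal in the second, none ending in ';'.
hex_src = (
    "&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A"
    "&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29"
)
dec_src = (
    "&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114"
    "&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101"
    "&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083"
    "&#0000039&#0000041"
)

# What an HTML5 parser sees after decoding the character references.
decoded_hex = html.unescape(hex_src)
decoded_dec = html.unescape(dec_src)

# Escaping every '&' means decoding only gives back the literal
# reference text, never the javascript: URL.
escaped = html.escape(hex_src)

print(decoded_hex)                        # javascript:alert('XSS')
print(decoded_dec)                        # javascript:alert('XSS')
print(html.unescape(escaped) == hex_src)  # True
```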

EncodeFor* functions are based on the OWASP ESAPI libraries. The main difference is that HTMLEditFormat() merely replaces "bad" characters, like &, < and >, with good strings, like &amp;, &lt; and &gt;. EncodeForHTML() is smarter; one advantage is that it can recognize content that is already encoded and avoid double-encoding it. In CF10 this relies on canonicalization, which EncodeForHTML() performs only when its optional second argument, canonicalize, is set to true.
For example, if a user submitted the following content to your site:
<div>
Here is <i>test</i> html content includes<br/>
<script>alert('hello')</script>
Notice how &amp; rendered with both functions.
</div>
Both HTMLEditFormat() and EncodeForHTML() would properly escape the '<' and '>' characters. But HTMLEditFormat() would blindly encode the & again, so that your output looks like:
... how &amp;amp; rendered ...
whereas with EncodeForHTML() it would look like:
... how &amp; rendered ...
HTMLEditFormat() couldn't tell that the ampersand was already encoded, so it encoded it a second time. This is a trivial example, but it demonstrates how the ESAPI libraries are smarter and, therefore, more secure.
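A rough Python sketch of the two behaviours just described. `html_edit_format_like` and `canonicalize_then_encode` are hypothetical stand-ins, not ColdFusion's or ESAPI's actual code. ESAPI's canonicalization is also considerably stricter than a single `html.unescape` call; for instance, it can flag input that uses multiple or mixed encodings.

```python
import html


def html_edit_format_like(s: str) -> str:
    # Hypothetical: blind replacement of a fixed set of characters, in the
    # spirit of HTMLEditFormat(). '&' is always escaped, even when it
    # already starts an entity such as '&amp;'.
    return (s.replace("&", "&amp;")
             .replace("<", "&lt;")
             .replace(">", "&gt;")
             .replace('"', "&quot;"))


def canonicalize_then_encode(s: str) -> str:
    # Hypothetical: decode existing entities first, then encode once,
    # roughly what an encoder with canonicalization enabled aims for.
    return html.escape(html.unescape(s))


submitted = "Notice how &amp; rendered"
print(html_edit_format_like(submitted))     # Notice how &amp;amp; rendered
print(canonicalize_then_encode(submitted))  # Notice how &amp; rendered
```

Decoding before encoding is what prevents the double `&amp;amp;`. The trade-off is that the decoding step must itself be careful, since it runs on untrusted input.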
Bottom line, there's no reason to use HTMLEditFormat() in CF10+. For maximum protection, you should replace the Format functions with the Encode functions.
The complete example above and more background are at isummation: http://www.isummation.com/blog/day-2-avoid-cross-site-scripting-xss-using-coldfusion-10-part-1/

Related

Can't read the XML node elements in ColdFusion

I'm trying to read some values from the XML file which I created, but it gives me the following error:
coldfusion.runtime.UndefinedElementException: Element MYXML.UPLOAD is undefined in XMLDOC.
Here is my code
<cffile action="read" file="#expandPath("./config.xml")#" variable="configuration" />
<cfset xmldoc = XmlParse(configuration) />
<div class="row"><cfoutput>#xmldoc.myxml.upload-file.size#</cfoutput></div>
Here is my config.xml
<myxml>
<upload-file>
<size>15</size>
<accepted-format>pdf</accepted-format>
</upload-file>
</myxml>
Can someone help me figure out what the error is?
When I print the entire variable with <div class="row"><cfoutput>#xmldoc#</cfoutput></div>, it shows the values as
15 pdf
The problem is the hyphen (-) in the <upload-file> element name in your XML. In dot notation, ColdFusion reads upload-file as upload minus file, which is why the error complains that MYXML.UPLOAD is undefined. If you are in control of the XML contents, the easiest fix is to not use hyphens in your element names. If you cannot control the XML contents, you will need to do a bit more to get around this issue.
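The error message fits this reading: ColdFusion tries to resolve xmldoc.myxml.upload and fails. Python happens to parse the same expression the same way, so its `ast` module can make the split visible. This is an analogy only, since CFML has its own parser.

```python
import ast

# Parse the expression from the question with Python's own parser. Here, too,
# '-' between identifiers is subtraction, not part of a name.
expr = ast.parse("xmldoc.myxml.upload-file.size", mode="eval").body

print(type(expr.op).__name__)   # Sub
print(ast.unparse(expr.left))   # xmldoc.myxml.upload
print(ast.unparse(expr.right))  # file.size
```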
Ben Nadel has a pretty good blog article on the topic: Accessing XML Nodes Having Names That Contain Dashes In ColdFusion.
From that article:
To get ColdFusion to see the dash as part of the node name, we have to "escape" it, for lack of a better term. To do so, we either have to use array notation and define the node name as a quoted string; or, we have to use xmlSearch() where we can deal directly with the underlying document object model.
He goes on to give examples. As he states in that article, you can access the data by using array notation with the node name as a quoted string, like this:
<div class="row">
<cfoutput>#xmldoc.myxml["upload-file"].size.xmlText#</cfoutput>
</div>
Or you can use the xmlSearch() function with an XPath expression. Note that it returns an array of matching nodes, like this:
<cfset xmlarray = xmlSearch(xmldoc, "/myxml/upload-file")>
<div class="row">
<cfoutput>#xmlarray[1].size.xmlText#</cfoutput>
</div>
Both of these examples will output 15.
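For comparison, here is the same lookup using Python's standard-library ElementTree, which supports a limited XPath subset. This is an analogy to the xmlSearch() route, not ColdFusion code; the config.xml content is inlined so the sketch runs on its own.

```python
import xml.etree.ElementTree as ET

# config.xml from the question, inlined so the sketch is self-contained.
CONFIG = """<myxml>
  <upload-file>
    <size>15</size>
    <accepted-format>pdf</accepted-format>
  </upload-file>
</myxml>"""

root = ET.fromstring(CONFIG)  # root is the <myxml> element

# Inside a path expression "upload-file" is just an element name,
# so the hyphen needs no escaping.
size = root.find("upload-file/size").text
accepted = root.findtext("upload-file/accepted-format")

print(size, accepted)  # 15 pdf
```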
I created a gist for you to see these examples as well.

jSoup - How to get elements with background style (inline CSS)?

I'm building an app in Railo, which uses the jSoup .jar library. It all works really well from my CFML code.
Anyhow, I can grab every element with a "style" attribute doing:
<cfset variables.mySelection = variables.myDocument.select("*[style]") />
But this returns an array that sometimes contains elements without a "background" or "background-image" style. As an example, the HTML might look like this:
<p style="color: red;">I should not be selected</p>
<p style="background: green">I **should** be selected</p>
<p style="text-align: left;">I should not be selected</p>
<p style="background-image: url('/path/to/image.jpg');">I **should** be selected</p>
So I can get the elements above, but I don't want the 1st and 3rd in my array, as they don't have a background style. Do you know how I can grab and work with only the others?
Please note, I'm not after a computed style or anything that complicated; I'm just wondering if I can filter based on the properties of an inline CSS style. Perhaps some regex after the fact? I'm open to ideas!
I tried messing with :contains(background) as a key word, but I wasn't sure if that was the correct path?
Many thanks for your help.
Michael.
Try:
variables.myDocument.select("*[style*='background']")
since *= is the standard CSS attribute selector for matching a substring of the attribute value.
Elements els = doc.select("div[style*=dashed]");
Or
Elements elements = doc1.select("span[style*=font-weight:bold]");
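Below is a rough Python analogue of the two filtering approaches. It uses the standard-library html.parser, not jsoup. `substring_match` performs the same kind of substring test that `*=` does. `property_match` is a hypothetical, stricter helper that only looks at declaration names. Splitting on ';' is naive and would mis-split values that contain semicolons, such as data: URLs.

```python
from html.parser import HTMLParser

# Sample markup from the question (single quotes inside url(...)).
SAMPLE = """
<p style="color: red;">I should not be selected</p>
<p style="background: green">I **should** be selected</p>
<p style="text-align: left;">I should not be selected</p>
<p style="background-image: url('/path/to/image.jpg');">I **should** be selected</p>
"""


class StyledElements(HTMLParser):
    """Collect the style attribute of every element that has one (like *[style])."""

    def __init__(self):
        super().__init__()
        self.styles = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style")
        if style is not None:
            self.styles.append(style)


def substring_match(style):
    # Same idea as [style*=background]: a plain substring test.
    return "background" in style.lower()


def property_match(style):
    # Stricter: match only when a declaration's property name starts with
    # "background" (background, background-image, background-color, ...).
    for declaration in style.split(";"):
        name, colon, _value = declaration.partition(":")
        if colon and name.strip().lower().startswith("background"):
            return True
    return False


parser = StyledElements()
parser.feed(SAMPLE)
parser.close()

print([s for s in parser.styles if substring_match(s)])
print([s for s in parser.styles if property_match(s)])
```

On this sample both filters keep the 2nd and 4th paragraphs. The property-based check only differs when "background" appears in a value rather than in a property name.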

Remove whitespace in output HTML code

Consider a test.cfm file with the following content:
<html>
<body>
<cfif foo EQ bar>
<cfset test = "something" />
</cfif>
<p>Hello!</p>
</body>
</html>
When run in the browser, the source code of the output of this file will look like this, with blank lines left where the CFML tags were (between <body> and <p>Hello!</p>):
<html>
<body>
<p>Hello!</p>
</body>
</html>
Is there any way to fix this?
There's nothing to fix - the HTML is perfectly valid and functional.
If your concern is the size of the response, use gzip encoding.
If your concern is reading the source for debugging, use developer tools such as Firebug.
However, general things you should be doing to improve maintainability (which also reduces whitespace output) are:
1) Move anything that isn't display logic out of your views.
2) Convert display logic to functions and custom tags as appropriate, which both make it easier to prevent/control output.
To prevent unwanted content being output, you can:
Wrap the entire section in cfsilent, to ensure nothing gets output.
Set the enablecfoutputonly attribute of cfsetting to true, then use cfoutput only around things you want to be output.
Always set output="false" on cfcomponent and cffunction tags.
When you want to selectively output some text, wrap non-tag non-output segments in CFML comments <!---...---> (e.g. useful for preventing newline output in custom tags)
(I never bother with cfprocessingdirective, since everything mentioned above solves the issue better.)
If you have access to the CF Administrator, there is an option to suppress whitespace.
It is under 'Server Settings' > 'Settings' and is called 'Enable Whitespace Management'.
Try wrapping the content in <cfprocessingdirective suppressWhiteSpace="true"> ... </cfprocessingdirective>.
Reference: http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-76de.html
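ColdFusion's whitespace management runs inside the server, and its exact rules are not described here. The following hypothetical Python post-processing step removes whitespace-only lines from generated markup. The caveat in its docstring is one reason the first answer prefers controlling output at the source.

```python
import re

# Sample output in the shape the question describes: blank lines left where
# the CFML tags were (the exact count depends on which <cfif> branch runs).
OUTPUT = "<html>\n<body>\n\n\n<p>Hello!</p>\n</body>\n</html>\n"


def drop_blank_lines(text: str) -> str:
    """Remove whitespace-only lines, keeping one newline between content lines.

    Caution: a blind pass like this also rewrites whitespace inside <pre>,
    <textarea> or inline scripts, where it may be significant.
    """
    # A newline plus optional spaces/tabs, when followed by another newline,
    # marks an empty line; deleting it merges the run into one line break.
    return re.sub(r"\n[ \t]*(?=\n)", "", text)


print(drop_blank_lines(OUTPUT))
```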

HTML: sanitize a set of tags but allow all tags in <code> blocks

I'm using Django+Markdown to process user input. Text produced by the markdown filter needs to be 'safe' and is not protected by Django's auto-escape mechanism, so I have to escape user input myself. This is how I do it now:
{{ text|force_escape|markdown:"codehilite" }}
However, if text contains something that Markdown would mark as <code>, it is escaped as well and the output is pretty ugly (e.g., '<' is displayed as &lt; in <code>). For example, if
text = u'''
<script>alert("I'm not working 'cause I'll be escaped")</script>
The following would be marked as a code block:
<script>alert("not xss 'cause I'm in <code>")</script>
'''
Using the filter mentioned above, the produced text is:
<p>
<script>alert("I'm not working 'cause I'll be escaped")</script>
The following would be marked as a code block:
</p>
<pre class="codehilite">
<code>
&lt;script&gt;alert(&quot;not xss &#39;cause I&#39;m in &lt;code&gt;&quot;)&lt;/script&gt;
</code>
</pre>
What I want is:
<p>
<script>alert("I'm not working 'cause I'll be escaped")</script>
The following would be marked as a code block:
</p>
<pre class="codehilite">
<code>
<script>alert("not xss 'cause I'm in <code>")</script>
</code>
</pre>
I'm thinking about using BeautifulSoup to get the <code> blocks produced by Markdown and reverse-escape their content. But soup.code.text returns only the text, excluding the tags, so I can't get at any of the <, >, ', " or & characters in it.
Don't escape the input before passing it to Markdown. As you found, this breaks user input in some cases. And it doesn't ensure security: consider, e.g., "[clickme](javascript:alert%28%22xss%22%29)".
Instead, the correct approach is to use Markdown in its safe mode. I've written elsewhere about how to do so, but the short version in Django is to use something like {{ text | markdown:"safe" }}. (Alternatively, you can apply an HTML sanitizer, like HTML Purifier, to the output of the Markdown processor. Python-Markdown removed safe mode in version 3.0, so on current versions the sanitizer route is the one that remains.)
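If you go the sanitizer route, the core idea is an allowlist. You re-emit only known-safe tags and attributes from the Markdown output and turn everything else back into text. The following is a minimal Python sketch using only the standard library. `AllowlistSanitizer`, `sanitize` and the allowlists are hypothetical names chosen for illustration. It is not a substitute for a maintained sanitizer library. It also drops relative links, because it only accepts absolute http(s) and mailto: URLs.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists; tune them to what your Markdown output actually uses.
ALLOWED_TAGS = {"p", "pre", "code", "em", "strong", "a", "ul", "ol", "li",
                "br", "blockquote", "h1", "h2", "h3"}
ALLOWED_ATTRS = {"a": {"href"}, "pre": {"class"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags/attributes; escape everything else as text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            # Show the disallowed tag as literal text instead of markup.
            self.out.append(escape(self.get_starttag_text()))
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            # Attribute values arrive entity-decoded, so an obfuscated
            # "&#106;avascript:" URL is rejected here as well.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_startendtag(self, tag, attrs):
        # "<br/>" and friends: emit the start tag only (HTML ignores the slash).
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>" if tag in ALLOWED_TAGS else escape(f"</{tag}>"))

    def handle_data(self, data):
        # Text (including <script> bodies and decoded entities) is re-escaped.
        self.out.append(escape(data))


def sanitize(fragment):
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p><script>alert(1)</script><a href="javascript:alert(1)">x</a></p>'))
```

Code blocks survive intact: text inside `<code>` is decoded once and escaped once, so `&lt;script&gt;` still displays as `<script>`.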

Could anyone tell me why / how this XSS vector works in the browser?

I have suffered a number of XSS attacks against my site. The following HTML fragment is the XSS vector that has been injected by the attacker:
<a href="mailto:">
<a href=\"http://www.google.com onmouseover=alert(/hacked/); \" target=\"_blank\">
<img src="http://www.google.com onmouseover=alert(/hacked/);" alt="" /> </a></a>
It looks like the script shouldn't execute, but using IE9's developer tools, I was able to see that the browser translates the HTML to the following:
<a href="mailto:"/>
<a onmouseover="alert(/hacked/);" href="\"http://www.google.com" target="\"_blank\"" \?="">
</a/>
After some testing, it turns out that the \" makes the "onmouseover" attribute "live", but I don't know why. Does anyone know why this vector succeeds?
So to summarize the comments:
Putting a character in front of the quote turns the quote into part of the attribute value, instead of marking the beginning and end of the value.
This works just as well:
href=a"http://www.google.com onmouseover=alert(/hacked/); \"
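HTML allows unquoted attribute values, which end at the next whitespace character, so the text is parsed as separate attributes: an href whose value merely starts with the stray quote, followed by a live onmouseover handler.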
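Python's standard-library html.parser is not a browser. Its tolerant rules for unquoted attribute values are similar enough, though, to show the split on the injected anchor. The quoted img src from the question is included for contrast.

```python
from html.parser import HTMLParser

# The injected anchor from the question, backslashes kept literally.
VECTOR = r'<a href=\"http://www.google.com onmouseover=alert(/hacked/); \" target=\"_blank\">'
# The <img> line from the question, where the value is properly quoted.
QUOTED = '<img src="http://www.google.com onmouseover=alert(/hacked/);" alt="" />'


class AttrDump(HTMLParser):
    """Record each start tag with the attributes the parser extracted."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


parser = AttrDump()
parser.feed(VECTOR + QUOTED)
parser.close()

(a_tag, a_attrs), (img_tag, img_attrs) = parser.tags
print(a_attrs)    # href, a separate onmouseover, a stray \" name, target
print(img_attrs)  # src holds the whole quoted string; no onmouseover
```

Because the value after `href=` starts with a backslash rather than a quote, it is read as an unquoted value ending at the first space, and `onmouseover=...` becomes its own attribute.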