XSS Prevention, Tidy vs Purifier? - xss

Greetings,
I'm trying to prevent XSS and improper html from input fields using CKEditor (a javascript WYSIWYG editor).
How should I filter this data on the server side? The two options I'm comparing are PHP Tidy and HTML Purifier. I'm interested in speed, security, and valid nesting.
Edit:
According to HTML Purifier, Tidy does not prevent XSS. So, let me specify that I would first pass the user input through
strip_tags($input,'<img><a><li><ol><ul><b><br>'); before passing to Tidy

HTML Purifier restricts the input beyond what strip_tags can. strip_tags would not strip JavaScript from the attributes of the tags you are allowing. I definitely recommend using HTML Purifier. HTML Purifier is not fast, but add/edit executions are usually less frequent than views so performance is less of an issue.

Related

xss prevention with ckeditor

My situation is a little bit different, I'm using CKEditor for both editing and displaying things, and the submitted string will only be shown inside CKEditor, nowhere else.
I tried this XSS:
<IMG """><SCRIPT>alert("XSS")</SCRIPT>">
I added this to the database directly from backend, not by CKEditor since I know it doesn't matter what CKEditor does before going into the database as the attacker could always send some raw http request without dealing with CKEditor.
To my surprise CKEditor shows me this:
{cke_protected_1}">
So CKEditor is doing something to prevent XSS, and I realized that the XSS security could be achieved from client side.
My question is, how good is CKEditor doing and if it's reliable if I only use no-attribute tags plus
<a><img><table><span><pre>
(<a> and <table> could be disabled if it makes things easier)
PS: The CKEditor is using default settings.
You should protect against XSS on the server side. If you have this possibility, just strip any unsafe data before saving it.
Note that wysiwyg editors must protect somehow JavaScript code included in edited HTML, in order to not destroy edited contents, which includes e.g. hiding in Wysiwyg mode <script> tags or changing onclick event handlers into "data-" attributes.
{cke_protected_1} is a result of an attempt to hide the <script> tag by CKEditor, that did not work entirely properly because of a bit "hackish" HTML taken from XSS Cheat Sheet.
The partial built-in protection in wysiwyg editors should not be considered as a replacement for a server side protection against XSS.

Editable unstructured pages

I'm building a small site framework for a set of sites that are likely to have quite a few unstructured pages - meaning they have:
Slightly different layouts per page
Lots of one-off text
None/very little generated content from models
I would like to allow clients to edit the content of these pages through my admin UI (I'm using Django for this project), but with the requirement that they are not exposed to the page HTML and are only able to edit parts of the page that I've specified as fields; for example:
Titles
A few blocks of text content
Perhaps some blocks of predefined image locations
PDF files that need embedding
Where these fields vary significantly between pages.
The layout, and what fields these pages require would be specified by the developer, so there's no need to dynamically generate much for this.
The 'best' idea I've had so far is to serialise these blocks of content once they've been edited by the user and store them in a 'Pages' table/model in my relational database, or just throw MongoDB or similar at it.
Conceptually, how would you implement such pages? As mentioned, I'm using Django so any implementation suggestions specific to Django are welcome, but general high-level ideas would be great too.
I would implement a ContentBlock model, which has .kind (header, text, image, pdf) and a .data, which would house the content (if text) or path to an uploaded pdf/image/etc. Presumably then you'd hardcode the pages with the appropriate blocks defined - I'd just use hardcoded slugs, eg, 'home-title', 'home-intro', 'about-title', 'about-text', 'about-right-photo', etc.
I would suggest not using Django's admin interface. It's much more suited to editing homogenous, non-business-logic models. I'd just add an edit view that renders the appropriate form fields for the blocks instead - html editor, file upload, etc. It's possible to do that in the django admin, but in my experience it's not worth the trouble - plus, if you do your own edit view, you can have it use the same base templates as the rest of the site, which IMO is a better user experience.
Here are a couple of apps which do that for you:
django-generic-flatblocks
django-boxes
Along with django-frontendadmin, it's super cool.

ColdFusion how to Prevent XSS Attacks in a WYSIWYG

I have a WYsIWYG editor in my coldfusion app and need to prevent XSS Attacks. Is there any Coldfusion ways to strip out all script type attacks?
http://blog.pengoworks.com/index.cfm/2008/1/3/Using-AntiSamy-to-protect-your-CFM-pages-from-XSS-hacks
http://code.google.com/p/owaspantisamy/downloads/list
The main question I would ask is what is this WYSIWYG for? Many WYSIWYG's allow you to define specific tags to have stripped out of the code.
For instance you can have TinyMCE strip out the script tags with
http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/invalid_elements
This unfortunately does not solve your problem since all client side data form submissions are circumventable. If you must use a WYSIWYG ,then what you really need to make sure to do is to cover all your bases on the form's validation and display. You can strip out all script tags and make sure to remove any event attributes and javascript code in links href attributes.
If it is acceptable to only allow a specific subset of tags I would suggest either using BBML, BBCode, or Markdown.
http://www.depressedpress.com/Content/Development/ColdFusion/Extensions/DP_ParseBBML/Index.cfm
http://en.wikipedia.org/wiki/BBCode
http://sebduggan.com/projects/cfxmarkdown
You can use TinyMCE as a WYSIWYG for BBCode http://tinymce.moxiecode.com/examples/example_09.php and StackOverflow uses a great markdown editor http://github.com/cky/wmd
Here is some good info if you would like to render BBCode in Coldfusion
http://www.sitepoint.com/forums/showthread.php?t=248040
Something to consider is that while stripping the tags out in the browser with TinyMCE is a good idea, it makes a fatal assumption that the user is going to be submitting content via the browser. Anything that you do in the browser needs to be duplicated on the server because attackers can bypass any validation that happens in the browser.
With that said check this article: http://www.fusionauthority.com/techniques/3908-how-to-strip-tags-in-three-easy-lessons.htm which spells this out in more detail than I could here. Basically it discusses using regex and UDFs to strip tags out easily. The last example is particularly important... check it out.
To convert these tags <> or use HTMLEditformat function.

Using valid HTML 4.01 Strict with Django

I've found a similar question here, but I'm looking for more general solutions.
As it is now, when Django generates anykind of HTML for you (this mainly happens when generating forms), it uses self-closing tags by default i.e. <br /> instead of <br>. <br /> is valid XHTML and I think HTML5 also, but it's not valid HTML4.
Is there any clean way to override this? Or is it better to write django sites in XHTML or HTML5 instead?
There was a whole series of discussions on this when development for 1.2 kicked off, with a range of solutions proposed, but no general way forward was agreed.
But see Simon Willison's Django-HTML project for one possible solution.
You can rewrite entirely the way django output HTML for you. E.G : for the form, you can :
choose between output using table, p or li by using the property "as_xxx".
print the form label by label, choosing the tag wrappers.
use widget to define how a form piece will print to HTML.
Of course you need the new forms to do so, and there for use Django 1.X.

How do use fckEditor safely, without risk of cross site scripting?

This link describes an exploit into my app using fckEditor:
http://knitinr.blogspot.com/2008/07/script-exploit-via-fckeditor.html
How do I make my app secure while still using fckEditor? Is it an fckEditor configuration? Is it some processing I'm supposed to do server-side after I grab the text from fckEditor?
It's a puzzle because fckEditor USES html tags for its formatting, so I can't just HTML encode when I display back the text.
Sanitize html server-side, no other choice. For PHP it would be HTML Purifier, for .NET I don't know. It's tricky to sanitize HTML - it's not sufficient to strip script tags, you also have to watch out for on* event handlers and even more, thanks to stupidities of IE for example.
Also with custom html and css it's easy to hijack look and layout of your site - using overlay (absolutely positioned) which covers all screen etc. Be prepared for that.
The bug is not actually FCKeditors fault. As long as you let users edit HTML that will be displayed on your web site they will always have to possibility to do harm unless you check the data before you output it.
Some people use HTMLencoding to do this, but that will destroy all the formatting done by FCKeditor, not what you want.
Maybe you can use the Microsoft Anti-Cross Site Scripting Library. Samples on MSDN
Is it some processing I'm supposed to do server-side after I grab the text from fckEditor?
Precisely. StackOverflow had some early issues related to this as well. The easiest way to solve it is to use an HTML library to parse user's input, and then escape any tags you don't want in the output. Do this as a post-processing step when printing to the page -- the data in the database should be the exact same as what the user typed in.
For example, if the user enters <b><script>evil here</script></b>, your code would translate it to <b><script>evil here</script></b> before rendering the page.
And do not use regular expressions for solving this, that's just an invitation for somebody clever to break it again.
FCKEditor can be configured to use only a few tags. You will need to encode everything except for those few tags.
Those tags are: <strong> <em> <u> <ol> <ul> <li> <p> <blockquote> <font> <span>.
The font tag only should have face and size attributes.
The span tag should only have a class attribute.
No other attributes should be allowed for these tags.
I understand the DONTS. I'm lacking a DO.
Is use of FCKEditor a requirement, or can you use a different editor/markup language? I advise using Markdown and WMD Editor, the same language used by StackOverflow. The Markdown library for .NET should have an option to escape all HTML tags -- be sure to turn it on.
XSS is a tricky thing. I suggest some reading:
Is HTML a Humane Markup Language?
Safe HTML and XSS
Anyway, my summary is when it comes down to it, you have to only allow in strictly accepted items; you can't reject known exploit vectors because or you'll always be behind the eternal struggle.
I think the issue raised by some is not that Fckeditor only encodes a few tags. This is a naive assumption that an evil user will use the Fckeditor to write his malice. The tools that allow manual changing of input are legion.
I treat all user data as tainted; and use Markdown to convert text to HTML. It sanitizes any HTML found in the text, which reduces malice.