How to let code that users enter in code tags pass through HTML Purifier, or how to make sure user input is safe without HTML Purifier - xss

I have a forum on my site where users can input Markdown. First I convert the Markdown into HTML, then I purify the generated HTML with HTML Purifier and insert it into my database. But when users include code snippets and the like, HTML Purifier removes the code from the post.
How can I fix this? Or should I look for an alternative to HTML Purifier?

Related

Auto-Expanding TextAreas on PDF Generated from Django Template

I'm using pdfkit to generate a PDF of a Django template (doing this by getting an HTML string of the page from Django's get_template and render functions and passing that string to pdfkit... see post).
On this page, I have some TextAreas that can contain many lines of text, and by default they just get cut off when generating the PDF.
I've tried to fix this by using JavaScript libraries (I've tried several) to automatically expand the TextAreas on page load. I can get these to work perfectly on normal pages, but when I include them in the PDF template, I get errors ranging from the resizing not working at all to the TextArea expanding far too much. My first assumption was that styling differences were causing the issues, but I'm fairly certain I've ruled that out. When I loaded the PDF template directly as a view, the TextAreas resized correctly, which leads me to believe that something in pdfkit's generation isn't playing nicely with the resizing.
Given this, I checked whether pdfkit has any suggestions for issues like this and couldn't find any. I also tried input types other than TextAreas, none of which displayed newlines correctly.
I can't think of any other potential solutions at this point, and I'm open to suggestions. Please let me know if you feel I should provide additional information, and thank you in advance.
I ended up finding a relatively simple fix. Because I was using django forms, I was pretty easily able to change from displaying the form Textarea:
{{ form.paragraph_data }}
to displaying just the plain text:
{{ form.paragraph_data.initial }}
However, this initially caused the newlines to not display correctly, because HTML doesn't process them in a plain string. So I added some processing in the creation of the form to replace the newlines with <br />s:
form.fields['paragraph_data'].initial = form.fields['paragraph_data'].initial.replace('\n', '<br />')
Finally, I had to add the safe filter to the Django template line to tell it to actually render the HTML rather than escaping it:
{{ form.paragraph_data.initial|safe }}
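One caveat with the safe filter: any markup already present in the text is rendered as well. If the paragraph text can come from users, a more defensive variant is to escape it first and only then insert the line breaks. The sketch below uses only Python's standard library; the helper name text_to_html_lines is an assumption for illustration, not part of the original fix.

```python
from html import escape


def text_to_html_lines(text):
    """Escape plain text, then turn newlines into <br /> tags.

    Hypothetical helper: the escaping happens first, so the only
    markup in the result is the <br /> tags added here.
    """
    # Normalise Windows and old Mac line endings to "\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return escape(normalized).replace("\n", "<br />")


example = text_to_html_lines("first line\n<script>alert(1)</script>")
```

Here example contains the script tag as inert text (&lt;script&gt;...) with a <br /> between the two lines. Django's built-in linebreaksbr template filter does a similar job inside a template.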
Again, this was partially easy because of Django forms, but it should translate relatively easily to a more standard javascript/html solution.

Django text file upload and security when using 'mark_safe'

I'm working on a Django app where the user uploads a space/tab/comma delimited text file. I display the text in a browser and the user can then interactively parse columns of delimited values which get highlighted with css as they change the settings. (Only a sample is displayed not the whole file!)
To highlight the selections I insert html/css code in and around the text but have to 'mark_safe' the text to get the html/css to render. I assume this opens security issues as even I, a complete noob could insert html in my input file and get it to render.
My Question:
Is there something I can use to strip html out of the text file immediately after I've uploaded it and before I render it in the browser? Would stripping '<' and '>' out be enough? What about something to disable .js if required?
I understand there are other well documented security measures I can take regarding file uploads. However I'm after a solution to my specific issue relating to me 'marking_safe' the input text I then render to the browser.
Django already has automatic HTML escaping for this. Take a look at the docs I linked. Hope this helps.
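To keep the highlighting while still neutralising anything in the uploaded file, one option is to escape each value first and add the highlighting markup afterwards, so the only HTML in the result is the markup your own code inserted. A minimal sketch in plain Python follows; the function name highlight_columns, the CSS class "selected" and the parameters are assumptions for illustration.

```python
from html import escape


def highlight_columns(line, delimiter=",", selected=frozenset({0})):
    """Escape every cell of a delimited line, then wrap the selected
    columns in a <span> for CSS highlighting.

    Hypothetical helper: the uploaded text is escaped before any
    markup is added, so markup inside the file is shown as text.
    """
    rendered = []
    for index, cell in enumerate(line.split(delimiter)):
        safe_cell = escape(cell)  # untrusted text becomes inert
        if index in selected:
            safe_cell = f'<span class="selected">{safe_cell}</span>'
        rendered.append(safe_cell)
    return escape(delimiter).join(rendered)


row = highlight_columns("<script>x</script>,42", selected={1})
```

Only a string built this way would then be passed to mark_safe. Django's django.utils.html.format_html combines the escape-then-mark steps if you prefer to stay within Django's helpers.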

XSS Prevention, Tidy vs Purifier?

Greetings,
I'm trying to prevent XSS and improper html from input fields using CKEditor (a javascript WYSIWYG editor).
How should I filter this data on the server side? The two options I'm comparing are PHP Tidy and HTML Purifier. I'm interested in speed, security, and valid nesting.
Edit:
According to HTML Purifier, Tidy does not prevent XSS. So, let me specify that I would first pass the user input through
strip_tags($input,'<img><a><li><ol><ul><b><br>'); before passing to Tidy
HTML Purifier restricts the input beyond what strip_tags can. strip_tags would not strip JavaScript from the attributes of the tags you are allowing. I definitely recommend using HTML Purifier. HTML Purifier is not fast, but add/edit executions are usually less frequent than views so performance is less of an issue.
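The point about attributes can be illustrated with a small Python stand-in for a filter that only looks at tag names. This is not PHP's strip_tags itself; the function name strip_disallowed_tags and the regular expression are assumptions made for the sketch, which only demonstrates that filtering by tag name leaves event-handler attributes on allowed tags in place.

```python
import re

# Same tag list as in the question's strip_tags call.
ALLOWED_TAGS = {"img", "a", "li", "ol", "ul", "b", "br"}

TAG_PATTERN = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def strip_disallowed_tags(markup):
    """Drop tags whose name is not allowed; keep allowed tags verbatim.

    Deliberately naive: attributes of allowed tags are not inspected,
    which is exactly the weakness being illustrated.
    """
    def keep_or_drop(match):
        name = match.group(1).lower()
        return match.group(0) if name in ALLOWED_TAGS else ""

    return TAG_PATTERN.sub(keep_or_drop, markup)


sample = '<script>alert(1)</script><img src="x" onerror="alert(2)">'
filtered = strip_disallowed_tags(sample)
```

The script tags are removed, but the img tag survives together with its onerror handler, so script would still run in a browser. An allowlist has to cover attributes as well as tag names.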

Importing HTML into TinyMCE using ColdFusion

Hey everyone, I would appreciate a pointer in the right direction with the problem I'm having. In short, I'm working on an application that will create PDFs using TinyMCE and ColdFusion 8. I can already create a PDF by just entering text, pictures, etc. However, I want to be able to import an HTML template and insert it into TinyMCE.
Basically, I have a file directory code snippet that lets me browse through my 'HTMLTemplates' folder, and am able to select an HTML document. Now, I want to be able to take all the code from that selected HTML document and insert it into my TinyMCE box. Any tips on how I might do this, maybe?
Thanks!
If I understood you correctly, you already have a TinyMCE plugin which pops up a window and lets you browse a certain directory using an existing cfm page rendered within the popup window. Right?
If not, you should start with this. I'm not sure how easy it is in the current version, but in an older TinyMCE I created a custom upload plugin (needed to track the site security permissions for the current user) pretty quickly.
Next, I can see two quick ways to pass the server file contents to the client-side:
Make it available via HTTP so you can make the GET request and read contents into the variable.
Output it on the page using CF (say, on form submit when file selected) and grab using JavaScript.
I'd personally try the second option. After you grab the text into a variable, you can put it into TinyMCE using its API.
It can be as simple as outputting the escaped text into a hidden div with a known ID and reading it using DOM operations (assuming that there is a cfoutput around it):
<div id="myTemplate">#HTMLEditFormat(myFileContents)#</div>
You can also output the text directly into a JavaScript variable (of course, with accurate escaping), maybe like this:
<script type="text/javascript">
var text = '#JSStringFormat(myFileContents)#';
</script>
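Text placed inside a JavaScript string literal needs JavaScript-level escaping (quotes, backslashes, newlines and any "</script>" sequence), which HTML escaping alone does not provide. The idea is language independent; a minimal Python sketch follows, where the helper name js_string_literal is an assumption for illustration.

```python
import json


def js_string_literal(text):
    """Return text as a quoted JavaScript string literal that is safe
    to place inside an inline <script> block.

    Hypothetical helper: json.dumps handles quotes, backslashes and
    newlines; the extra replacements stop the HTML parser from seeing
    "</script>" or "<!--" inside the literal.
    """
    literal = json.dumps(text)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


template = '<p class="x">It\'s here</p>\n</script>'
literal = js_string_literal(template)
```

The resulting literal can be written after "var text = " in the page, and the browser decodes it back to the original template text.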
The most advanced and possibly best-performing (and definitely "cooler") way is to use script tags as data containers, like this:
<script type="text/plain">
#HTMLEditFormat(myFileContents)#
</script>
I last saw this on Nadel's blog, I think. It's a pretty interesting read.
Hope this helps.

How do I use fckEditor safely, without risk of cross site scripting?

This link describes an exploit into my app using fckEditor:
http://knitinr.blogspot.com/2008/07/script-exploit-via-fckeditor.html
How do I make my app secure while still using fckEditor? Is it an fckEditor configuration? Is it some processing I'm supposed to do server-side after I grab the text from fckEditor?
It's a puzzle because fckEditor USES html tags for its formatting, so I can't just HTML encode when I display back the text.
Sanitize html server-side, no other choice. For PHP it would be HTML Purifier, for .NET I don't know. It's tricky to sanitize HTML - it's not sufficient to strip script tags, you also have to watch out for on* event handlers and even more, thanks to stupidities of IE for example.
Also, with custom HTML and CSS it's easy to hijack the look and layout of your site, for example with an absolutely positioned overlay that covers the whole screen. Be prepared for that.
The bug is not actually FCKeditor's fault. As long as you let users edit HTML that will be displayed on your web site, they will always have the possibility to do harm unless you check the data before you output it.
Some people use HTMLencoding to do this, but that will destroy all the formatting done by FCKeditor, not what you want.
Maybe you can use the Microsoft Anti-Cross Site Scripting Library. Samples on MSDN
Is it some processing I'm supposed to do server-side after I grab the text from fckEditor?
Precisely. StackOverflow had some early issues related to this as well. The easiest way to solve it is to use an HTML library to parse user's input, and then escape any tags you don't want in the output. Do this as a post-processing step when printing to the page -- the data in the database should be the exact same as what the user typed in.
For example, if the user enters <b><script>evil here</script></b>, your code would translate it to <b>&lt;script&gt;evil here&lt;/script&gt;</b> before rendering the page.
And do not use regular expressions for solving this, that's just an invitation for somebody clever to break it again.
FCKEditor can be configured to use only a few tags. You will need to encode everything except for those few tags.
Those tags are: <strong> <em> <u> <ol> <ul> <li> <p> <blockquote> <font> <span>.
The font tag only should have face and size attributes.
The span tag should only have a class attribute.
No other attributes should be allowed for these tags.
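An allowlist like this one can be sketched with an HTML parser instead of regular expressions. The example below uses Python's standard html.parser; the class and function names are assumptions for illustration, and a maintained sanitizer library is the safer choice in production. Allowed tags are re-emitted with only their permitted attributes, and everything else is escaped so it shows up as text.

```python
from html import escape
from html.parser import HTMLParser

# Tag name -> attributes permitted on it (taken from the list above).
ALLOWED = {
    "strong": set(), "em": set(), "u": set(), "ol": set(), "ul": set(),
    "li": set(), "p": set(), "blockquote": set(),
    "font": {"face", "size"},
    "span": {"class"},
}


class AllowlistSanitizer(HTMLParser):
    """Hypothetical sanitizer: rebuilds the markup from parser events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            kept = "".join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs
                if name in ALLOWED[tag]
            )
            self.out.append(f"<{tag}{kept}>")
        else:
            # Disallowed tag: show it as text instead of rendering it.
            self.out.append(escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        if tag in ALLOWED:
            self.handle_starttag(tag, attrs)
            self.handle_endtag(tag)
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        closing = f"</{tag}>"
        self.out.append(closing if tag in ALLOWED else escape(closing))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(markup):
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


clean = sanitize('<span class="hl" onclick="steal()">hi</span>')
```

In this sketch the onclick attribute is dropped from the span, and a script tag nested inside an allowed tag comes out escaped rather than executable.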
I understand the DONTS. I'm lacking a DO.
Is use of FCKEditor a requirement, or can you use a different editor/markup language? I advise using Markdown and WMD Editor, the same language used by StackOverflow. The Markdown library for .NET should have an option to escape all HTML tags -- be sure to turn it on.
XSS is a tricky thing. I suggest some reading:
Is HTML a Humane Markup Language?
Safe HTML and XSS
Anyway, my summary is that when it comes down to it, you have to allow in only strictly accepted items; you can't just reject known exploit vectors, or you'll always be a step behind in an eternal struggle.
I think the issue raised by some is not that Fckeditor only encodes a few tags. It is the naive assumption that an evil user will use Fckeditor to write his malice. The tools that allow manual changing of input are legion.
I treat all user data as tainted and use Markdown to convert text to HTML. It sanitizes any HTML found in the text, which reduces malice.