Should I Html Encode the Html input from user? - xss

We are developing an application that takes HTML as user input and renders the same HTML as output on a different page. The input should never contain any dynamic behaviour such as script tags.
We HTML-encode the value in JavaScript and save the encoded value in the DB. We then HTML-decode the saved value and render it on the new page to get the expected result (see the example below).
From what I have read so far, I should HTML-encode the input before rendering it as output on a different page. The problem is that, when I do this, whatever HTML the user entered is displayed verbatim as markup on the new page instead of being rendered.
Example:
User Input:
<div><h2>Header</h2><p>this is the body text</p></div>
Output in the new page when Html encoded and assigned it to another div:
<div><h2>Header</h2><p>this is the body text</p></div>
Expected:
Header
this is the body text
The only way I was able to achieve the expected result was when I Html decoded the saved value and assigned it to another container control.
Am I missing something? I have tried every way I know of to HTML-encode the user input, and rendering it back does not give me the expected result. Any idea how to achieve this?
If there is no other solution, is there any validation framework available in .NET to prevent XSS attacks? I have gone through Microsoft's AntiXSS framework, but it is geared more towards stripping harmful HTML and encoding. It does not help with letting users know that they should not enter certain tags.
Thanks for any help in advance.

If the user input is HTML, and you encode it before saving it, then when you display it, you should decode it.
The recommendation to encode before displaying applies when the user input is expected to be plain text: encoding makes it display correctly (so that an ampersand actually displays as &) and prevents potentially malicious input from being rendered on the page and interpreted by the browser (e.g. <script> tags).
Please be careful: if you intend to display HTML provided by a user, sanitize the input as much as possible. Make sure they aren't trying to do anything malicious, and also that they haven't made a simple mistake that could wreck the entire layout of a webpage (e.g. an opening tag without a closing tag). This type of sanitization is no simple task, and it is one of the major reasons other flavors of markup exist in the first place (e.g. Markdown, BBCode, etc.).

@Brian Ball has answered the question, but I feel some further explanation is warranted.
The many and varied encoding protocols are context-specific.
As I understand it, the only point of HTMLencoding (as opposed to other encoding protocols like URIencoding etc) is to allow text to be rendered by a browser 'as is' if it contains characters that otherwise would be parsed as HTML (e.g. the characters & < > / and double and single quotes). The encoding 'hides' these characters from the browser's HTML parser.
So really, the only place HTMLencoding serves any purpose is at the point of preparing the text to be rendered by a browser. There is no purpose served by HTMLencoding user-entered text that is heading for a database. You may need to use other encodings for transmission, for ensuring appropriate handling by server-side languages, etc., but HTMLencoding has no place in these contexts.
In your situation, it is the very fact that you previously HTMLencoded the content that is preventing it from being rendered as HTML when you later retrieve it from the database. The encoding is doing exactly what it is meant to.
So the simple answer is,
a. there's no point HTMLencoding the user-entered data before saving it to your database, and
b. if you want it rendered as HTML rather than printed to the screen 'as is', do not HTMLencode it at the point of displaying it on another page.
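To make the effect of encoding concrete, here is a minimal sketch using Python's standard html module as a stand-in for whatever encoder the application uses (the question is about .NET, so the function names here are an assumption for illustration, not the asker's actual API). It shows that encoded markup is exactly what a browser will display literally, and that decoding merely returns the original string.

```python
import html

user_input = "<div><h2>Header</h2><p>this is the body text</p></div>"

# Store the raw value; encoding is a display-time concern.
stored = user_input

# Encoded for display as text: a browser shows the tags literally.
as_text = html.escape(stored)
print(as_text)
# &lt;div&gt;&lt;h2&gt;Header&lt;/h2&gt;&lt;p&gt;this is the body text&lt;/p&gt;&lt;/div&gt;

# Decoding an encoded value just gives back the original markup,
# so an encode/decode round trip adds no protection on its own.
round_trip = html.unescape(as_text)
print(round_trip == stored)
```

Because the round trip is lossless, any protection against script tags has to come from sanitizing the markup, not from encoding it on the way into the database.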

Related

Auto-Expanding TextAreas on PDF Generated from Django Template

I'm using pdfkit to generate a PDF of a Django template (doing this by getting an HTML string of the page from Django's get_template and render functions and passing that string to pdfkit... see post).
On this page, I have some TextArea's that can contain many lines of text, and by default, they just get cut off when generating the PDF.
I've tried to fix this by using some javascript libraries (I've tried several) to automatically expand the TextAreas on page load. I can get these to work perfectly on normal pages, but when I try to include them on the PDF template, I get various errors ranging from not working at all to expanding the TextArea way too much. My first assumption was that there were some styling differences that were causing the issues, but I'm fairly certain I've ruled that out. I tried to load the PDF template directly as a view, and the TextAreas resized correctly, leading me to believe that there's something in pdfkit's generation that isn't playing nicely with the resizing.
Given this, I tried to look if pdfkit has any suggestions for issues like this and couldn't find any, and I also tried to use different input types other than TextAreas, none of which were able to display newlines correctly.
I can't think of any other potential solutions at this point, and I'm open to suggestions. Please let me know if you feel I should provide additional information, and thank you in advance.
I ended up finding a relatively simple fix. Because I was using django forms, I was pretty easily able to change from displaying the form Textarea:
{{ form.paragraph_data }}
to displaying just the plain text:
{{ form.paragraph_data.initial }}
However, this initially caused the newlines to not display correctly, because HTML doesn't process them in a plain string. So I added some processing in the creation of the form to replace the newlines with <br />s:
form.fields['paragraph_data'].initial = form.fields['paragraph_data'].initial.replace('\n', '<br />')
Finally, I had to add the safe filter to the Django template line to tell it to actually render the HTML rather than escaping it:
{{ form.paragraph_data.initial|safe }}
Again, this was partially easy because of Django forms, but it should translate relatively easily to a more standard javascript/html solution.
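One caveat with the approach above: marking the value as safe means any HTML a user typed into the field is rendered too. A minimal sketch of a safer variant, assuming the field holds plain text, is to escape the text first and only then insert the <br /> tags. The helper name newlines_to_br is hypothetical, and the sketch uses only the Python standard library. Django also ships a linebreaksbr template filter that is meant for this kind of conversion.

```python
import html


def newlines_to_br(text: str) -> str:
    """Escape plain text, then turn line breaks into <br /> tags."""
    # Escape first so user-typed markup is displayed, not interpreted.
    escaped = html.escape(text)
    # Normalise Windows and old Mac line endings to "\n".
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "<br />")


print(newlines_to_br("first line\nsecond <b>line</b>"))
# first line<br />second &lt;b&gt;line&lt;/b&gt;
```

The result contains only the <br /> tags the code itself added, so passing it through the safe filter no longer exposes user-supplied markup.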

Does XSLFO support fixed layout?

I am using XSL-FO to generate a PDF from my XML file, but whenever I edit something in my source documents, it affects the pagination of the output, which causes issues with indexing that depends on where items appear.
Are there any attributes or elements to handle or to fix this behaviour?
I assume you have the following situation:
Initially, a page is almost filled with text.
The text is edited and becomes longer. Now it doesn't fit on one page any more.
You want to know if there's a way to automatically change the formatting so the text will fit on one page again.
Unfortunately you can't do this with XSL-FO alone. As far as I know, there is no way to specify "this block of text has to fit on one page, and if it doesn't fit, make the font size smaller until it fits".
You'd have to do some post-processing, along the lines of 'count the pages in the PDF, if the page count is larger than X, change a variable in the FO template to make the text smaller and render again'.
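The render-and-retry loop described above could be sketched like this. The render callable is hypothetical: it is assumed to run your FO processor with the given font size and return the resulting page count. The default sizes and the step are also illustrative assumptions, not values from any particular tool.

```python
from typing import Callable


def fit_to_pages(
    render: Callable[[float], int],
    max_pages: int,
    start_pt: float = 11.0,
    min_pt: float = 7.0,
    step_pt: float = 0.5,
) -> float:
    """Return the largest tried font size whose output fits in max_pages."""
    size = start_pt
    while size >= min_pt:
        # render() is assumed to produce the PDF and report its page count.
        if render(size) <= max_pages:
            return size
        size -= step_pt
    raise ValueError("content does not fit even at the minimum font size")


# Stand-in renderer for demonstration: anything above 10pt spills to 2 pages.
def fake_render(size: float) -> int:
    return 2 if size > 10 else 1


print(fit_to_pages(fake_render, max_pages=1))  # 10.0
```

Each iteration is a full render, so a coarse step keeps the number of passes small; a minimum size prevents the loop from shrinking text until it is unreadable.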

How to create template for content part created in admin in an orchard site

I am new with Orchard and can not seem to get past this problem.
Goal
to be able to specify two text values on each page and show those in a styled DIV if they are not blank.
Procedure
I created a new content part in Orchard's admin named
"InnerPageTitleArea"
I added two fields to this part: IpTitle (Text Field) and IpSubtitle
(Text Field)
I added this part to the Page content type
Those textboxes show when editing a page, and I filled them in on
several pages
Those values show on pages they were entered on (all good so far)
The problem:
I want to provide a template for the InnerPageTitleArea, but all
attempts have failed.
When using shape tracing, there are no alternates referencing my
part alone
All alternates begin with "Fields" i.e.
~/Themes/MyTheme/Views/Fields.Common.Text-InnerPageTitleArea.cshtml.
If I use one of those field alternates, my template is repeated
twice - once for each field
If I use a field-specific template, i.e.
~/Themes/MyTheme/Views/Fields.Common.Text-InnerPageTitleArea-IpTitle.cshtml
I can actually get at both values, but then the other value
(IpSubtitle) still displays as plain text. I could probably remedy
that with Placement.info, but I suspect that I am just lacking some
fundamental understanding of Orchard.
What to do?
Your part never renders anything because it doesn't have a driver that would create a shape. The only shapes that are getting out of that part are the shapes for each of the fields. The simplest way to get what you want is to create one alternate for each of the fields. Would that work?

Django text file upload and security when using 'mark_safe'

I'm working on a Django app where the user uploads a space/tab/comma delimited text file. I display the text in a browser and the user can then interactively parse columns of delimited values which get highlighted with css as they change the settings. (Only a sample is displayed not the whole file!)
To highlight the selections I insert html/css code in and around the text but have to 'mark_safe' the text to get the html/css to render. I assume this opens security issues as even I, a complete noob could insert html in my input file and get it to render.
My Question:
Is there something I can use to strip html out of the text file immediately after I've uploaded it and before I render it in the browser? Would stripping '<' and '>' out be enough? What about something to disable .js if required?
I understand there are other well documented security measures I can take regarding file uploads. However I'm after a solution to my specific issue relating to me 'marking_safe' the input text I then render to the browser.
Django already has Automatic HTML escaping for this. Take a look at the link I posted in the docs. Hope this helps.
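Automatic escaping no longer applies to a string once it has been marked safe, so one way to combine it with highlighting is to escape each piece of the uploaded text yourself and only then wrap it in your own markup. Below is a minimal sketch using the standard library; the function name highlight_columns and the CSS class "selected" are hypothetical. In a Django view, the returned string is the one you would pass to mark_safe, since at that point the only tags in it are the ones the code added.

```python
import html
from typing import Iterable


def highlight_columns(line: str, delimiter: str, selected: Iterable[int]) -> str:
    """Escape every cell of a delimited line and wrap selected cells in a span."""
    selected_set = set(selected)
    parts = []
    for index, cell in enumerate(line.split(delimiter)):
        # Escape the uploaded text before adding any markup of our own.
        safe_cell = html.escape(cell)
        if index in selected_set:
            parts.append(f'<span class="selected">{safe_cell}</span>')
        else:
            parts.append(safe_cell)
    # The delimiter is user-influenced too, so escape it as well.
    return html.escape(delimiter).join(parts)


print(highlight_columns("a,<b>x</b>,c", ",", [1]))
# a,<span class="selected">&lt;b&gt;x&lt;/b&gt;</span>,c
```

Escaping covers script tags as well, because the browser never sees a real tag from the file. Stripping only '<' and '>' would change the displayed data without being any safer than escaping it.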

How do I use fckEditor safely, without risk of cross site scripting?

This link describes an exploit into my app using fckEditor:
http://knitinr.blogspot.com/2008/07/script-exploit-via-fckeditor.html
How do I make my app secure while still using fckEditor? Is it an fckEditor configuration? Is it some processing I'm supposed to do server-side after I grab the text from fckEditor?
It's a puzzle because fckEditor USES html tags for its formatting, so I can't just HTML encode when I display back the text.
Sanitize html server-side, no other choice. For PHP it would be HTML Purifier, for .NET I don't know. It's tricky to sanitize HTML - it's not sufficient to strip script tags, you also have to watch out for on* event handlers and even more, thanks to stupidities of IE for example.
Also with custom html and css it's easy to hijack look and layout of your site - using overlay (absolutely positioned) which covers all screen etc. Be prepared for that.
The bug is not actually FCKeditor's fault. As long as you let users edit HTML that will be displayed on your web site, they will always have the possibility to do harm unless you check the data before you output it.
Some people use HTMLencoding to do this, but that will destroy all the formatting done by FCKeditor, not what you want.
Maybe you can use the Microsoft Anti-Cross Site Scripting Library. Samples on MSDN
Is it some processing I'm supposed to do server-side after I grab the text from fckEditor?
Precisely. StackOverflow had some early issues related to this as well. The easiest way to solve it is to use an HTML library to parse user's input, and then escape any tags you don't want in the output. Do this as a post-processing step when printing to the page -- the data in the database should be the exact same as what the user typed in.
For example, if the user enters <b><script>evil here</script></b>, your code would translate it to <b>&lt;script&gt;evil here&lt;/script&gt;</b> before rendering the page.
And do not use regular expressions for solving this, that's just an invitation for somebody clever to break it again.
FCKEditor can be configured to use only a few tags. You will need to encode everything except for those few tags.
Those tags are: <strong> <em> <u> <ol> <ul> <li> <p> <blockquote> <font> <span>.
The font tag only should have face and size attributes.
The span tag should only have a class attribute.
No other attributes should be allowed for these tags.
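The whitelist described above can be sketched with a real HTML parser rather than regular expressions. This is a minimal illustration using Python's standard html.parser module, not a hardened sanitizer: the names ALLOWED, WhitelistSanitizer and sanitize are hypothetical, it does not balance unclosed tags, and for production a maintained library (such as HTML Purifier for PHP, mentioned earlier) is the safer choice.

```python
import html
from html.parser import HTMLParser

# Allowed tags mapped to their allowed attributes, per the list above.
ALLOWED = {
    "strong": set(), "em": set(), "u": set(), "ol": set(), "ul": set(),
    "li": set(), "p": set(), "blockquote": set(),
    "font": {"face", "size"},
    "span": {"class"},
}


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            # Keep only whitelisted attributes and re-escape their values.
            kept = "".join(
                f' {name}="{html.escape(value or "")}"'
                for name, value in attrs
                if name in ALLOWED[tag]
            )
            self.out.append(f"<{tag}{kept}>")
        else:
            # Disallowed tag: show its source text instead of rendering it.
            self.out.append(html.escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        if tag in ALLOWED:
            self.handle_starttag(tag, attrs)
            self.handle_endtag(tag)
        else:
            self.out.append(html.escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        closing = f"</{tag}>"
        self.out.append(closing if tag in ALLOWED else html.escape(closing))

    def handle_data(self, data):
        self.out.append(html.escape(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")
    # Comments, doctypes and processing instructions are dropped by default.


def sanitize(markup: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize("<strong><script>alert(1)</script></strong>"))
# <strong>&lt;script&gt;alert(1)&lt;/script&gt;</strong>
print(sanitize('<p onclick="x()">hi</p>'))
# <p>hi</p>
```

Building the output from an explicit allow-list means anything the code does not recognise, including on* event handler attributes, is escaped or dropped by default rather than needing its own rule.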
I understand the DONTS. I'm lacking a DO.
Is use of FCKEditor a requirement, or can you use a different editor/markup language? I advise using Markdown and WMD Editor, the same language used by StackOverflow. The Markdown library for .NET should have an option to escape all HTML tags -- be sure to turn it on.
XSS is a tricky thing. I suggest some reading:
Is HTML a Humane Markup Language?
Safe HTML and XSS
Anyway, my summary is that, when it comes down to it, you have to allow in only strictly accepted items; you can't just reject known exploit vectors, or you'll always be one step behind in an eternal struggle.
I think the issue raised by some is not that Fckeditor only encodes a few tags. It is naive to assume that an evil user will use Fckeditor to write his malice. The tools that allow manual changing of input are legion.
I treat all user data as tainted; and use Markdown to convert text to HTML. It sanitizes any HTML found in the text, which reduces malice.