how to protect from XSS in WYSIWYG - xss

I store user input from a WYSIWYG editor in an SQL database, and I need to protect against XSS attacks.
My problem is which is the best way to protect against XSS: do I need to use HTML Purifier,
or can I use this simple method?
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');

The best method depends on your use-case. If you use htmlspecialchars(), then if your user enters bold text in the WYSIWYG, it will show up on your page as the literal markup, either <b>bold text</b> or <strong>bold text</strong>, rather than as bold text. That's probably not what you want.
If you actually want to output the formatted text from your WYSIWYG, you need to sanitise the HTML input. HTML Purifier is one good option for that, and quite easy to set up.
In short: It depends on if you actually want to output formatted text or not. If you don't, htmlspecialchars() is easier and consumes less resources. Since you're letting users use a WYSIWYG, I assume you do, though, and in that case htmlspecialchars() will ruin what you're even trying to achieve.
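To make the trade-off concrete, here is a minimal sketch in Python rather than PHP. It assumes that html.escape() with quote=True is a fair stand-in for htmlspecialchars($string, ENT_QUOTES, 'UTF-8'); the two are similar in spirit but not guaranteed to be byte-for-byte identical. The variable names are only for illustration.

```python
import html

# What the WYSIWYG editor might submit when the user makes text bold.
user_input = "<b>bold text</b>"

# Escaping turns markup characters into entities, so the browser
# shows the tags as text instead of rendering bold text.
escaped = html.escape(user_input, quote=True)

print(escaped)  # &lt;b&gt;bold text&lt;/b&gt;
```

The escaped string is safe to output, but the formatting the user asked for is gone, which is why a WYSIWYG field usually calls for a sanitiser instead.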

Related

Should I Html Encode the Html input from user?

We are developing an application which takes the user input as Html and render the same Html as output in a different page. And the input should never have any dynamic behaviour in it like script tags.
We Html Encode the value in Javascript and save the encoded value in DB. We Html Decode the saved value and render it in the new page to get the expected result(check below example).
From what I have read so far, I should Html Encode the input before rendering it as output in a different page. The problem I am facing in this is that whatever the Html added by user is displayed the same in the new page
Example:
User Input:
<div><h2>Header</h2><p>this is the body text</p></div>
Output in the new page when Html encoded and assigned it to another div:
<div><h2>Header</h2><p>this is the body text</p></div>
Expected:
Header
this is the body text
The only way I was able to achieve the expected result was when I Html decoded the saved value and assigned it to another container control.
Am I missing something? I have tried every way I know of HTML-encoding the user input, and rendering it back does not give me the expected result. Any idea how to achieve this?
If there is no other solution, is there any validation framework available in .NET to avoid XSS attacks? I have gone through the AntiXSS framework from Microsoft, but it is more for stripping harmful HTML and encoding. It does not help in letting the user know that they should not be entering some tags.
Thanks for any help in advance.
If the user input is HTML, and you encode it before saving it, then when you display it, you should decode it.
The reason the recommendation exists to encode before displaying is if the user input is expected to be text, it is recommended to encode for general display purposes (so that an ampersand actually displays as &) and also to prevent potentially malicious input from being rendered on the page and interpreted by the browser (e.g. <script> tags).
Please be careful: if you intend to display HTML that is provided by a user, sanitize the input as much as possible. Make sure they aren't trying to do anything malicious, and also that they haven't made a simple mistake that could wreck the entire layout of a webpage (e.g. an opening tag without a closing tag). This type of sanitization is no simple task, and it is one of the major reasons why other flavors of markup exist in the first place (e.g. Markdown, BBCode, etc.).
@Brian Ball has answered the question, but I feel some further explanation is warranted.
The many and varied encoding protocols are context-specific.
As I understand it, the only point of HTML encoding (as opposed to other encoding protocols like URI encoding etc.) is to allow text to be rendered by a browser 'as is' if it contains characters that otherwise would be parsed as HTML (e.g. the characters & < > / and double and single quotes). The encoding 'hides' these characters from the browser's HTML parser.
So really, the only place HTMLencoding serves any purpose is at the point of preparing the text to be rendered by a browser. There is no purpose served by HTMLencoding user-entered text that is heading for a database. You may need to use other encodings for transmission, for ensuring appropriate handling by server-side languages, etc., but HTMLencoding has no place in these contexts.
In your situation, it is the very fact that you previously HTMLencoded the content that is preventing it from being rendered as HTML when you later retrieve it from the database. The encoding is doing exactly what it is meant to.
So the simple answer is,
a. there's no point HTMLencoding the user-entered data before saving it to your database, and
b. if you want it rendered as HTML rather than printed to the screen 'as is', do not HTMLencode it at the point of displaying it on another page.
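A minimal Python sketch of points a and b, assuming the value is stored exactly as the user entered it. The function names render_as_text and render_as_html are hypothetical, and html.escape() stands in for whatever HTML-encoding call your platform provides.

```python
import html

# Stored exactly as entered: no HTML encoding on the way into the database.
stored = "<div><h2>Header</h2><p>this is the body text</p></div>"


def render_as_text(value: str) -> str:
    """Encode at output time so the browser prints the markup 'as is'."""
    return html.escape(value, quote=True)


def render_as_html(value: str) -> str:
    """Emit the markup unchanged so the browser renders it.

    Only appropriate if the value has already been sanitised.
    """
    return value


as_text = render_as_text(stored)
as_html = render_as_html(stored)
```

The encoded form displays the tags literally, which is the behaviour described in the question; the unencoded form renders the header and paragraph, and therefore needs sanitising first.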

How to allow a text box accepting only specific HTML tags?

I have a textbox in my MVC view that allows the user to input HTML tags, but only a few of them (such as B, I, U, and A).
For this, I have set the ValidateInput attribute on my POST action to False, so it allows users to input HTML tags.
But now I want to prevent users from entering other HTML tags (such as INPUT, SCRIPT, etc.). I mean, anything except the ones I want to allow.
I guess, one way is to use a regex, but I am unable to find a proper regex for this.
Any idea of how to achieve this? Any help on this much appreciated.
Thanks and Regards
That's dangerous, man. Your users could still insert undesired tags using some tricks, for example encoding data. Even if you try to think all the possible ways a user can employ to enter "dangerous" tags in your code, he'll find an additional one.
So you should look for some kind of proven solution to your problem. Look for an HTML sanitizer: for example, Google "ASP.NET MVC sanitize html input" and you'll find several solutions. The AntiXSS library could be a good solution; it's now called the Microsoft Web Protection Library. You can include it in your solution as a NuGet package:
Install-Package AntiXSS
I recommend you to read this article to get a deeper view of the problem and its solutions:
.NET HTML Sanitation for rich HTML Input
In this article you'll find AntiXSS and a less restrictive solution, with a full explanation of the pros, the cons, and how it all works. Don't miss the references in the comments.
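To illustrate what an allowlist-based sanitizer does (as opposed to a regex), here is a rough sketch in Python using only the standard library parser. It is not the AntiXSS API and not a replacement for a proven library; the class name, the set of accepted URL schemes, and the choice to drop script and style content entirely are all assumptions made for this example.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

ALLOWED_TAGS = {"b", "i", "u", "a"}         # the tags named in the question
SAFE_SCHEMES = {"http", "https", "mailto"}  # assumption: schemes accepted in href
DROP_WITH_CONTENT = {"script", "style"}     # assumption: discard these and their content


def is_safe_href(href: str) -> bool:
    """Accept only absolute URLs whose scheme is on the allowlist."""
    try:
        return urlsplit(href.strip()).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            href = dict(attrs).get("href") or ""
            if tag == "a" and is_safe_href(href):
                # Rebuild the tag from scratch; every other attribute is dropped.
                self.out.append('<a href="%s">' % escape(href, quote=True))
            else:
                self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b>hi</b><script>alert(1)</script>'))  # <b>hi</b>
```

Tags that are not on the list are removed while their text is kept, and relative links lose their href under this scheme check; a real sanitizer library handles many more edge cases than this sketch does.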

Stackoverflow like tag system form for django?

What I am trying to create is a site for resources. Basically, you add resources such as books and videos via links. Now, with any resource site that caters to a variety of resources, you need to tag them in order to understand what kind of resource you are using.
For example, if you make notes on something like Chemistry or key points from a talk on lets say "Django", then these are text documents. Thus you would want them inside a TEXT TAG.
So, when you are making a form for this kind of thing, what form field would you use? For example, my knee-jerk approach is to simply make a text area field and then separate the different tags with commas. Now, this can be prone to many problems, so I'd like to know the best approach to solving this problem. Basically, is there an easy way to validate the data input? Would forms.ChoiceField be the best approach, or is there something superior?
https://www.djangopackages.com/grids/g/tagging/ is your best bet, most specifically https://github.com/alex/django-taggit. If you want to run your own tagging system, take a look at the source code for some ideas.
EDIT: The easiest way to display this in a form would be to use a ModelMultipleChoiceField. This allows you to select multiple tags for a single resource, and handles server-side validation and conversion to the actual Tag instances. However, I think most people would agree this option looks hideous, and it is certainly not user-friendly if there is a large amount of possible tags.
If you're using jQuery, another option is to use Django_select2. This is what I have personally used in a similar situation, and it handles a large number of possible tags very well. Django_select2 is a thin wrapper around jQuery's Select2 plugin, with a bit of added functionality (most notably the AutoView and AutoModelSelect2Field). This provides a hybrid between a text field and a select list, allowing you to search all tags and easily select multiple tags. See http://ivaynberg.github.io/select2/ for examples of what you can achieve.
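If you do end up rolling your own comma-separated field, as floated in the question, the validation could look roughly like the sketch below. It is plain Python, not django-taggit or Django_select2 code; the function name parse_tags, the allowed tag shape, and the limit of five tags are all assumptions chosen for illustration.

```python
import re

# Assumption: a tag is 1-30 characters, starts with a letter or digit,
# and may contain letters, digits, '+', '#', '.' and '-'.
TAG_PATTERN = re.compile(r"[a-z0-9][a-z0-9+#.-]{0,29}")


def parse_tags(raw: str, max_tags: int = 5) -> list[str]:
    """Split a comma-separated string into normalised, de-duplicated tags."""
    tags = []
    for piece in raw.split(","):
        tag = piece.strip().lower()
        if not tag:
            continue  # ignore empty entries such as a trailing comma
        if not TAG_PATTERN.fullmatch(tag):
            raise ValueError(f"invalid tag: {tag!r}")
        if tag not in tags:
            tags.append(tag)
    if len(tags) > max_tags:
        raise ValueError(f"at most {max_tags} tags are allowed")
    return tags


print(parse_tags("Django, text , django,"))  # ['django', 'text']
```

In a Django form, logic like this would typically live in the field's clean method, with ValueError swapped for the framework's validation error.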

ColdFusion how to Prevent XSS Attacks in a WYSIWYG

I have a WYSIWYG editor in my ColdFusion app and need to prevent XSS attacks. Are there any ColdFusion ways to strip out all script-type attacks?
http://blog.pengoworks.com/index.cfm/2008/1/3/Using-AntiSamy-to-protect-your-CFM-pages-from-XSS-hacks
http://code.google.com/p/owaspantisamy/downloads/list
The main question I would ask is what is this WYSIWYG for? Many WYSIWYG's allow you to define specific tags to have stripped out of the code.
For instance you can have TinyMCE strip out the script tags with
http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/invalid_elements
This unfortunately does not solve your problem, since all client-side form submissions can be circumvented. If you must use a WYSIWYG, then what you really need to do is cover all your bases on the form's validation and display. You can strip out all script tags and make sure to remove any event attributes and JavaScript code in links' href attributes.
If it is acceptable to only allow a specific subset of tags I would suggest either using BBML, BBCode, or Markdown.
http://www.depressedpress.com/Content/Development/ColdFusion/Extensions/DP_ParseBBML/Index.cfm
http://en.wikipedia.org/wiki/BBCode
http://sebduggan.com/projects/cfxmarkdown
You can use TinyMCE as a WYSIWYG for BBCode http://tinymce.moxiecode.com/examples/example_09.php and StackOverflow uses a great markdown editor http://github.com/cky/wmd
Here is some good info if you would like to render BBCode in Coldfusion
http://www.sitepoint.com/forums/showthread.php?t=248040
Something to consider is that while stripping the tags out in the browser with TinyMCE is a good idea, it makes a fatal assumption that the user is going to be submitting content via the browser. Anything that you do in the browser needs to be duplicated on the server because attackers can bypass any validation that happens in the browser.
With that said check this article: http://www.fusionauthority.com/techniques/3908-how-to-strip-tags-in-three-easy-lessons.htm which spells this out in more detail than I could here. Basically it discusses using regex and UDFs to strip tags out easily. The last example is particularly important... check it out.
To convert the < and > characters to &lt; and &gt;, use the HTMLEditFormat function.

How do I use fckEditor safely, without risk of cross site scripting?

This link describes an exploit into my app using fckEditor:
http://knitinr.blogspot.com/2008/07/script-exploit-via-fckeditor.html
How do I make my app secure while still using fckEditor? Is it an fckEditor configuration? Is it some processing I'm supposed to do server-side after I grab the text from fckEditor?
It's a puzzle because fckEditor USES html tags for its formatting, so I can't just HTML encode when I display back the text.
Sanitize html server-side, no other choice. For PHP it would be HTML Purifier, for .NET I don't know. It's tricky to sanitize HTML - it's not sufficient to strip script tags, you also have to watch out for on* event handlers and even more, thanks to stupidities of IE for example.
Also with custom html and css it's easy to hijack look and layout of your site - using overlay (absolutely positioned) which covers all screen etc. Be prepared for that.
The bug is not actually FCKeditor's fault. As long as you let users edit HTML that will be displayed on your web site, they will always have the possibility to do harm unless you check the data before you output it.
Some people use HTMLencoding to do this, but that will destroy all the formatting done by FCKeditor, not what you want.
Maybe you can use the Microsoft Anti-Cross Site Scripting Library. Samples on MSDN
Is it some processing I'm supposed to do server-side after I grab the text from fckEditor?
Precisely. StackOverflow had some early issues related to this as well. The easiest way to solve it is to use an HTML library to parse user's input, and then escape any tags you don't want in the output. Do this as a post-processing step when printing to the page -- the data in the database should be the exact same as what the user typed in.
For example, if the user enters <b><script>evil here</script></b>, your code would translate it to <b>&lt;script&gt;evil here&lt;/script&gt;</b> before rendering the page.
And do not use regular expressions for solving this, that's just an invitation for somebody clever to break it again.
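As a rough illustration of "parse, then escape what you don't want", here is a Python sketch built on the standard library's HTMLParser. The class name, the render function, and the particular set of allowed tags are assumptions for the example; a production system would more likely rely on an established sanitizer.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong"}  # assumption: tags allowed through as markup


class EscapingRenderer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Re-emit a bare tag; attributes such as onclick are dropped.
            self.out.append("<%s>" % tag)
        else:
            # Show the disallowed tag as text instead of letting it run.
            self.out.append(escape(self.get_starttag_text() or "", quote=True))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags are always shown as text in this sketch.
        self.out.append(escape(self.get_starttag_text() or "", quote=True))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)
        else:
            self.out.append(escape("</%s>" % tag))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def render(stored_markup: str) -> str:
    """Post-process stored user input at output time."""
    parser = EscapingRenderer()
    parser.feed(stored_markup)
    parser.close()
    return "".join(parser.out)


print(render("<b><script>evil here</script></b>"))
# <b>&lt;script&gt;evil here&lt;/script&gt;</b>
```

Because this runs when printing to the page, the database keeps exactly what the user typed, and the allowlist can be tightened later without touching stored data.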
FCKEditor can be configured to use only a few tags. You will need to encode everything except for those few tags.
Those tags are: <strong> <em> <u> <ol> <ul> <li> <p> <blockquote> <font> <span>.
The font tag only should have face and size attributes.
The span tag should only have a class attribute.
No other attributes should be allowed for these tags.
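The tag and attribute rules above can be written down as a small table plus a filter. This is only a sketch in Python: the ALLOWED mapping mirrors the list in this answer, while the function name render_start_tag and the (name, value) pair format (the shape produced by Python's HTMLParser.handle_starttag) are assumptions for illustration, not FCKeditor configuration.

```python
from html import escape

# Tag -> attributes permitted on that tag, following the list above.
ALLOWED = {
    "strong": set(), "em": set(), "u": set(), "ol": set(), "ul": set(),
    "li": set(), "p": set(), "blockquote": set(),
    "font": {"face", "size"},
    "span": {"class"},
}


def render_start_tag(tag, attrs):
    """Return a rebuilt start tag, or None if the tag is not allowed.

    attrs is a list of (name, value) pairs.
    """
    tag = tag.lower()
    if tag not in ALLOWED:
        return None  # caller should encode or drop this tag
    kept = []
    seen = set()
    for name, value in attrs:
        name = name.lower()
        if name in ALLOWED[tag] and name not in seen:
            seen.add(name)
            kept.append(' %s="%s"' % (name, escape(value or "", quote=True)))
    return "<%s%s>" % (tag, "".join(kept))


print(render_start_tag("span", [("class", "note"), ("onclick", "evil()")]))
# <span class="note">
```

Rebuilding the tag from the allowlist, rather than deleting known-bad attributes from the original text, means anything unexpected is dropped by default.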
I understand the DONTS. I'm lacking a DO.
Is use of FCKEditor a requirement, or can you use a different editor/markup language? I advise using Markdown and WMD Editor, the same language used by StackOverflow. The Markdown library for .NET should have an option to escape all HTML tags -- be sure to turn it on.
XSS is a tricky thing. I suggest some reading:
Is HTML a Humane Markup Language?
Safe HTML and XSS
Anyway, my summary is that when it comes down to it, you have to allow in only strictly accepted items; you can't just reject known exploit vectors, because then you'll always be one step behind in an eternal struggle.
I think the issue raised by some is not that FCKeditor only allows a few tags, but the naive assumption that an evil user will use FCKeditor to write his malice. The tools that allow manual changing of input are legion.
I treat all user data as tainted; and use Markdown to convert text to HTML. It sanitizes any HTML found in the text, which reduces malice.