Django, allowing user to use embed/object html and XSS protection - django

For the site I am building, I would like users to be able to provide embed codes for video and audio sites. I know this poses a security risk, so I wanted to find out how best to filter the provided HTML within Django so that only certain tags and certain sites are allowed.
Does anyone have any references on how I can accomplish this with Django?

You may be better off using a lightweight markup language and then converting to HTML. This prevents them from playing games to get around whatever HTML checking you do. Fully and correctly checking HTML for 'gotchas' is very difficult to do.
Doing it this way follows the school of "that which is not explicitly permitted is prohibited."
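One way to apply that principle to embed codes is to never store the user's HTML at all: accept only a URL, check it against an allowlist, and generate the markup yourself. The following is a minimal sketch using only the Python standard library, not a Django API. The host list, the character allowlist, and the function name build_embed are assumptions made for illustration.

```python
import re
from html import escape
from urllib.parse import urlparse

# Assumed allowlist of embed hosts; adjust to the sites you actually trust.
ALLOWED_EMBED_HOSTS = {"www.youtube.com", "player.vimeo.com"}

# Conservative character allowlist (an assumption): rejects quotes, angle
# brackets, backslashes, '@' and whitespace before any parsing happens.
SAFE_URL_RE = re.compile(r"[A-Za-z0-9:/._\-?=&%]+")


def build_embed(url):
    """Return iframe markup for an allowlisted embed URL, or None."""
    if not isinstance(url, str) or not SAFE_URL_RE.fullmatch(url):
        return None
    try:
        parts = urlparse(url)
    except ValueError:
        return None
    if parts.scheme != "https" or parts.hostname not in ALLOWED_EMBED_HOSTS:
        return None
    # The markup is generated server-side; the user never supplies tags.
    return '<iframe src="{}" allowfullscreen></iframe>'.format(
        escape(url, quote=True)
    )
```

Anything that does not match the allowlist is rejected outright rather than repaired, which keeps the check simple to reason about.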

Related

Client-side and server-side rendering of HTML page in a Django single page app

This question could be a duplicate, but I have a specific use case.
The app is a single page Django app. The user can choose from several options in a drop-down box. Depending on the choice, a few input boxes will need to be rendered. As an example, if the user chooses to order a pizza, the options in another drop-down menu should be related to the toppings whereas an order for drinks should provide options related to the type or brands. The resulting number of input boxes could be 5 with one option or 10 with a different option.
I can think of rendering the page using JS or by using Python in the back-end. Because this app will be a commercial one, I do not need SEO as users need to log into the app first. I also want to minimize the amount of code that a competitor can 're-use' from my work.
Will client-side rendering open up security problems? If not, is client-side the better way to go?
This question is more of a theoretical/opinion-based nature than technical, but let me provide some answers.
Will client-side rendering open up security problems?
Generally, web application security is a server-side concern, not a client-side one. You can do things like input validation on the client side, but the minimum practice for security is to sanitize, validate, and authenticate all request data on the server anyway, so the client-side checks are more for convenience and improved user experience than for security. I'm not saying that there are no client-side security concerns, but I don't think they are generally a cause for worry here. Client-side rendering in particular doesn't sound like something to worry about: regardless of what your client-side code does and whatever <form> and <input> markup it generates, your server-side code should always handle the submitted data as if it could be malicious.
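As a rough illustration of treating submitted data as untrusted, here is a minimal sketch in plain Python (in a real Django project this logic would more likely live in a Form). The option table and the function name validate_order are hypothetical, loosely based on the pizza/drinks example in the question.

```python
# Hypothetical server-side table of what each category may contain.
ALLOWED_OPTIONS = {
    "pizza": {"mushroom", "pepperoni", "olives"},
    "drinks": {"cola", "water", "juice"},
}


def validate_order(data):
    """Validate an untrusted mapping; return (cleaned, errors)."""
    category = data.get("category")
    if not isinstance(category, str) or category not in ALLOWED_OPTIONS:
        return None, ["unknown category"]
    choices = data.get("choices")
    if not isinstance(choices, list) or not choices:
        return None, ["choices must be a non-empty list"]
    allowed = ALLOWED_OPTIONS[category]
    # Reject anything that is not a known option for this category,
    # no matter what the client-side form appeared to offer.
    bad = [c for c in choices if not (isinstance(c, str) and c in allowed)]
    if bad:
        return None, ["invalid choice for " + category]
    return {"category": category, "choices": choices}, []
```

The server decides what is valid from its own table, so it makes no difference whether the input boxes were rendered by JavaScript or by a template.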
Is client-side the better way to go?
There are so many more factors to consider in order to answer this, so it's largely a matter of opinion. But since you're asking about Django, then you might want to reduce overall development friction by maximizing Django's features and design—and Django, in my view, is largely a static markup-first framework, meaning minimal use (at first, at least) of client-side JavaScript. Django Forms and Class-Based Views (CBV), for example, work well together to allow rapid development of non-single-page applications.
Your specific use case of an initial drop-down choice determining the main form to be presented could be developed very rapidly in the traditional Django way by giving up your single-page-application requirement, and just providing some initial menu page that will lead to the different views and forms (pizza vs. drinks, etc.), the latter of which you could build rapidly with the help of CBVs. (By the way, your specific use case doesn't seem too unique, actually. It's just the fundamental issue of complexity for which we have programming concepts such as polymorphism and inheritance in object-oriented programming—hence the appropriateness of CBVs.)
I know that single-page applications are nice and are the fashionable thing nowadays, but I think people underestimate the speed of old-fashioned HTML applications. And by speed I mean not just the user's client-side experience (HTML pages load rather rapidly with HTTP/2, CDNs, and all the other modern Web infrastructure these days), but also development time.
Besides, you can always just add single-page-like experiences in a progressive manner. Django is particularly suited to an agile-style development strategy where you'd build initial functionality rapidly without much client-side JS, and then just add rich client-side experiences (using React or Vue or something similar) where it will add the most value for users.
I also want to minimize the amount of code that a competitor can 're-use' from my work.
I don't know the full context, but generally I wouldn't worry about this. If you won't do much client-side rendering, then there won't be much client-side code to ‘steal’. But even if you do, unless you specifically write your client-side code in a way that maximizes reusability (either for yourself or for others), coders tend to write highly coupled code anyway. Your client-side code will tend to depend heavily on the specifics of your server-side code, which means poor reusability. Your competitors could copy your client-side code all they want, but the cost of making it work with their own back-end would be so high that it wouldn't be worth it; they would rather write their own.

WYSIWYG and XSS

I'm using TinyMCE as my online editor, but I'm concerned about XSS attacks.
I thought of replacing all < and >, but that doesn't seem to be an option with this kind of editor, and I'm not sure removing script tags is enough either (what about onclick, onmouseover and other event attributes?).
What should be my approach to avoid such attacks?
You have to choose: security or convenience. A WYSIWYG editor like TinyMCE is very convenient. It allows non-experts to use a web interface to update some content with or without HTML tags. It's the lazy way to allow someone non-technical to update HTML, and it comes with all kinds of hazards.
When you give users access to TinyMCE interface to your database it is absolutely equal to giving them a database client to update data directly in your database.
Also, note that today there is a great deal of cross-site scripting that is not malicious: Facebook, LinkedIn, YouTube and similar integrations require script references to third-party domains.
So if you harden the TinyMCE tool so that XSS cannot be added, it will be useless to a serious web developer in many scenarios.
But if you need to make an add/edit/update/delete editor XSS proof you need to validate and sanitize all inputs and your best choice is to roll your own.
In theory you can eliminate XSS like this, but in practice it's difficult. There always seems to be something that you've overlooked.
The best way I've found is to use a regular expression to permit only the tags that you specify (<strong>, <em>, etc.) and remove all others. You also need to look for attempts to circumvent your protection by encoding characters.
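The tag allowlist idea can be sketched as follows. Note that this sketch swaps the regular expression for Python's standard-library HTMLParser, since a parser copes better with odd quoting and encoded characters; it is an illustration under assumptions, not a vetted sanitizer. The allowlist contents and the names AllowlistSanitizer and sanitize are made up for the example.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"strong", "em", "p"}   # assumed allowlist
DROP_CONTENT = {"script", "style"}     # drop these tags and their contents


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
        elif tag in ALLOWED_TAGS and not self._skip:
            # Attributes (onclick, style, ...) are discarded entirely.
            self.out.append("<{}>".format(tag))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in ALLOWED_TAGS and not self._skip:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self._skip:
            # Text is re-escaped, so decoded entities cannot become tags.
            self.out.append(escape(data))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Because every attribute is dropped, event handlers such as onclick never reach the output, and text like &lt;script&gt; stays escaped rather than being turned back into a tag. For production use, a maintained sanitizing library is likely a safer choice than a hand-rolled filter.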

What are my options for white-listing HTML in ColdFusion?

I want to allow my users to input HTML.
Requirements
1. Allow a specific set of HTML tags.
2. Preserve characters (do not encode ã into &atilde;, for example).
Existing options
AntiSamy. Unfortunately AntiSamy encodes special characters and breaks requirement 2.
Native ColdFusion functions (HTMLCodeFormat() etc...) don't work as they encode HTML into entities, and thus fail requirement 1.
I found this set of functions somewhere, but I have no way of telling how secure this is: http://pastie.org/2072867
So what are my options? Are there existing libraries for this?
Portcullis works well for ColdFusion for attack-specific issues. I've used a couple of other regex solutions I found on the web over time that have worked well, though they haven't been nearly as fleshed out. In 15 years (10 as a CMS developer) nothing I've built has been hacked... knock on wood.
When developing input fields of any type, it's good to look at the problem from different angles. You've got the UI side, which includes both usability and client-side validation. Yes, it can be bypassed, but JavaScript-based validation is quicker, more responsive, and rates higher on the magical UI scale than a back-end interruption or simply making things "disappear" without warning. It also reduces the load on back-end validation because it does the initial screening. So it's not an "instead of" but an "in addition to" type of solution, and it shouldn't be ignored.
Also on the UI front, giving your users a good-quality editor can make a huge difference in the process. My personal favorite is CKEditor, simply because it's the only one that can handle Microsoft Word code on the front side, keeping it far away from my DB. It seems silly, but Word HTML is valid, so it won't set off any red flags... but on a moderately sized document it will quickly exceed a DB field's maximum insert size, believe it or not. Not only will a good editor reduce the amount of silly HTML that comes in, but it will also make things faster for the user... win/win.
I personally encode and decode my characters...it's always just worked well so I've never changed practice.

HTML or Alternate markup for wiki site?

In choosing an editor for my wiki-like site, I'm debating whether to allow HTML or a custom alternate markup (maybe like Wikipedia/MediaWiki's or BBCode).
HTML benefits:
Easy for users to deal with (copying and pasting, learning)
Somewhat future proof
Many more editing tools available, usually WYSIWYG too
Alternate markup benefits:
On the server side I don't have to worry about parsing malicious javascript or styles or HTML that I don't allow
Can be easy to learn
Can be easier to decipher if not HTML-savvy
Am I missing something, what's the best solution?
It depends on your target audience. If they're tech-savvy, they probably know HTML, BBCode, etc. If they're not, they probably don't, and a simplified markup might be more appropriate. Personally, I like Markdown for the non-tech-savvy. There are editing tools available for both, as well as libraries for handling each of them. So really it comes down to which one you want your users to use.
I would stick with wiki markup. You can make it easier by using a WYSIWYG editor like FCKeditor.
For HTML, let moderators have control using e.g. Extension:RawMsg
Personally, as a user, I'm not a fan of HTML for things like wiki editing. Most of the time you don't need more than simple features, so it's too verbose and just makes life harder, and I don't really like using WYSIWYG editors either. I prefer being able to type Markdown or Textile directly into the editing field.
If ease of use is a concern, go with a WYSIWYG editor, and then it doesn't really matter what the underlying markup is.

Can you easily configure MediaWiki to accept full HTML/CSS or even JS content?

I'd like to create a technical wiki site that requires full use of HTML/CSS and maybe JavaScript when editing a page. Is this something I can easily configure in MediaWiki? If not, is there any other wiki software that you'd recommend?
Thanks!
You can enable raw HTML support by setting $wgRawHtml = true; in your LocalSettings.php:
http://www.mediawiki.org/wiki/Manual:$wgRawHtml
However, as noted above this is rather insecure for a public site. (If locked down to registered usage only by known folks it's ok -- but you need to trust your users.)
There are some links on that manual page to extensions organized around letting you put specific known bits of HTML/JS in your output code as well, which may or may not fit your needs better.
Well, while MediaWiki itself does not support this, there are some extensions which allow at least HTML in a page. See for example this extension list. SecureHTML might do what you are looking for.
That said, I'd like to point out that allowing raw HTML rather defeats the purpose of a wiki:
it can and will mess up formatting and create weird problems (clashes between generated and user-provided HTML)
it makes it hard/impossible to convert the wiki to other formats (such as to print it)
it makes searching harder
it makes any kind of security impossible (think XSS)
This is doubly true for allowing JavaScript.
So I'd like to ask why you need this. If you need special formatting that MediaWiki does not offer, consider using (or writing) an extension for this.
If you really need arbitrary HTML, a Wiki might not be the best tool for you. You should consider a CMS, or just put HTML files into Subversion.
So what are you trying to do?
Use nowiki tags. Docs can be found here: https://www.mediawiki.org/wiki/Help:Formatting