HTML Purifier - Change default allowed HTML tags configuration

HTML Purifier - Change default allowed HTML tags configuration - xss

I want to allow a limited white list of HTML tags that users can use in my forum. So I have configured the HTML Purifier like so:
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,a[href|rel|target|title],img[src],span[style],strong,em,ul,ol,li');
$purifier = new HTMLPurifier($config);
What I am wondering is, does the default configuration of the HTML Purifier still apply, with the exception of a reduced number of accepted HTML tags or do I need to re-set every possible configuration parameter manually?
Additionally, should I tweak the default configuration in any way to stay safe? I am new to the whole XSS protection thing, new to HTML Purifier and didn't find that the manual gave a lot of 'basic' tips and hints.

HTML Purifier is safe by default and any restrictions you impose on it by changing %HTML.Allowed are guaranteed only to reduce the permitted tag set. Check out http://htmlpurifier.org/live/smoketests/printDefinition.php to see how tweaking configuration changes the allowed tagset.

Why not just use a DOM parser and check if tag type is in allowed white list of HTML tags?
Converting the input to a DOM node list you should be able to loop through all the DOM nodes and check if the type is allowed that way. php.net has great examples for how to do this written by others like you trying to solve the input sanitization problem.
More information here:
http://php.net/manual/en/class.domdocument.php

Related

Why is Visible controlled by

In my home page I have the following snippet that fetches all blog posts:
var docs = CurrentPage.Children.Where("Visible")
What I don't understand is that Visible is controlled by a property in the document named umbracoNaviHide. Setting it to true on the document excludes the page from the list above.
How is umbracoNaviHide translated to Visible? I have no macros or XSLT (none actually) that is doing anything funny...

umbracoNaviHide is one of umbraco's internal property implementations.
We used to have to check the property explicitly in xslt but nowadays it is used as you are using it here.
Here is a more complete explanation from the Umbraco wiki
The "umbracoNaviHide" is an Umbraco convention for marking nodes which
should not show up in a navigational context. It is normally added (or
inherited) on every Document Type with a Data Type of "True/false".
NOTE: This property is not added by default on new installations,
meaning you need to add it manually
There are a number of other useful properties that everybody should know about:
umbracoSitemapHide
umbracoUrlAlias
umbracoUrlName
umbracoInternalRedirectId
umbracoRedirect
We always insert these properties on a master page doctype so that all other doc types that represent data on web page content nodes inherit them
Wing

django - ckeditor bug renders text/string in raw html format

am using django ckeditor. Any text/content entered into its editor renders raw html output on the webpage.
for ex: this is rendered output of ckeditor field (RichTextField) on a webpage;
<p><span style="color:rgb(0, 0, 0)">this is a test file ’s forces durin</span><span style="color:rgb(0, 0, 0)">galla’s good test is one that fails Thereafter, never to fail in real environment. </span></p>
I have been looking for a solution for a long time now but unable to find one :( There are some questions which are similar but none of those have been able to help. It will be helpful if any changes suggested are provided with the exact location where it needs to be changed. Needless to say I am a newbie.
Thanks

You need to mark the relevant variable that contains the html snippet in your template as safe
Obviously you should be sure, that the text comes from trusted users and is safe, because with the safe filter you are disabling a security feature (autoescaping) that Django applies per default.
If your ckeditor is part of a comment form and your mark the entered text as safe, anybody with access to the form could inject Javascipt and other (potentially nasty) stuff in your page.
The whole story is explained pretty well in the official docs: https://docs.djangoproject.com/en/dev/topics/templates/#automatic-html-escaping

C++, web browser control: cannot change encoding/charset

There's a document I'm displaying in a web browser ActiveX control hosted in a C++ app. This document has a META tag that specifies incorrect charset, so the output is funny. I know the correct encoding and want to change it programmatically to fix that. But whatever I try, the encoding remains unchanged.
I alredy tried, in various combinations and flavors:
IHTMLDocument2::put_Charset (after the document finished loading);
changing the "charset" property of the "META" tag (using IHTMLMetaElement);
deleting the "META" tag altogether (by setting its "outerHTML" to empty string);
refreshing the control.
The control demonstrates remarkable persistence in preserving the incorrect encoding. What are my other options? I can't manipulate the source of the document being loaded.

try to put the designMode property "On".

According to this, it should work if you call IWebBrowser->Refresh() after calling IHTMLDocument2->put_charset().

Here's what eventually worked:
In the handler of the "NavigateComplete2" browser event,
the charset is modified using the charset property,
then the META tag is thrown away by setting its outerHTML to empty string,
and then the control is refreshed.
Modifying the order of these actions, or omitting a step, will render the entire operation void. MSHTML is picky.

How can I sanitize user input but keep the content of <pre> tags?

I'm using CKEditor in Markdown format to submit user created content. I would like to sanitize this content from malicious tags, but I would like to keep the formatting that is the result of the markdown parser. I've used two methods that do not work.
Method one
<!--- Sanitize post content --->
<cfset this.text = HTMLEditFormat(this.text)>
<!--- Apply mark down parser --->
<cfx_markdown textIn="#this.text#" variable="parsedNewBody">
Problem For some reason <pre> and <blockquote> are being escaped, and thus I'm unable to use them. Only special characters appear. Other markdown tagging works well, such as bold, italic, etc. Could it be CKEdit does not apply markdown correctly to <pre> and <blockquote>?
Example: If I were to type <pre><script>alert("!");</script></pre> I would get the following: <script>alert("!");</script>
Method two
Same as method one, but reverse the order where the sanitation takes place after the markdown parser has done it's work. This is effectively useless since the sanitation function will escape all the tags, malicious ones or ones created by the markdown parser.
While I want to sanitize malicious content, I do want to keep basic HTML tags and contents of <pre> and <blockquote> tags!--any ideas how?
Thanks!

There are two important sanitizations that need to be done on user generated content. First, you want to protect your database from SQL injection. You can do this by using stored procedures or the <cfqueryparam> tag, without modifying the data.
The other thing you want to do is protect your site from XSS and other content-display based attacks. The way you do this is by sanitizing the content on display. It would be fine, technically, to do it before saving, but generally the best practice is to store the highest fidelity data possible and only modify it for display. Either way, I think your problem is that you're doing this sanitization out of order. You should run the Markdown formatter on the content first, THEN run it through HTMLEditFormat().
It's also important to note that HTMLEditFormat will not protect you from all attacks, but it's a good start. You'll want to look into implementing OWASP utilities, which is not difficult in ColdFusion, as you can directly use the provided Java implementation.

Why don't you just prepend and append pre tag after parsing?
I mean, if you only care about first an dlast pre and you dont have nested pre's or similar. If you cfx tag clears pre, make new wrapper method which is going to check if <pre> exists and if not, add it. Also if you use pre tags I guess new line chars are important, so check what your cfx does with those.

Maybe HTMLEditFormat twin HTMLCodeFormat is what you need?

Preventing XSS in Node.js / server side javascript

Any idea how one would go about preventing XSS attacks on a node.js app? Any libs out there that handle removing javascript in hrefs, onclick attributes,etc. from POSTed data?
I don't want to have to write a regex for all that :)
Any suggestions?

I've created a module that bundles the Caja HTML Sanitizer
npm install sanitizer
http://github.com/theSmaw/Caja-HTML-Sanitizer
https://www.npmjs.com/package/sanitizer
Any feedback appreciated.

One of the answers to Sanitize/Rewrite HTML on the Client Side suggests borrowing the whitelist-based HTML sanitizer in JS from Google Caja which, as far as I can tell from a quick scroll-through, implements an HTML SAX parser without relying on the browser's DOM.
Update: Also, keep in mind that the Caja sanitizer has apparently been given a full, professional security review while regexes are known for being very easy to typo in security-compromising ways.
Update 2017-09-24: There is also now DOMPurify. I haven't used it yet, but it looks like it meets or exceeds every point I look for:
Relies on functionality provided by the runtime environment wherever possible. (Important both for performance and to maximize security by relying on well-tested, mature implementations as much as possible.)
Relies on either a browser's DOM or jsdom for Node.JS.
Default configuration designed to strip as little as possible while still guaranteeing removal of javascript.
Supports HTML, MathML, and SVG
Falls back to Microsoft's proprietary, un-configurable toStaticHTML under IE8 and IE9.
Highly configurable, making it suitable for enforcing limitations on an input which can contain arbitrary HTML, such as a WYSIWYG or Markdown comment field. (In fact, it's the top of the pile here)
Supports the usual tag/attribute whitelisting/blacklisting and URL regex whitelisting
Has special options to sanitize further for certain common types of HTML template metacharacters.
They're serious about compatibility and reliability
Automated tests running on 16 different browsers as well as three diffferent major versions of Node.JS.
To ensure developers and CI hosts are all on the same page, lock files are published.

All usual techniques apply to node.js output as well, which means:
Blacklists will not work.
You're not supposed to filter input in order to protect HTML output. It will not work or will work by needlessly malforming the data.
You're supposed to HTML-escape text in HTML output.
I'm not sure if node.js comes with some built-in for this, but something like that should do the job:
function htmlEscape(text) {
return text.replace(/&/g, '&').
replace(/</g, '<'). // it's not neccessary to escape >
replace(/"/g, '"').
replace(/'/g, ''');
}

I recently discovered node-validator by chriso.
Example
get('/', function (req, res) {
//Sanitize user input
req.sanitize('textarea').xss(); // No longer supported
req.sanitize('foo').toBoolean();
});
XSS Function Deprecation
The XSS function is no longer available in this library.
https://github.com/chriso/validator.js#deprecations

You can also look at ESAPI. There is a javascript version of the library. It's pretty sturdy.

In newer versions of validator module you can use the following script to prevent XSS attack:
var validator = require('validator');
var escaped_string = validator.escape(someString);

Try out the npm module strip-js. It performs the following actions:
Sanitizes HTML
Removes script tags
Removes attributes such as "onclick", "onerror", etc. which contain JavaScript code
Removes "href" attributes which contain JavaScript code
https://www.npmjs.com/package/strip-js

Update 2021-04-16: xss is a module used to filter input from users to prevent XSS attacks.
Sanitize untrusted HTML (to prevent XSS) with a configuration specified by a Whitelist.
Visit https://www.npmjs.com/package/xss
Project Homepage: http://jsxss.com

You should try library npm "insane".
https://github.com/bevacqua/insane
I try in production, it works well. Size is very small (around ~3kb gzipped).
Sanitize html
Remove all attributes or tags who evaluate js
You can allow attributes or tags that you don't want sanitize
The documentation is very easy to read and understand.
https://github.com/bevacqua/insane

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js