How do I encode html leaving out the safe html - xss

My data coming from the database might contain some html. If I use
string dataFromDb = "Some text<br />some more <br><ul><li>item 1</li></ul>";
HttpContext.Current.Server.HtmlEncode(dateFromDb);
Then everything gets encoded and I see the safe Html on the screen.
However, I want to be able to execute the safe html as noted in the dataFromDb above.
I think I am trying to create white list to check against.
How do I go about doing this?
Is there some Regex already out there that can do this?

Check out this article the AntiXSS library is also worth a look

You should use the Microsoft AntiXSS library. I believe the latest version is available here. Specifically, you'll want to use the GetSafeHtmlFragment method.

Related

Script to generate html Beyond Compare folder differences

I've found several ways to automate folder comparison using scripts in Beyond Compare, but none that produce the pretty html report created from Session>Folder Compare Report>View in browser.
Here is an example of what that looks like.
I would love to be able to find the script that gives me that html difference report.
Thanks!
This is what I am currently getting
load "C:\Users\UIDQ5763\Desktop\Enviornment.cpp" &
"C:\Users\UIDQ5763\Desktop\GreetingsConsoleApp"
folder-report layout:side-by-side options:display-all &
output-to:C:\Users\UIDQ5763\Report.html output-options:html-color
The documentation for Beyond Compare's scripting language is here. You were probably missing either layout:side-by-side, which gives the general display, or output-options:html-color which is required to get the correct HTML stylized output. You may want to change options:display-all to options:display-mismatches if you only want to see the differences, and you might want to add an expand all command immediately before the folder-report line if you want to see the subfolders recursively.'
The & characters shown in the sample are line continuation characters. Remove them if you don't need to wrap your lines.

django - ckeditor bug renders text/string in raw html format

am using django ckeditor. Any text/content entered into its editor renders raw html output on the webpage.
for ex: this is rendered output of ckeditor field (RichTextField) on a webpage;
<p><span style="color:rgb(0, 0, 0)">this is a test file ’s forces durin</span><span style="color:rgb(0, 0, 0)">galla’s good test is one that fails Thereafter, never to fail in real environment. </span></p>
I have been looking for a solution for a long time now but unable to find one :( There are some questions which are similar but none of those have been able to help. It will be helpful if any changes suggested are provided with the exact location where it needs to be changed. Needless to say I am a newbie.
Thanks
You need to mark the relevant variable that contains the html snippet in your template as safe
Obviously you should be sure, that the text comes from trusted users and is safe, because with the safe filter you are disabling a security feature (autoescaping) that Django applies per default.
If your ckeditor is part of a comment form and your mark the entered text as safe, anybody with access to the form could inject Javascipt and other (potentially nasty) stuff in your page.
The whole story is explained pretty well in the official docs: https://docs.djangoproject.com/en/dev/topics/templates/#automatic-html-escaping

Cleansing string / input in Coldfusion 9

I have been working with Coldfusion 9 lately (background in PHP primarily) and I am scratching my head trying to figure out how to 'clean/sanitize' input / string that is user submitted.
I want to make it HTMLSAFE, eliminate any javascript, or SQL query injection, the usual.
I am hoping I've overlooked some kind of function that already comes with CF9.
Can someone point me in the proper direction?
Well, for SQL injection, you want to use CFQUERYPARAM.
As for sanitizing the input for XSS and the like, you can use the ScriptProtect attribute in CFAPPLICATION, though I've heard that doesn't work flawlessly. You could look at Portcullis or similar 3rd-party CFCs for better script protection if you prefer.
This an addition to Kyle's suggestions not an alternative answer, but the comments panel is a bit rubbish for links.
Take a look a the ColdFusion string functions. You've got HTMLCodeFormat, HTMLEditFormat, JSStringFormat and URLEncodedFormat. All of which can help you with working with content posted from a form.
You can also try to use the regex functions to remove HTML tags, but its never a precise science. This ColdFusion based regex/html question should help there a bit.
You can also try to protect yourself from bots and known spammers using something like cfformprotect, which integrates Project Honeypot and Akismet protection amongst other tools into your forms.
You've got several options:
"Global Script Protection" Administrator setting, which applies a regular expression against post and get (i.e. FORM and URL) variables to strip out <script/>, <img/> and several other tags
Use isValid() to validate variables' data types (see my in depth answer on this one).
<cfqueryparam/>, which serves to create SQL bind parameters and validate the datatype passed to it.
That noted, if you are really trying to sanitize HTML, use Java, which ColdFusion can access natively. In particular use the OWASP AntiSamy Project, which takes an HTML fragment and whitelists what values can be part of it. This is the same approach that sites like SO and slashdot.org use to protect submissions and is a more secure approach to accepting markup content.
Sanitation of strings in coldfusion and in quite any language is very important and depends on what you want to do with the string. most mitigations are for
saving content to database (e.g. <cfqueryparam ...>)
using content to show on next page (e.g. put url-parameter in link or show url-parameter in text)
saving files and using upload filenames and content
There is always a risk if you follow the idea to prevent and reduce a string by allow basically everything in the first step and then sanitize malicious code "away" by deleting or replacing characters (blacklist approach).
The better solution is to replace strings with rereplace(...) agains regular expressions that explicitly allow only the characters needed for the scenario you use it as an easy solution, whenever this is possible. use cases are inputs for numbers, lists, email-addresses, urls, names, zip, cities, etc.
For example if you want to ask for a email-address, you could use
<cfif reFindNoCase("^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.(?:[A-Z]{5})$", stringtosanitize)>...ok, clean...<cfelse>...not ok...</cfif>
(or an own regex).
For HTML-Imput or CSS-Imput I would also recommend OWASP Java HTML Sanitizer Project.

How can I sanitize user input but keep the content of <pre> tags?

I'm using CKEditor in Markdown format to submit user created content. I would like to sanitize this content from malicious tags, but I would like to keep the formatting that is the result of the markdown parser. I've used two methods that do not work.
Method one
<!--- Sanitize post content --->
<cfset this.text = HTMLEditFormat(this.text)>
<!--- Apply mark down parser --->
<cfx_markdown textIn="#this.text#" variable="parsedNewBody">
Problem For some reason <pre> and <blockquote> are being escaped, and thus I'm unable to use them. Only special characters appear. Other markdown tagging works well, such as bold, italic, etc. Could it be CKEdit does not apply markdown correctly to <pre> and <blockquote>?
Example: If I were to type <pre><script>alert("!");</script></pre> I would get the following: <script>alert("!");</script>
Method two
Same as method one, but reverse the order where the sanitation takes place after the markdown parser has done it's work. This is effectively useless since the sanitation function will escape all the tags, malicious ones or ones created by the markdown parser.
While I want to sanitize malicious content, I do want to keep basic HTML tags and contents of <pre> and <blockquote> tags!--any ideas how?
Thanks!
There are two important sanitizations that need to be done on user generated content. First, you want to protect your database from SQL injection. You can do this by using stored procedures or the <cfqueryparam> tag, without modifying the data.
The other thing you want to do is protect your site from XSS and other content-display based attacks. The way you do this is by sanitizing the content on display. It would be fine, technically, to do it before saving, but generally the best practice is to store the highest fidelity data possible and only modify it for display. Either way, I think your problem is that you're doing this sanitization out of order. You should run the Markdown formatter on the content first, THEN run it through HTMLEditFormat().
It's also important to note that HTMLEditFormat will not protect you from all attacks, but it's a good start. You'll want to look into implementing OWASP utilities, which is not difficult in ColdFusion, as you can directly use the provided Java implementation.
Why don't you just prepend and append pre tag after parsing?
I mean, if you only care about first an dlast pre and you dont have nested pre's or similar. If you cfx tag clears pre, make new wrapper method which is going to check if <pre> exists and if not, add it. Also if you use pre tags I guess new line chars are important, so check what your cfx does with those.
Maybe HTMLEditFormat twin HTMLCodeFormat is what you need?

How to not transform special characters to html entities with owasp antisamy

I use Owasp Anti samy with Ebay policy file to prevent XSS attacks on my website.
I also use Hibernate search to index my objects.
When I use this code:
String html = "special word: été";
// use the Ebay configuration file
Policy policy = Policy.getInstance(xssPolicyFile.getInputStream());
AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(html, policy);
// result is now : "special word: été"
result = cr.getCleanHTML();
As you can see all chars "é" has been transformed to their html entity equivalent "é"
My page is on UTF-8, so I don't need this transformation. Moreover, when I index this text with Hibernate Search, it indexes the word with html entities, so I can't find word "été" on my index.
How can I force antisamy to not transform special chars to their html entity equivalent ?
thanks
PS: an issue has been opened : http://code.google.com/p/owaspantisamy/issues/detail?id=99
I ran into the same problem this morning.
I have encapsulated antisamy in a class and I use apache StringEscapeUtil from apache common-lang to restore special characters.
CleanResults cleanResults = antiSamy.scan(taintedHtml);
cleanedHtml = cleanResults.getCleanHTML();
return StringEscapeUtils.unescapeHtml(cleanedHtml)
The result is a cleaned up HTML without the HTML escaping of special characters.
Hope this helps.
Like Mohamad said it in a comment, Antisamy has just released a new directive named : entityEncodeIntlChars
here is the detail : http://code.google.com/p/owaspantisamy/source/detail?r=240
It seems that this directive solves the problem.
After scouring the AntiSamy source code, I found no way of changing this behavior apart from modifying AntiSamy.
Check out this one: http://code.google.com/p/owaspantisamy/source/browse/#svn/trunk/dotNet/current/source/owaspantisamy/html/scan
Grab the source and notice that key classes (AntiSamyDOMScanner, CleanResults) use standard framework objects (like XmlDocument). Compile and run with the binary you compiled - so that you can see everything in a debugger - as in which of the major classes actually corrupts your data. With that in hand you'll be able to either change a few properties on major objects to make it stop or inject your own post-processing to revert the wrongdoing (say with a regexp). Latter you can expose that as additional top-level property, say one named NoMess :-)
Chances are that behavior in that respect is different between languages (there's 3 in that trunk) but the same tactics will work no matter which one you have to deal with.