How do you allow the usage of an <img> while preventing XSS?

How do you allow the usage of an <img> while preventing XSS? - xss

I'm using ASP.NET Web Forms for blog style comments.
Edit 1: This looks way more complicated then I first thought. How do you filter the src?
I would prefer to still use real html tags but if things get too complicated that way, I might go a custom route. I haven't done any XML yet, so do I need to learn more about that?

If IMG is the only thing you'd allow, I'd suggest you use a simple square-bracket syntax to allow it. This would eliminate the need for a parser and reduce a load of other dangerous edge cases with the parser as well. Say, something like:
Look at this! [http://a.b.c/m.jpg]
Which would get converted to
Look at this! <img src="http://a.b.c/m.jpg" />
You should filter the SRC address so that no malicious things get passed in the SRC part too. Like maybe
Look at this! [javascript:alert('pwned!')]

Use an XML parser to validate your input, and drop or encode all elements, and attributes, that you do not want to allow. In this case, delete or encode all tags except the <img> tag, and all attributes from that except src, alt and title.

If you end up going with a non-HTML format (which makes things easier b/c you can literally escape all HTML), use a standard syntax like markdown. The markdown image syntax is ![alt text](/path/to/image.jpg)
There are others also, like Textile. Its syntax for images is !imageurl!

#chakrit suggested using a custom syntax, e.g. bracketed URLs - This might very well be the best solution. You DEFINITELY dont want to start messing with parsing etc.
Just make sure you properly encode the entire comment (according to the context - see my answer on this here Will HTML Encoding prevent all kinds of XSS attacks?)
(btw I just discovered a good example of custom syntax right there... ;-) )
As also mentioned, restrict the file extension to jpg/gif/etc - even though this can be bypassed, and also restrict the protocol (e.g. http://).
Another issue to be considered besides XSS - is CSRF (http://www.owasp.org/index.php/Cross-Site_Request_Forgery). If you're not familiar with this security issue, it basically allows the attacker to force my browser to submit a valid authenticated request to your application, for instance to transfer money or to change my password. If this is hosted on your site, he can anonymously attack any vulnerable application - including yours. (Note that even if other applications are vulnerable, its not your fault they get attacked, but you still dont want to be the exploit host or the source of the attack...). As far as your own site goes, it's that much easier for the attacker to change the users password on your site, for instance.

Related

CFWheels: Redirect to URL with Params Hidden

I am using redirectTo() function with params to redirect to another pages with a query string in the url. For security purpose this does not look appealing because the user can change the parameters in the url, thus altering what is inserted into the database.
My code is:
redirectTo(action="checklist", params="r=#r#&i=#insp#&d=#d#");
Is there anyway around this? I am not using a forms, I just wish to redirect and I want the destination action/Controller to know what I am passing but not display it in the url.

You can obfuscate the variables in the URL. CfWheels makes this really easy.
All you have to do is call set(obfuscateURLs=true) in the config/settings.cfm file to turn on URL obfuscation.
I am sure this works with linkTo() function. I hope it works with RedirectTo() funcation as well. I do not have a set up to check it now. But if doesn't work for RedirectTo(), you can obfuscateParam() and deObfuscateParam() functions to do job for you.
Caution: This will only make harder for user to guess the value. It doesn't encrypt value.
To know more about this, Please read the document configuration and defaults and obfuscating url

A much better approach to this particular situation is to write params to the [flash].1 The flash is exactly the same thing as it is in Ruby on Rails or the ViewBag in ASP.Net. It stores the data in a session or cookie variable and is deleted at the end of the next page's load. This prevents you from posting back long query strings like someone that has been coding for less than a year. ObfuscateParam only works with numbers and is incredibly insecure. Any power user can easily deobfuscate, even more so with someone that actually makes a living stealing data.

RESTservice, resource with two different outputs - how would you do it?

Im currently working on a more or less RESTful webservice, a type of content api for my companys articles. We currently have a resource for getting all the content of a specific article
http://api.com/content/articles/{id}
will return a full set of article data of the given article id.
Currently we control alot of the article's business logic becasue we only serve a native-app from the webservice. This means we convert tags, links, images and so on in the body text of the article, into a protocol the native-app can understand. Same with alot of different attributes and data on the article, we will transform and modify its original (web) state into a state that the native-app will understand.
fx. img tags will be converted from a normal <img src="http://source.com"/> into a <img src="inline-image//{imageId}"/> tag, samt goes for anchor tags etc.
Now i have to implement a resource that can return the articles data in a new representation
I'm puzzled over how best to do this.
I could just implement a completely new resource, on a different url like: content/articles/web/{id} and move the old one to content/article/app/{id}
I could also specify in my documentation of the resource, that a client should always specify a specific request header maybe the Accept header for the webservice to determine which representation of the article to return.
I could also just use the original url, and use a url parameter like .../{id}/?version=app or .../{id}/?version=web
What would you guys reckon would be the best option? My personal preference lean towards option 1, simply because i think its easier to understand for clients of the webservice.
Regards, Martin.
EDIT:
I have chosen to go with option 1. Thanks for helping out and giving pros and cons. :)

I would choose #1. If you need to preserve the existing URLS you could add a new one content/articles/{id}/native or content/native-articles/{id}/. Both are REST enough.
Working with paths make content more easily cacheable than both header or param options. Using Content-Type overcomplicates the service especially when both are returning JSON.

Use the HTTP concept of Content Negotiation. Use the Accept header with vendor types.
Get the articles in the native representation:
GET /api.com/content/articles/1234
Accept: application/vnd.com.exmaple.article.native+json
Get the articles in the original representation:
GET /api.com/content/articles/1234
Accept: application/vnd.com.exmaple.article.orig+json

Option 1 and Option 3
Both are perfectly good solutions. I like the way Option 1 looks better, but that is just aesthetics. It doesn't really matter. If you choose one of these options, you should have requests to the old URL redirect to the new location using a 301.
Option 2
This could work as well, but only if the two responses have a different Content-Type. From the description, I couldn't really tell if this was the case. I would not define a custom Content-Type in this case just so you could use Content Negotiation. If the media type is not different, I would not use this option.

Perhaps option 2 - with the header being a Content-Type?
That seems to be the way resources are served in differing formats; e.g. XML, JSON, some custom format

Externally linked images - How to prevent cross site scripting

On my site, I want to allow users to add reference to images which are hosted anywhere on the internet. These images can then be seen by all users of my site. As far as I understand, this could open the risk of cross site scripting, as in the following scenario:
User A adds a link to a gif which he hosts on his own webserver. This webserver is configured in such a way, that it returns javascript instead of the image.
User B opens the page containg the image. Instead of seeing the image, javascript is executed.
My current security messures are currently such, that both on save and open, all content is encoded.
I am using asp.net(c#) on the server and a lot of jquery on the client to build ui elements, including the generation of image tags.
Is this fear of mine correct? Am I missing any other important security loopholes here? And most important of all, how do I prevent this attack? The only secure way I can think of right now, is to webrequest the image url on the server and check if it contains anything else than binary data...

Checking the file is indeed an image won't help. An attacker could return one thing when the server requests and another when a potential victim makes the same request.
Having said that, as long as you restrict the URL to only ever be printed inside the src attribute of an img tag, then you have a CSRF flaw, but not an XSS one.
Someone could for instance create an "image" URL along the lines of:
http://yoursite.com/admin/?action=create_user&un=bob&pw=alice
Or, more realistically but more annoyingly; http://yoursite.com/logout/
If all sensitive actions (logging out, editing profiles, creating posts, changing language/theme) have tokens, then an attack vector like this wouldn't give the user any benefit.
But going back to your question; unless there's some current browser bug I can't think of you won't have XSS. Oh, remember to ensure their image URL doesn't include odd characters. ie: an image URL of "><script>alert(1)</script><!-- may obviously have bad effects. I presumed you know to escape that.

Your approach to security is incorrect.
Don't approach the topic as "I have a user input, so how can I prevent XSS". Rather approach it like it this: "I have user input - it should be restrictive as possible - i.e. allowing nothing through". Then based on that allow only what's absolutely essential - plain-text strings thoroughly sanitized to prevent anything but a URL, and the specific, necessary characters for URLS only. Then Once it is sanitized I should only allow images. Testing for that is hard because it can be easily tricked. However, it should still be tested for. Then because you're using an input field you should make sure that everything from javascript scripts and escape characters, HTML, XML and SQL injections are all converted to plaintext and rendered harmless and useless. Consider your users as being both idiots and hackers - that they'll input everything incorrectly and try to hack something into your input space.
Aside from that you may run into som legal issues with regard to copyright. Copyrighted images generally may not be used on other people's sites without the copyright owner's consent and permission - usually obtained in writing (or email). So allowing users the opportunity to simply lift images from a site could run the risk of allowing them to take copyrighted material and reposting it on your site without permission which is illegal. Some sites are okay with citing the source, others require a fee to be paid, and others will sue you and bring your whole domain down for copyright infringement.

Django: assign accesskey to label instead of select

i'm developing a site that must be as accessible as possible. While assigning the accesskeys to my form fields with
widget=FieldWidget(attrs={'accesskey':'A'})
i found out that the w3c validator won't validate an xhtml strict page with an accesskey in a select tag. Anyway i couldn't find a way to assign an accesskey to the label related to the select field (the right way to make the select accessible). Is there a way to do so?
Thanks

Interesting question. HTML 4.01 also prohibits accesskey in a select.
I believe the Short Answer is: Not in standard Django.
Much longer answer: I looked at the code in django/forms/fields.py and .../widgets.py and the label is handled strictly as a string (forced to smart_unicode()). Four possible solutions come to mind, the first three are not pretty:
Ignore the validation failure. I hate doing this, but sometimes it's a necessary kludge. Most browsers are much looser than the DTDs in what they allow. If you can get the accesskey to work even when it's technically in the wrong place, that might be the simplest way to go.
Catch the output of the template and do some sort of ugly search-and-replace. (Blech!)
Add new functionality to the widgets/forms code by MonkeyPatching it. MonkeyPatch django.forms.fields.Field to catch and save a new arg (label_attrs?). MonkeyPatch the label_tag() method of forms.forms.BoundField to deal with the new widget.label_attrs value.
I'm deliberately not going to give more details on this. If you understand the code well enough to MonkeyPatch it, then you are smart enough to know the dangers inherent in doing this.
Make the same functional changes as #3, but do it as a submitted patch to the Django code base. This is the best long-term answer for everyone, but it's also the most work for you.
Update: Yoni Samlan's link to a custom filter (http://www.djangosnippets.org/snippets/693/) works if you are generating the <label> tag yourself. My answers are directed toward still using the full power of Forms but trying to tweak the resultant <label>.

Best way to decide on XML or HTML response?

I have a resource at a URL that both humans and machines should be able to read:
http://example.com/foo-collection/foo001
What is the best way to distinguish between human browsers and machines, and return either HTML or a domain-specific XML response?
(1) The Accept type field in the request?
(2) An additional bit of URL? eg:
http://example.com/foo-collection/foo001 -> returns HTML
http://example.com/foo-collection/foo001?xml -> returns, er, XML
I do not wish to oblige machines reading the resource to parse HTML (or XHTML for that matter). Machines like the googlebot should receive the HTML response.
It is reasonable to assume I control the machine readers.

If this is under your control, rather than adding a query parameter why not add a file extension:
http://example.com/foo-collection/foo001.html - return HTML
http://example.com/foo-collection/foo001.xml - return XML
Apart from anything else, that means if someone fetches it with wget or saves it from their browser, it'll have an appropriate filename without any fuss.

My preference is to make it a first-class part of the URI. This is debatable, since there are -- in a sense -- multiple URI's for the same resource. And is "format" really part of the URI?
http://example.com/foo-collection/html/foo001
http://example.com/foo-collection/xml/foo001
These are very easy deal with in a web framework that has URI parsing to direct the request to the proper application.

If this is indeed the same resource with two different representations, the HTTP invites you to use the Accept-header as you suggest. This is probably a very reliable way to distinguish between the two different scenarios. You can be plenty sure that user agents (including search engine spiders) send the Accept-header properly.
About the machine agents you are going to give XML; are they under your control? In that case you can be doubly sure that Accept will work. If they do not set this header properly, you can give XML as default. User agents DO set the header properly.
I would try to use the Accept heder for this, because this is exactly what the Accept header is there for.
The problem with having two different URLs is that is is not automatically apparent that these two represent the same underlying resource. This can be bad if a user finds an URL in one program, which renders HTML, and pastes it in the other, which needs XML. At this point a smart user could probably change the URL appropriately, but this is just a source of error that you don't need.

I would say adding a Query String parameter is your best bet. The only way to automatically detect whether your client is a browser(human) or application would be to read the User-Agent string from the HTTP Request. But this is easily set by any application to mimic a browser, you're not guaranteed that this is going to work.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js