REST Web Services and where to put XSS protection - web-services

I am wondering where the best place to put XSS protection in our website is. Our team is split into front-end and back-end teams, and we use REST as the API between the two groups since we use different platforms. We have a field that can hold a subset of HTML that should be protected, and I was wondering at what layer this should be done.
Should it not be allowed into the database by the web service, or should it be validated by the consumer on the way out, ensuring safety? For fields that cannot contain HTML, we simply save the raw input and have the front end escape it before presentation.
My viewpoint is that the webservice should respond that the data is invalid (we have been using 422 to indicate invalid updates) if someone tries to use disallowed tags. I am just wondering what other people think.

It's probably not an either/or. The web service is potentially callable from many UIs, and UIs change over time, so it should not assume that all its callers are careful or trusted. Indeed, could someone invoke your service directly by hand-crafting a query?
However, for the sake of usability we often choose to do friendly validation and error reporting in the UI. I've just finished filling in an online form at a web site that barfs in the service layer if any field contains a non-alphanumeric character. It would have been so much nicer if the UI had validated at the point of entry rather than rejecting my request after 3 pages of input.
(Not to mention that if the web site asks you for an employer's name and the name actually contains an apostrophe, you're a bit stymied!)

You should be using both. The typical pattern is to attempt to sanitize scary data on the way in (and you should really reject the request if sanitization was necessary for a given value) and to encode on the way out.
The reason for the former is that encoding sometimes gets missed. The reason for the latter is that your database cannot be trusted as a source of data (people can access it without hitting your client, or your client might have missed something).
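A minimal sketch of "reject on the way in, encode on the way out", using only Python's standard library. The `ALLOWED_TAGS` allowlist and the function names are hypothetical; the 422 status follows the question's own convention. A hand-rolled check like this is illustrative only; in production you would normally rely on a maintained HTML sanitizer.

```python
from __future__ import annotations

from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for the rich-text field; adjust to your own subset.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li"}


class _TagChecker(HTMLParser):
    """Collects every start tag that is outside the allowlist or has attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.disallowed: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Any attribute is refused, so onclick=..., style=... and
        # href="javascript:..." never reach the database.
        # (Self-closing tags like <br/> are routed here by HTMLParser too.)
        if tag not in ALLOWED_TAGS or attrs:
            self.disallowed.append(tag)


def validate_rich_text(value: str) -> tuple[int, list[str]]:
    """On the way in: (200, []) if acceptable, (422, offending tags) otherwise."""
    checker = _TagChecker()
    checker.feed(value)
    checker.close()
    return (422, checker.disallowed) if checker.disallowed else (200, [])


def render_plain_field(value: str) -> str:
    """On the way out: plain-text fields are HTML-encoded before presentation."""
    return escape(value, quote=True)
```

The encoding step still runs even when validation passed, which is the point of the second paragraph above: the database is not trusted as a clean source.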


RESTful API: how to tell whether an object retrieved by GET is editable (e.g., PUT-able) by the current user?

I have set up a RESTful API backend using Django, and I can list a set of articles with the following GET:
api/articles/
Also, I can get a single article by:
api/article/1/
Each article is owned by a certain user, and one user could have multiple articles of course.
On the front-end side, I present all the articles when the page loads, and I would like the currently logged-in user to see the articles they own in a different style, e.g., outlined by a box, with associated "delete" and "edit" buttons.
This requires me to tell programmatically, after retrieving the articles, which ones are owned by the current user. One way of doing this is to compare the current user's id with the owner id. However, I feel this is not a good choice, because the check is done entirely on the client side and may not be consistent with the server's actual judgement.
Therefore, is there a way to tell from the response of the GET (say, by having the server return a property "editable=true/false") whether the current user can edit (PUT) the resource?
I understand that this could be done at the server side, by attaching such a property manually. However, I am just asking whether there is better/common practice.
I just started learning web development and I am sorry if the question sounds trivial. Thank you!
You can attach the property manually, as you suggested. The advantage of this approach is that you don't need any other HTTP request.
A second possibility is that your client explicitly requests information about endpoint permissions. In this case I would suggest using the OPTIONS HTTP method: you send an OPTIONS request to api/articles/1 and the backend returns the wanted info. This might be exactly what the OPTIONS method and DRF metadata were made for.
http://www.django-rest-framework.org/api-guide/metadata/
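A minimal sketch of the first option in plain Python. The `Article` class and the function names are hypothetical (not Django or DRF APIs); the point is that the flag is computed on the server by the same rule that would authorize a PUT, so the client never re-implements the check.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Article:
    id: int
    owner_id: int
    title: str


def can_edit(article: Article, user_id: int | None) -> bool:
    """The single server-side ownership rule, also used to authorize PUT/DELETE."""
    return user_id is not None and article.owner_id == user_id


def serialize_article(article: Article, user_id: int | None) -> dict:
    """Representation returned by GET, with the server's verdict attached."""
    return {
        "id": article.id,
        "title": article.title,
        "owner_id": article.owner_id,
        "editable": can_edit(article, user_id),
    }
```

In Django REST framework the same idea is commonly expressed with a SerializerMethodField that reads the user from the serializer's request context.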
I think that this is a very interesting question.
Several options that come to me:
You can add an HTTP header with this information to the GET api/article/1 response, e.g. HTTP_METHODS_ALLOWED=PUT,PATCH,DELETE. Doing it this way helps the API client, because it does not need to know anything else. However, I think this is not a good approach when more than one entity is returned, since a single header cannot describe each entity separately.
Call OPTIONS api/article/1. The methods allowed for that user on that resource can be returned, but note that, in my opinion, this approach is not very good in terms of performance, because it doubles the number of requests to the server.
But what if the returned entity also contains information about its owner? In that case, can the client know which policy applies and work it out by itself? Note that the policy can be obtained from another endpoint (just one call would be needed) or even with the login response. If your entities do not contain that kind of information, it could also be returned as an HTTP header (like the first option above).
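Whichever of these options you pick, the server should decide the method list in one place. A minimal sketch, assuming the owner-based rule from the question (function names are hypothetical) and using HTTP's standard Allow header name instead of a custom header; that header choice is an assumption, not something the answers prescribe. The same value can be attached to a GET response or returned from OPTIONS.

```python
from __future__ import annotations

READ_METHODS = ["GET", "HEAD", "OPTIONS"]
WRITE_METHODS = ["PUT", "PATCH", "DELETE"]


def allowed_methods(owner_id: int, user_id: int | None) -> list[str]:
    """Methods this user may use on this article: writes only for the owner."""
    methods = list(READ_METHODS)
    if user_id is not None and user_id == owner_id:
        methods += WRITE_METHODS
    return methods


def response_headers(owner_id: int, user_id: int | None) -> dict[str, str]:
    """Header to attach to a single-article GET or to an OPTIONS reply."""
    return {"Allow": ", ".join(allowed_methods(owner_id, user_id))}
```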

How to handle "critical" REST API call failure?

I am creating a mobile app that will interact with a Django back-end API. I want to add the ability for an app user to change some "critical" account attributes via the app, with a call to the back end: e.g., changing the username, which is currently used to authenticate with the back end on each call. The success path is simple, but I'm concerned about a failure along the way that leaves the app and the back end out of sync. For example, the user invokes a username change in the app, the username is successfully updated on the back end, but something fails and the app never gets a response. The app is now left not knowing whether the old username is still intact or the new username is now on the back end. I'm wondering whether there is any standard pattern for making this type of thing bulletproof. The same scenario holds for a password change via the app.
Only thing I can think of now is keep both usernames in app until app can confirm current state of back-end...
There is a pattern called "POST Once Exactly" (POE) that may be what you're looking for. (See Mark Nottingham's draft for a proposed implementation of this pattern.) It incurs overhead, but when you want to guarantee that a request was processed exactly once, it's appropriate.
The problem is that there isn't yet a standard way to do this, so using it will (somewhat) couple your client and server together with out-of-band knowledge.
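A minimal sketch of the exactly-once idea using a client-generated request key (often called an idempotency key). This is a simplification in the spirit of the pattern, not Nottingham's draft itself; `AccountService`, `change_username` and `new_request_key` are hypothetical, and a real server would persist the stored responses and expire them.

```python
from __future__ import annotations

import uuid


def new_request_key() -> str:
    """Client side: one key per logical operation, reused on every retry."""
    return str(uuid.uuid4())


class AccountService:
    """In-memory stand-in for the back end (hypothetical)."""

    def __init__(self, username: str) -> None:
        self.username = username
        self._results: dict[str, dict] = {}  # request key -> stored response

    def change_username(self, request_key: str, new_username: str) -> dict:
        # A key we have seen before means "this is a retry": replay the
        # original response instead of applying the change a second time.
        if request_key in self._results:
            return self._results[request_key]
        old = self.username
        self.username = new_username
        result = {"status": 200, "old": old, "username": new_username}
        self._results[request_key] = result
        return result
```

If the first response is lost, the app simply retries with the same key and learns the outcome. Presumably the retry should be authenticated with something that does not change during the operation (an account id or token), since the username itself is what is changing.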

What is the correct way to deal with "Logging In" for an app written against RESTful web services?

I am in the process of building a RESTful API for my application. There are very few services that are public and the rest require authentication and authorization.
To be clear, my question is NOT about authenticating web services. I have already decided to send an HTTP header with an access token provided by the server. The reasons for this include:
Creating a "session" that can track the user activity
Timeout access tokens after XXX amount of inactivity
Track user behavior patterns for each "session"
So far, this approach is working fine. I am interested in any design guidelines for providing a "Login" service. I don't want to just authenticate a request, but I want to track usage of the web service against a "session".
In addition to "session" tracking, we are required to track failed login attempts and take appropriate action after XXX failed attempts, as well as handle password expiry and email address verification before authorizing, etc.
Specifically, I am concerned with the best way to design the URIs for this. One option would be:
/api/authentication/login?username=UN&password=PW
That could return the access token to be used in the header for secure service calls. Is this a good approach? Is there a better approach? Is there a better pattern to use for naming the URI?
My biggest problem is that the URI does not stick purely to the "URIs should represent resources" approach. In the end it is probably not a big deal, but I am wondering if there are better ways.
Thanks!
Often, RESTful APIs are designed to be stateless. That means the API itself doesn't care about keeping a session, and doesn't keep one.
What you do is authenticate once and then get a temporary key. That key eventually stops working, because it carries information about when it expires.
Also, since many large APIs are built on message queues, they know the timestamp of each action and can basically keep track of activity that way.
So, in RESTful API design, you often have scenarios where your URL has resources in it, and then there are all sorts of additional things that need to be set.
A good rule of thumb is to hide the complexity behind your "?" (the query string). A typical use case of this philosophy is a GET request on /some/resource that accepts a bunch of filter options. How is this relevant? If you remember that it's not a mortal sin to occasionally decorate your resource-based API with other things, you can treat other scenarios the same way when resourcefulness is in question but you still need RPC-ish endpoints to make your API fully functional. Or, of course, you can simply map certain HTTP verbs to those operations.
If you want to extend your resources with additional functionality, try to stick to the resource structure in your base url of the call, and then decorate it with your one-off needs.
Resource: /api/authentication
With modifier: /api/authentication/login
With data: /api/authentication/login?username=UN&password=PW
It's not so bad. But again, if you wanted to go completely RESTful, you could say something like this (this is pure conjecture; you need to decide these things for yourself):
Get logged in status - GET - /api/authentication/:id
Create / Update logged in status - POST / PUT - /api/authentication(/:id)
Log out - DELETE - /api/authentication/:id
... or you could omit the :id route and just hide that information in the body of the request, a.k.a. hiding complexity.
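A minimal in-memory sketch of the fully resource-style variant, also covering the question's failed-attempt and inactivity requirements. Everything here is hypothetical: class and method names, the 3-attempt limit, the 15-minute timeout, and the use of 403 for a locked account. Credentials travel in the POST body rather than the query string, because query strings tend to end up in server logs and browser history. For brevity the token doubles as the :id; the question's approach of sending it in a header works the same way.

```python
from __future__ import annotations

import secrets
import time
from typing import Callable

MAX_FAILED_ATTEMPTS = 3        # hypothetical policy: lock after 3 failures
INACTIVITY_TIMEOUT = 15 * 60   # hypothetical: seconds of inactivity before expiry


class AuthenticationResource:
    """In-memory model of /api/authentication; all names are hypothetical."""

    def __init__(self, users: dict[str, str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._users = users  # username -> password; store hashes in real code
        self._failed: dict[str, int] = {}
        self._sessions: dict[str, tuple[str, float]] = {}  # token -> (user, last seen)
        self._clock = clock

    def post(self, username: str, password: str) -> tuple[int, dict]:
        """POST /api/authentication with credentials in the body: log in."""
        if self._failed.get(username, 0) >= MAX_FAILED_ATTEMPTS:
            return 403, {"error": "too many failed attempts"}
        stored = self._users.get(username)
        if stored is None or not secrets.compare_digest(stored.encode(), password.encode()):
            self._failed[username] = self._failed.get(username, 0) + 1
            return 401, {"error": "invalid credentials"}
        self._failed.pop(username, None)
        token = secrets.token_urlsafe(32)
        self._sessions[token] = (username, self._clock())
        return 201, {"token": token}

    def get(self, token: str) -> tuple[int, dict]:
        """GET /api/authentication/:id: logged-in status; counts as activity."""
        session = self._sessions.get(token)
        now = self._clock()
        if session is None or now - session[1] > INACTIVITY_TIMEOUT:
            self._sessions.pop(token, None)
            return 404, {"error": "no active session"}
        self._sessions[token] = (session[0], now)
        return 200, {"username": session[0]}

    def delete(self, token: str) -> tuple[int, dict]:
        """DELETE /api/authentication/:id: log out."""
        self._sessions.pop(token, None)
        return 204, {}
```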

Prevent anyone from executing your web service?

I've got a webservice which is executed through JavaScript (jQuery) to retrieve data from the database. I would like to make sure that only my web pages can execute those web methods (i.e. I don't want people to execute those web methods directly; they could find out the URL by looking at the source code of the JavaScript, for example).
What I'm planning to do is add a 'Key' parameter to all the webmethods. The key will be stored in the web pages in a hidden field and the value will be set dynamically by the web server when the web page is requested. The key value will only be valid for, say, 5 minutes. This way, when a webmethod needs to be executed, javascript will pass the key to the webmethod and the webmethod will check that the key is valid before doing whatever it needs to do.
If someone wants to execute the webmethods directly, they won't have the key which will make them unable to execute them.
What are your views on this? Is there a better solution? Do you foresee any problems with my solution?
MORE INFO: for what I'm doing, the visitors are not logged in so I can't use a session. I understand that if someone really wants to break this, they can parse the html code and get the value of the hidden field but they would have to do this regularly as the key will change every x minutes... which is of course possible but hopefully will be a pain for them.
EDIT: what I'm doing is a web application (as opposed to a web site). The data is retrieved through web methods (+jquery). I would like to prevent anyone from building their own web application using my data (which they could if they can execute the web methods). Obviously it would be a risk for them as I could change the web methods at any time.
I will probably just go for the referrer option. It's not perfect but it's easy to implement. I don't want to spend too much time on this as some of you said if someone really wants to break it, they'll find a solution anyway.
Thanks.
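A minimal sketch of the time-limited key the question proposes, assuming an HMAC-signed timestamp so the server does not need to remember issued keys. `SECRET`, `issue_key` and `key_is_valid` are hypothetical. As the answers point out, anyone who loads the page can read this value from the hidden field, so it only adds a hurdle, not real protection.

```python
from __future__ import annotations

import hashlib
import hmac
import time

SECRET = b"server-side secret"  # hypothetical; load from configuration, not source
KEY_LIFETIME = 5 * 60           # seconds, the 5 minutes from the question


def _sign(issued_text: str) -> str:
    return hmac.new(SECRET, issued_text.encode(), hashlib.sha256).hexdigest()


def issue_key(now: float | None = None) -> str:
    """Value the server writes into the hidden field when rendering the page."""
    issued_text = str(int(time.time() if now is None else now))
    return f"{issued_text}.{_sign(issued_text)}"


def key_is_valid(key: str, now: float | None = None) -> bool:
    """Checked by each web method: genuine signature and still inside the window."""
    try:
        issued_text, sig = key.split(".", 1)
        issued = int(issued_text)
    except ValueError:
        return False
    current = time.time() if now is None else now
    genuine = hmac.compare_digest(sig.encode(), _sign(issued_text).encode())
    return genuine and 0 <= current - issued <= KEY_LIFETIME
```

The expiry check is also what produces the usability problem raised further down: a visitor who lingers on the page past the window gets rejected.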
Well, there's nothing technically wrong with it, but your assumption that "they won't have the key which will make them unable to execute them" is incorrect, and thus the security of the whole thing is flawed.
It's trivial to retrieve the value of a hidden field and use it to execute the method.
I'll save you a lot of time and frustration: If the user's browser can execute the method, a determined user can. You're not going to be able to stop that.
With that said, any more information on why you're attempting to do this? What's the context? Perhaps there's something else that would accomplish your goal here that we could suggest if we knew more :)
EDIT: Not a whole lot more info there, but I'll run with it. Your solution isn't really going to increase the security at all and is going to create a headache for you in maintenance and bugs. It will also create a headache for your users in that they would then have an 'invisible' time limit in which to perform actions on pages. With what you've told us so far, I'd say you're better off just doing nothing.
What kind of methods are you trying to protect here? Why are you trying to protect them?
MORE INFO: for what I'm doing, the visitors are not logged in so I can't use a session.
If you are sending a client a key that they will send back every time they want to use a service, you are in effect creating a session. The key you are passing back and forth is functionally no different from a cookie (except that it will be passed back only on certain requests). You might as well save the trouble and set a temporary cookie that expires in 5 minutes. Add a little server-side check for expired cookies and you'll probably have the best you can get.
You may already have such a key if you're using a language or framework that sets a session id. Send that with the Ajax call. (Note that such a session lasts a bit longer than five minutes, but note also that it's what you're using to keep state for the user's regular HTTP GETs and POSTs.)
What's to stop someone requesting a webpage, parsing the results to pull out the key and then calling the webservice with that?
You could check the referrer header to check the call is coming from one of your pages, but that is also easy to spoof.
The only way I can see to solve this is to require authentication. If your web pages that call the webservice require the user to be logged in, then you can check that they're logged in when they call the webservice. This doesn't stop other pages from using your webservice, but it does let you track usage more closely, and with some rate limiting you should be able to prevent abuse of your service.
If you really don't want to risk your webservice being abused then don't make it public. That's the only failsafe solution.
Let's say that you generate a key valid from 12.00 to 12.05. At 12.04 I open the page and read it calmly, and at 12.06 I trigger an action that uses your web service. I'll be blocked from doing so even though I'm a legitimate visitor.
I would suggest restricting access to the web services by HTTP referrer (allow only requests from your domain and null referrers) and/or requiring user authentication for calling methods.
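A minimal sketch of the referrer check suggested above. The header is spelled "Referer" in HTTP; `ALLOWED_HOSTS` and `referrer_allowed` are hypothetical. As another answer notes, the header is easy to spoof, so this only filters casual reuse.

```python
from __future__ import annotations

from urllib.parse import urlsplit

ALLOWED_HOSTS = {"www.example.com", "example.com"}  # hypothetical: your own domain(s)


def referrer_allowed(referer: str | None) -> bool:
    """Accept a missing Referer, or one whose host is exactly one of ours."""
    if not referer:
        return True
    # Compare the parsed hostname, not a substring, so that
    # "https://example.com.evil.net/" is not mistaken for our site.
    return urlsplit(referer).hostname in ALLOWED_HOSTS
```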

Can you explain the Web concept of RESTful?

Looking for clear and concise explanations of this concept.
A RESTful application is an application that exposes its state and functionality as a set of resources that the clients can manipulate and conforms to a certain set of principles:
All resources are uniquely addressable, usually through URIs; other addressing can also be used, though.
All resources can be manipulated through a constrained set of well-known actions, usually CRUD (create, read, update, delete), represented most often through HTTP's POST, GET, PUT and DELETE; it can be a different set or a subset, though. For example, some implementations limit that set to read and modify only (GET and PUT);
The data for all resources is transferred through any of a constrained number of well-known representations, usually HTML, XML or JSON;
The communication between the client and the application is performed over a stateless protocol that allows for multiple layered intermediaries that can reroute and cache the requests and response packets transparently for the client and the application.
The Wikipedia article pointed to by Tim Scott gives more details about the origin of REST, detailed principles, examples and so on.
The best explanation I found is in this REST tutorial.
REST by way of an example:
POST /user
fname=John&lname=Doe&age=25
The server responds:
201 Created
Location: /user/123
In the future, you can then retrieve the user information:
GET /user/123
The server responds:
200 OK
<fname>John</fname><lname>Doe</lname><age>25</age>
To update:
PUT /user/123
fname=Johnny
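The exchange above can be modelled as a tiny in-memory handler, a sketch in which `UserResource` and `handle` are hypothetical and bodies are Python dicts rather than XML. It answers the POST with 201 Created and a Location header, the usual status for a newly created resource, and numbers users from 123 to match the example.

```python
from __future__ import annotations

from itertools import count


class UserResource:
    """In-memory stand-in for the /user collection (hypothetical)."""

    def __init__(self) -> None:
        self._users: dict[int, dict[str, str]] = {}
        self._ids = count(123)

    def handle(self, method: str, path: str, form: dict[str, str] | None = None):
        """Dispatch on (method, path) and return (status, headers, body)."""
        parts = path.strip("/").split("/")
        if method == "POST" and parts == ["user"]:
            user_id = next(self._ids)
            self._users[user_id] = dict(form or {})
            return 201, {"Location": f"/user/{user_id}"}, None
        if len(parts) == 2 and parts[0] == "user" and parts[1].isdigit():
            user_id = int(parts[1])
            if user_id not in self._users:
                return 404, {}, None
            if method == "GET":
                return 200, {}, dict(self._users[user_id])
            if method == "PUT":
                # Strict HTTP semantics treat PUT as full replacement; merging
                # here only mirrors the partial body in the example.
                self._users[user_id].update(form or {})
                return 200, {}, dict(self._users[user_id])
            return 405, {"Allow": "GET, PUT"}, None
        return 404, {}, None
```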
Frankly, the answer depends on context. REST and RESTful have meanings depending on what language or framework you're using or what you're trying to accomplish. Since you've tagged your question under "web services" I'll answer in the context of RESTful web services, which is still a broad category.
RESTful web services can mean anything from a strict REST interpretation, where all actions are done in a strictly "RESTful" manner, to a protocol that is plain XML, meaning it's not SOAP or XML-RPC. In the latter case, this is a misnomer: such a REST protocol is really a "plain old XML" (or "POX") protocol. While REST protocols usually use XML and as such are POX protocols, this doesn't necessarily have to be the case, and the inverse is not true (just because a protocol uses XML doesn't make it RESTful).
Without further ado, a truly RESTful API consists of actions taken on objects, represented by the HTTP method used and the URL of that object. The actions are about the data and not about what the method does. For example, CRUD actions (create, read, update, and delete) can map to a certain set of URLs and actions. Let's say you are interacting with a photo API.
To create a photo, you'd send data via a POST request to /photos. It would let you know where the photo is via the Location header, e.g. /photos/12345
To view a photo, you'd use GET /photos/12345
To update a photo, you'd send data via a PUT request to /photos/12345.
To delete a photo, you'd use DELETE /photos/12345
To get a list of photos, you'd use GET /photos.
Other actions might be implemented, like the ability to copy photos via a COPY request.
In this way, the HTTP method you're using maps directly to the intent of your call, instead of the action you wish to take being sent as part of the URL. By contrast, a non-RESTful API might use many more URLs and only the GET and POST methods. So, in this example, you might see:
To create a photo, send a POST to /photos/create
To view a photo, send a GET to /photos/view/12345
To update a photo, send a POST to /photos/update/12345
To delete a photo, send a GET to /photos/delete/12345
To get a list of photos, send a GET to /photos/list
You'll note how in this case the URLs are different and the methods are chosen only out of technical necessity: to send data, you must use a POST, while all other requests use GET.
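The two photo APIs above can be written as route tables for the same five actions, a sketch with a hypothetical `route` function: in the first table the method carries the intent, in the second the URL does.

```python
from __future__ import annotations

import re

# RESTful style: one URL per resource, the HTTP method says what to do.
RESTFUL_ROUTES = [
    ("POST", r"/photos", "create"),
    ("GET", r"/photos", "list"),
    ("GET", r"/photos/(?P<id>\d+)", "view"),
    ("PUT", r"/photos/(?P<id>\d+)", "update"),
    ("DELETE", r"/photos/(?P<id>\d+)", "delete"),
]

# Non-RESTful style: the action is spelled out in the URL, only GET and POST used.
RPC_ROUTES = [
    ("POST", r"/photos/create", "create"),
    ("GET", r"/photos/list", "list"),
    ("GET", r"/photos/view/(?P<id>\d+)", "view"),
    ("POST", r"/photos/update/(?P<id>\d+)", "update"),
    ("GET", r"/photos/delete/(?P<id>\d+)", "delete"),
]


def route(routes, method: str, path: str):
    """Return (action, params) for the first route matching method and path, else None."""
    for route_method, pattern, action in routes:
        match = re.fullmatch(pattern, path)
        if match and route_method == method:
            return action, match.groupdict()
    return None
```

One practical consequence: the second table deletes through GET, which HTTP defines as a safe method, so crawlers or link prefetchers may trigger it.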
Just a few points:
RESTful doesn't depend on the framework you use. It depends on the architectural style it describes. If you don't follow the constraints, you're not RESTful. The constraints are defined in half a page of Chapter 5 of Roy Fielding's dissertation; I encourage you to go and read it.
The identifier is opaque and does not carry any information beyond the identification of a resource. It's a name, not input data; just a name. As far as the client is concerned, it has no logic or value beyond knowing how to build query strings from a form tag. If your client builds its own URIs using a schema you've decided up front, you're not RESTful.
Whether or not you use all the HTTP verbs is not really the constraint, and it's perfectly acceptable to design an architecture that only supports POST.
Caching, high decoupling, lack of session state and layered architecture are the points few talk about but the most important to the success of a RESTful architecture.
If you don't spend most of your time crafting your document format, you're probably not doing REST.
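A minimal sketch of the point about opaque identifiers: the client starts from one entry URI and follows links the server put in each representation, rather than concatenating "/photos/" and an id itself. The `links` field, the URIs and the `fetch` stand-in are all hypothetical; the odd-looking URIs are deliberate, since the client must not care what they look like.

```python
from __future__ import annotations

# Hypothetical representations keyed by URI; the URIs are opaque to the client.
SERVER = {
    "/": {"links": {"photos": "/p?page=1"}},
    "/p?page=1": {"items": ["a", "b"], "links": {"next": "/x9f/second"}},
    "/x9f/second": {"items": ["c"], "links": {}},
}


def fetch(uri: str) -> dict:
    """Stand-in for an HTTP GET returning a parsed document."""
    return SERVER[uri]


def all_photos(entry_point: str = "/") -> list[str]:
    """Collect every photo by following server-provided links only."""
    uri = fetch(entry_point)["links"]["photos"]
    items: list[str] = []
    while uri is not None:
        doc = fetch(uri)
        items.extend(doc["items"])
        uri = doc["links"].get("next")
    return items
```

The server can now restructure its URIs freely; only the document format, which the last point above says deserves most of your time, is shared knowledge.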
It means using names to identify both commands and parameters.
Instead of names being mere handles or monikers, the name itself contains information. Specifically, information about what is being requested, parameters for the request, etc.
Names are not "roots" but rather actions plus input data.
I've learned the most from reading the articles published on InfoQ.com:
http://www.infoq.com/rest and the RESTful Web Services book (http://oreilly.com/catalog/9780596529260/).
./alex
Disclaimer: I am associated with InfoQ.com, but this recommendation is based on my own learning experience.