I working on a service which scrapes specific links from blogs. The service makes calls to different sites which pulls in and stores the data.
I'm having troubles specifying the url for updating the data on the server where I now use the verb update to pull in the latest links.
I currently use the following endpoints:
GET /user/{ID}/links - gets all previously scraped links (few milliseconds)
GET /user/{ID}/links/update - starts scraping and returned the scraped data (few seconds)
What would be a good option for the second url? some examples I came up with myself.
GET /user/{ID}/links?collection=(all|cached|latest)
GET /user/{ID}/links?update=1
GET /user/{ID}/links/latest
GET /user/{ID}/links/new
Using GET to start a process isn't very RESTful. You aren't really GETting information, you're asking the server to process information. You probably want to POST against /user/{ID]/links (a quick Google for PUT vs POST will give you endless reading if you're curious about the finer points there). You'd then have two options:
POST with background process: If using a background process (or queue) you can return a 202 Accepted, indicating that the service has accepted the request and is about to do something. 202 generally indicates that the client shouldn't wait around, which makes sense when performing time dependent actions like scraping. The client can then issue GET requests on the first link to retrieve updates.
Creative use of Last-Modified headers can tell the client when new updates are available. If you want to be super fancy, you can implement HEAD /user/{ID}/links that will return a Last-Modified header without a response body (saving both bandwidth and processing).
POST with direct processing: If you're doing the processing during the request (not a great plan in the grand scheme of things), you can return a 200 OK with a response body containing the updated links.
Subsequent GETs would perform as normal.
More info here
And here
And here
We are developing a web API which processes potentially very large amounts of user-submitted content, which means that calls to our endpoints might not return immediate results. We are therefore looking at implementing an asynchronous/non-blocking API. Currently our plan is to have the user submit their content via:
POST /v1/foo
The JSON response body contains a unique request ID (a UUID), which the user then submits as a parameter in subsequent polling GETs on the same endpoint:
GET /v1/foo?request_id=<some-uuid>
If the job is finished the result is returned as JSON, otherwise a status update is returned (again JSON).
(Unless they fail both the above calls simply return a "200 OK" response.)
Is this a reasonable way of implementing an asynchronous API? If not what is the 'right' (and RESTful) way of doing this? The model described here recommends creating a temporary status update resource and then a final result resource, but that seems unnecessarily complicated to me.
Actucally the way described in the blog post you mentioned is the 'right' RESTful way of processing aysnchronous operations. I've implemented an API that handles large file uploads and conversion and does it this way. In my opinion this is not over complicated and definitely better then delaying the response to the client or something.
Some additional note: If a task has failed, I would also return 200 OK together with a representation of the task resource and the information that the resource creation has failed.
I have a situation where I need my API to have a call for triggering a service-side event, no information (besides authentication) is needed from the client, and nothing needs to be returned by the server. Since this doesn't fit well into the standard CRUD/Resource interaction, should I take this as an indicator that I'm doing something wrong, or is there a RESTful design pattern to deal with these conditions?
Your client can just:
POST /trigger
To which the server would respond with a 202 Accepted.
That way your request can still contain the appropriate authentication headers, and the API can be extended in the future if you need the client to supply an entity, or need to return a response with information about how to query the event status.
There's nothing "non-RESTful" about what you're trying to do here; REST principles don't have to correlate to CRUD operations on resources.
The spec for 202 says:
The entity returned with this response SHOULD include an indication of
the request's current status and either a pointer to a status monitor
or some estimate of when the user can expect the request to be
fulfilled.
You aren't obliged to send anything in the response, given the "SHOULD" in the definition.
REST defines the nature of the communication between the client and server. In this case, I think the issues is there is no information to transfer.
Is there any reason the client needs to initiate this at all? I'd say your server-side event should be entirely self-contained within the server. Perhaps kick it off periodically with a cron call?
Company A has async pooling based webservice for notifications. Company B checks for notifications. Every time when it reads new notifications A deletes them from the system. Thus subsequent read requests return only new notifications. There is also requirement for the client B to interrupt the connection if there is no response within 30 sec.
This causes one potential problem: Due to unexpected slowness it is possible for A get the request deleted a notification and send the response back while B is already interrupted the connection. Under this scenario notification gets lost. Now one can argue that the core problem lies within operation realm (the HTTP response must be delivered withing 20 sec ) still on practice it is not always feasible.
How to design B (the client) to avoid this problem?
One way I can see is to do not delete the notifications by A and make B be aware of its state, so that it knows starting from what ID it needs to process notifications, but that presumes that ID will be sequential. Which is controlled by A. Even if B defines its own sequence A still has to be altered to return it back.
Are there any other approaches?
Thanks!
Web services in general are unreliable enough that it's rarely a good idea to make a "read" request serve double-duty as a "delete" request, especially without the client's knowledge. There is just too much risk of a connection dropping or timing out. There is no way to get around this only by modifying the client, because it's the server that is at fault here - the way it's designed is fundamentally unsuited for a web service.
I think you're on the right track with the incrementing IDs idea. The client knows (or can be modified to know) which notifications it's received, so if it can supply the ID of the last message it's received when it polls for notifications, the server should be able to respond based on that ID.
It really seems like Company A's webservice should be synchronous instead of asynchronous. If that is not possible, it may be a good idea to send a "ACK"-like response to a new Company A webservice that indicates a specific notification was received (by Company B) and can be deleted.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I'm looking for guidance on good practices when it comes to return errors from a REST API. I'm working on a new API so I can take it any direction right now. My content type is XML at the moment, but I plan to support JSON in future.
I am now adding some error cases, like for instance a client attempts to add a new resource but has exceeded his storage quota. I am already handling certain error cases with HTTP status codes (401 for authentication, 403 for authorization and 404 for plain bad request URIs). I looked over the blessed HTTP error codes but none of the 400-417 range seems right to report application specific errors. So at first I was tempted to return my application error with 200 OK and a specific XML payload (ie. Pay us more and you'll get the storage you need!) but I stopped to think about it and it seems to soapy (/shrug in horror). Besides it feels like I'm splitting the error responses into distinct cases, as some are http status code driven and other are content driven.
So what is the industry recommendations? Good practices (please explain why!) and also, from a client pov, what kind of error handling in the REST API makes life easier for the client code?
A great resource to pick the correct HTTP error code for your API:
http://www.codetinkerer.com/2015/12/04/choosing-an-http-status-code.html
An excerpt from the article:
Where to start:
2XX/3XX:
4XX:
5XX:
So at first I was tempted to return my application error with 200 OK and a specific XML payload (ie. Pay us more and you'll get the storage you need!) but I stopped to think about it and it seems to soapy (/shrug in horror).
I wouldn't return a 200 unless there really was nothing wrong with the request. From RFC2616, 200 means "the request has succeeded."
If the client's storage quota has been exceeded (for whatever reason), I'd return a 403 (Forbidden):
The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
This tells the client that the request was OK, but that it failed (something a 200 doesn't do). This also gives you the opportunity to explain the problem (and its solution) in the response body.
What other specific error conditions did you have in mind?
The main choice is do you want to treat the HTTP status code as part of your REST API or not.
Both ways work fine. I agree that, strictly speaking, one of the ideas of REST is that you should use the HTTP Status code as a part of your API (return 200 or 201 for a successful operation and a 4xx or 5xx depending on various error cases.) However, there are no REST police. You can do what you want. I have seen far more egregious non-REST APIs being called "RESTful."
At this point (August, 2015) I do recommend that you use the HTTP Status code as part of your API. It is now much easier to see the return code when using frameworks than it was in the past. In particular, it is now easier to see the non-200 return case and the body of non-200 responses than it was in the past.
The HTTP Status code is part of your api
You will need to carefully pick 4xx codes that fit your error conditions. You can include a rest, xml, or plaintext message as the payload that includes a sub-code and a descriptive comment.
The clients will need to use a software framework that enables them to get at the HTTP-level status code. Usually do-able, not always straight-forward.
The clients will have to distinguish between HTTP status codes that indicate a communications error and your own status codes that indicate an application-level issue.
The HTTP Status code is NOT part of your api
The HTTP status code will always be 200 if your app received the request and then responded (both success and error cases)
ALL of your responses should include "envelope" or "header" information. Typically something like:
envelope_ver: 1.0
status: # use any codes you like. Reserve a code for success.
msg: "ok" # A human string that reflects the code. Useful for debugging.
data: ... # The data of the response, if any.
This method can be easier for clients since the status for the response is always in the same place (no sub-codes needed), no limits on the codes, no need to fetch the HTTP-level status-code.
Here's a post with a similar idea: http://yuiblog.com/blog/2008/10/15/datatable-260-part-one/
Main issues:
Be sure to include version numbers so you can later change the semantics of the api if needed.
Document...
Remember there are more status codes than those defined in the HTTP/1.1 RFCs, the IANA registry is at http://www.iana.org/assignments/http-status-codes. For the case you mentioned status code 507 sounds right.
As others have pointed, having a response entity in an error code is perfectly allowable.
Do remember that 5xx errors are server-side, aka the client cannot change anything to its request to make the request pass. If the client's quota is exceeded, that's definitly not a server error, so 5xx should be avoided.
I know this is extremely late to the party, but now, in year 2013, we have a few media types to cover error handling in a common distributed (RESTful) fashion. See "vnd.error", application/vnd.error+json (https://github.com/blongden/vnd.error) and "Problem Details for HTTP APIs", application/problem+json (https://datatracker.ietf.org/doc/html/draft-nottingham-http-problem-05).
There are two sorts of errors. Application errors and HTTP errors. The HTTP errors are just to let your AJAX handler know that things went fine and should not be used for anything else.
5xx Server Error
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
506 Variant Also Negotiates (RFC 2295 )
507 Insufficient Storage (WebDAV) (RFC 4918 )
509 Bandwidth Limit Exceeded (Apache bw/limited extension)
510 Not Extended (RFC 2774 )
2xx Success
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information (since HTTP/1.1)
204 No Content
205 Reset Content
206 Partial Content
207 Multi-Status (WebDAV)
However, how you design your application errors is really up to you. Stack Overflow for example sends out an object with response, data and message properties. The response I believe contains true or false to indicate if the operation was successful (usually for write operations). The data contains the payload (usually for read operations) and the message contains any additional metadata or useful messages (such as error messages when the response is false).
Agreed. The basic philosophy of REST is to use the web infrastructure. The HTTP Status codes are the messaging framework that allows parties to communicate with each other without increasing the HTTP payload. They are already established universal codes conveying the status of response, and therefore, to be truly RESTful, the applications must use this framework to communicate the response status.
Sending an error response in a HTTP 200 envelope is misleading, and forces the client (api consumer) to parse the message, most likely in a non-standard, or proprietary way. This is also not efficient - you will force your clients to parse the HTTP payload every single time to understand the "real" response status. This increases processing, adds latency, and creates an environment for the client to make mistakes.
Modeling your api on existing 'best practices' might be the way to go.
For example, here is how Twitter handles error codes
https://developer.twitter.com/en/docs/basics/response-codes
Please stick to the semantics of protocol. Use 2xx for successful responses and 4xx , 5xx for error responses - be it your business exceptions or other. Had using 2xx for any response been the intended use case in the protocol, they would not have other status codes in the first place.
Don't forget the 5xx errors as well for application errors.
In this case what about 409 (Conflict)? This assumes that the user can fix the problem by deleting stored resources.
Otherwise 507 (not entirely standard) may also work. I wouldn't use 200 unless you use 200 for errors in general.
If the client quota is exceeded it is a server error, avoid 5xx in this instance.