How to properly send 406 status code? - web-services

I'm developing a RESTful API service which initially will only be accepting and responding in JSON format. I want to follow standards and in case of requester's Accept header was different than JSON I want to respond with 406 HTTP status code to inform the requester I cannot output data in other format.
According to W3 I "SHOULD include an entity containing a list of available entity characteristics and location(s) from which the user or user agent can choose the one most appropriate" in my response.
How do I do that, because the above explanation doesn't tell me much. What is the mentioned entity?
Any ideas/suggestions?
EDIT
Initially I thought that maybe could be a comma separated list in Content-Type header but after rethinking maybe I should do the same thing browsers do and use Accept header? This makes much more sense actually, but I cannot find any information to support this.

Three issues here:
First, the note from RFC 2616 is meant to address URI schemes where responses of different types are made available at various URI's, such as "/path/to/thing.xml" vs "/path/to/thing.json". That's not always a popular choice, but if you can do that, do so and include hyperlinks to each one in the "entity"; that is, in the body of the response. Since the RFC doesn't mandate a Content-Type or processing model for such links, you're on your own regarding how to return them, but HTML with <a> tags is common and useful.
If you don't want to expose multiple types at separate URI's, but just want to expose one type at the original URI, then it's perfectly fine to respond with 406 and an entity that simply says which types the resource can emit.
Second, note that most web browsers send */* in the Accept header (with a low quality value), which should match any Content-Type. In addition, the spec says "...if no Accept header field is present, then it is assumed that the client accepts all media types." So the cases where you should be raising 406 are rare.
Third, don't emit a Content-Type response header that is anything other than the Content-Type of the response entity. It should not be used to list acceptable types. You should also not emit a response header named 'Accept'; the 'Accept' header is for requests only; see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1

Related

RESTservice, resource with two different outputs - how would you do it?

Im currently working on a more or less RESTful webservice, a type of content api for my companys articles. We currently have a resource for getting all the content of a specific article
http://api.com/content/articles/{id}
will return a full set of article data of the given article id.
Currently we control alot of the article's business logic becasue we only serve a native-app from the webservice. This means we convert tags, links, images and so on in the body text of the article, into a protocol the native-app can understand. Same with alot of different attributes and data on the article, we will transform and modify its original (web) state into a state that the native-app will understand.
fx. img tags will be converted from a normal <img src="http://source.com"/> into a <img src="inline-image//{imageId}"/> tag, samt goes for anchor tags etc.
Now i have to implement a resource that can return the articles data in a new representation
I'm puzzled over how best to do this.
I could just implement a completely new resource, on a different url like: content/articles/web/{id} and move the old one to content/article/app/{id}
I could also specify in my documentation of the resource, that a client should always specify a specific request header maybe the Accept header for the webservice to determine which representation of the article to return.
I could also just use the original url, and use a url parameter like .../{id}/?version=app or .../{id}/?version=web
What would you guys reckon would be the best option? My personal preference lean towards option 1, simply because i think its easier to understand for clients of the webservice.
Regards, Martin.
EDIT:
I have chosen to go with option 1. Thanks for helping out and giving pros and cons. :)
I would choose #1. If you need to preserve the existing URLS you could add a new one content/articles/{id}/native or content/native-articles/{id}/. Both are REST enough.
Working with paths make content more easily cacheable than both header or param options. Using Content-Type overcomplicates the service especially when both are returning JSON.
Use the HTTP concept of Content Negotiation. Use the Accept header with vendor types.
Get the articles in the native representation:
GET /api.com/content/articles/1234
Accept: application/vnd.com.exmaple.article.native+json
Get the articles in the original representation:
GET /api.com/content/articles/1234
Accept: application/vnd.com.exmaple.article.orig+json
Option 1 and Option 3
Both are perfectly good solutions. I like the way Option 1 looks better, but that is just aesthetics. It doesn't really matter. If you choose one of these options, you should have requests to the old URL redirect to the new location using a 301.
Option 2
This could work as well, but only if the two responses have a different Content-Type. From the description, I couldn't really tell if this was the case. I would not define a custom Content-Type in this case just so you could use Content Negotiation. If the media type is not different, I would not use this option.
Perhaps option 2 - with the header being a Content-Type?
That seems to be the way resources are served in differing formats; e.g. XML, JSON, some custom format

In REST, How to discover acceptable media types?

Given a REST api.
I want to learn what media types I can set in the Accept header.
How should I this?
I know I could do a random
GET http://some.api.com/
Accept:flying/elephants
and hope for a 406 with a body that has the correct acceptable media types.
Is there a better way?
In theory, API could indicate supported Content Types via HTTP OPTIONS
Usually, API offers either
Documentation
Specific resource of supported Accept-header values.
Also (as you might know), Accept-header values are usually bound to IANA defined MIME types
One issue with this is any URI within the API can respond with different media types. It's very common to have different endpoints in the API return different content types.
You could use multiple wildcard requests to probe for support.
You can start with Accept: */* and then application/* text/* */json */xml etc. You would receive a non-exhaustive list, but you'd get the big ones and the the preferred ones.
There's other weird edge cases. For example OData allows you to specify a $format parameter in the URL to define the response type. This overrides the accept header. Thus every format is it's own URI.
It'd be cool if APIs made more use of the alternate link relationship (http://www.w3.org/TR/html5/links.html#rel-alternate), i think that would be the most appropriate. That combined with the type attribute of the link would let you know all the formats for any resource you retrieve. Again it would be specific to each URI though.

More specific rules for an AWS Canonical Request header list?

The AWS documentation here seems to have somewhat confusing, incomplete or contradictory information. It states that
CanonicalHeaders is a list of request headers with their values.
Which suggests that we'd put all request headers in the canonical request. However, later, they state
The CanonicalHeaders list must include the following:
HTTP host header
If the Content-Type header is present in the request, it must be added to the CanonicalHeaders list.
Any x-amz-* headers that you plan to include in your request must also be added. For example, if you are using temporary security credentials, you will include x-amz-security-token in your request. You must add this header in the list of CanonicalHeaders.
OK, the bit about the Content-Type and x-amz headers suggests that we don't actually take all headers, because otherwise they wouldn't need to state that they'd be must be included. So then perhaps, we only need to take the Host header, the Content-Type header, and any x-amz-* headers. But then below, it gets more confusing, because here's an example request:
GET /test.txt HTTP/1.1
Host: examplebucket.s3.amazonaws.com
Date: Fri, 24 May 2013 00:00:00 GMT
Authorization: SignatureToBeCalculated
Range: bytes=0-9
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date: 20130524T000000Z
And here's the example canonical request created from it:
GET
/test.txt
host:examplebucket.s3.amazonaws.com
range:bytes=0-9
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20130524T000000Z
host;range;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
But this is inconsistent with both of the interpretations earlier: if we're supposed to only have Content-Type, Host and x-amz-* headers, then what is the range header doing in the list? And if we're just supposed to take all of the headers, then why isn't the Date header in the list?
Is the list of headers to put in a canonical request then arbitrary, as long as it contains at least the minimum headers? What, exactly, is the definitive set of rules to construct the canonical request headers?
if we're supposed to only have Content-Type, Host and x-amz-* headers,
then what is the range header doing in the list?
You are only required to have Content-Type, Host, and x-amz-*, but you can add other headers that you would like to add to the signature to be validated.
See the note in the docs that says: "For the purpose of calculating a signature, only the host and any x-amz-* headers are required; however, in order to prevent data tampering, you should consider including all the headers in the signature calculation."
And if we're just supposed to take all of the headers, then why isn't
the Date header in the list?
The Date header is special, because it is added by the browser according to client system time that may be incorrect. Because of that, you can use x-amz-date instead.
Is the list of headers to put in a canonical request then arbitrary,
as long as it contains at least the minimum headers?
Yes!
What, exactly, is the definitive set of rules to construct the
canonical request headers?
That would be those defined in the documentation of AWS signature version 4... but you got the idea: you must sign the minimal set of request data and can sign all headers that you'd like.
That said, avoid all this if you can. The SDK for [Javascript, Java, .NET, Python, Ruby, PHP,...] already sign requests for you, manages temporary credentials, credential chains, threading, retries and a lot more. If you can use that, it would probably save a lot of headache.

REST: Mapping application errors to HTTP Status codes

Is it to be considered good practice to reuse RFC HTTP Status codes like this, or should we be making up new ones that map exactly to our specific error reasons?
We're designing a web service API around a couple of legacy applications.
In addition to JSON/XML data structures in the Response Body, we aim to return HTTP Status Codes that make sense to web caches and developers.
But how do you go about mapping different classes of errors onto appropriate HTTP Status codes? Everyone on the team agrees on the following:
GET /package/1234 returns 404 Not Found if 1234 doesn't exist
GET /package/1234/next_checkpoint returns 400 Bad Request if "next_checkpoint" and 1234 are valid to ask for but next_checkpont here doesn't make sense...
and so on... but, in some cases, things needs to be more specific than just "400" - for example:
POST /dispatch/?for_package=1234 returns 412 Precondition Failed if /dispatch and package 1234 both exist, BUT 1234 isn't ready for dispatch just yet.
(Edit: Status codes in HTTP/1.1 and Status codes in WebDAV ext.)
RESTful use of HTTP means that you must keep the API uniform. This means that you cannot add domain specific methods (ala GET_STOCK_QUOTE) but it also means that you cannot add domain specific error codes (ala 499 Product Out Of Stock).
In fact, the HTTP client error codes are a good design check because if you design your resource semantics properly, the HTTP error code meanings will correctly express any errors. If you feel you need additional error codes, your resource design is likely wrong.
Jan
422 Unprocessable Entity is a useful error code for scenarios like this. See this question what http response code for rest service on put method when domain rules invalid for additional information.
GET /package/1234/next_checkpoint
returns 400 Bad Request if
"next_checkpoint" and 1234 are valid
to ask for but next_checkpont here
doesn't make sense...
This is the wrong way to think about that URI.
URIs are opaque, so observing that parts of it are 'valid' and others are not doesn't make any sense from a client perspective. Therefore you should 'just' return a 404 to the client, since the resource "package/1234/next_checkpoint" doesn't exist.
You should use 4xx series responses that best match your request when the client makes a mistake, though be careful to not use ones that are meant for specific headers or conditions. I tend to return a human-readable status message and either a plain-text version of the error as the response body or a structured error message, depending on application context.
Update: Upon further reading of the RFC, "procondition failed" is meant for the conditional headers, such as "if-none-match". I'd give a general 400 message for that instead.
Actually, you shouldn't do this at all. Your use of 404 Not Found is correct, but 400 Bad Request is being used improperly. A 400 Bad Request according to the RFC is used solely when the HTTP protocol is malformed. In your case, the request is syntactically correct, it is just an unexpected argument. You should return a 500 Server Error and then include an error code in your REST result.

Best way to decide on XML or HTML response?

I have a resource at a URL that both humans and machines should be able to read:
http://example.com/foo-collection/foo001
What is the best way to distinguish between human browsers and machines, and return either HTML or a domain-specific XML response?
(1) The Accept type field in the request?
(2) An additional bit of URL? eg:
http://example.com/foo-collection/foo001 -> returns HTML
http://example.com/foo-collection/foo001?xml -> returns, er, XML
I do not wish to oblige machines reading the resource to parse HTML (or XHTML for that matter). Machines like the googlebot should receive the HTML response.
It is reasonable to assume I control the machine readers.
If this is under your control, rather than adding a query parameter why not add a file extension:
http://example.com/foo-collection/foo001.html - return HTML
http://example.com/foo-collection/foo001.xml - return XML
Apart from anything else, that means if someone fetches it with wget or saves it from their browser, it'll have an appropriate filename without any fuss.
My preference is to make it a first-class part of the URI. This is debatable, since there are -- in a sense -- multiple URI's for the same resource. And is "format" really part of the URI?
http://example.com/foo-collection/html/foo001
http://example.com/foo-collection/xml/foo001
These are very easy deal with in a web framework that has URI parsing to direct the request to the proper application.
If this is indeed the same resource with two different representations, the HTTP invites you to use the Accept-header as you suggest. This is probably a very reliable way to distinguish between the two different scenarios. You can be plenty sure that user agents (including search engine spiders) send the Accept-header properly.
About the machine agents you are going to give XML; are they under your control? In that case you can be doubly sure that Accept will work. If they do not set this header properly, you can give XML as default. User agents DO set the header properly.
I would try to use the Accept heder for this, because this is exactly what the Accept header is there for.
The problem with having two different URLs is that is is not automatically apparent that these two represent the same underlying resource. This can be bad if a user finds an URL in one program, which renders HTML, and pastes it in the other, which needs XML. At this point a smart user could probably change the URL appropriately, but this is just a source of error that you don't need.
I would say adding a Query String parameter is your best bet. The only way to automatically detect whether your client is a browser(human) or application would be to read the User-Agent string from the HTTP Request. But this is easily set by any application to mimic a browser, you're not guaranteed that this is going to work.