Patterns for handling batch operations in REST web services?

What proven design patterns exist for batch operations on resources within a REST style web service?
I'm trying to strike a balance between ideals and reality in terms of performance and stability. We've got an API right now where all operations either retrieve from a list resource (i.e., GET /user) or act on a single instance (PUT /user/1, DELETE /user/22, etc).
There are some cases where you want to update a single field of a whole set of objects. It seems very wasteful to send the entire representation for each object back and forth to update the one field.
In an RPC style API, you could have a method:
/mail.do?method=markAsRead&messageIds=1,2,3,4... etc.
What's the REST equivalent here? Or is it OK to compromise now and then? Does it ruin the design to add in a few specific operations where it really improves the performance, etc? The client in all cases right now is a web browser (a JavaScript application on the client side).

A simple RESTful pattern for batches is to make use of a collection resource. For example, to delete several messages at once:
DELETE /mail?id=0&id=1&id=2
It's a little more complicated to batch update partial resources, or resource attributes. That is, update each markedAsRead attribute. Basically, instead of treating the attribute as part of each resource, you treat it as a bucket into which to put resources. One example was already posted. I adjusted it a little.
POST /mail?markAsRead=true
POSTDATA: ids=[0,1,2]
Basically, you are updating the list of mail marked as read.
You can also use this for assigning several items to the same category.
POST /mail?category=junk
POSTDATA: ids=[0,1,2]
It's obviously much more complicated to do iTunes-style batch partial updates (e.g., artist+albumTitle but not trackTitle). The bucket analogy starts to break down.
POST /mail?markAsRead=true&category=junk
POSTDATA: ids=[0,1,2]
In the long run, it's much easier to update a single partial resource, or resource attributes. Just make use of a subresource.
POST /mail/0/markAsRead
POSTDATA: true
Alternatively, you could use parameterized resources. This is less common in REST patterns, but is allowed in the URI and HTTP specs. A semicolon divides horizontally related parameters within a resource.
Update several attributes, several resources:
POST /mail/0;1;2/markAsRead;category
POSTDATA: markAsRead=true,category=junk
Update several resources, just one attribute:
POST /mail/0;1;2/markAsRead
POSTDATA: true
Update several attributes, just one resource:
POST /mail/0/markAsRead;category
POSTDATA: markAsRead=true,category=junk
The RESTful creativity abounds.
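On the server side, a parameterized path like the ones above has to be split back into ids and attribute names. A minimal sketch of that step follows; parseParameterizedPath is a hypothetical helper name, and the three-segment layout (/collection/ids/attributes) is assumed from the examples rather than taken from any framework.

```javascript
// Hypothetical helper: splits a path such as "/mail/0;1;2/markAsRead;category"
// into the collection name, the resource ids, and the attribute names.
function parseParameterizedPath(path) {
  const segments = path.split("/").filter((s) => s.length > 0);
  if (segments.length < 2 || segments.length > 3) {
    throw new Error("expected /collection/ids[/attributes]: " + path);
  }
  const [collection, idPart, attrPart] = segments;
  return {
    collection,
    ids: idPart.split(";"),
    // The attribute segment is optional (e.g. "/mail/0;1;2").
    attributes: attrPart ? attrPart.split(";") : [],
  };
}

const parsedPath = parseParameterizedPath("/mail/0;1;2/markAsRead;category");
console.log(parsedPath);
```

The ids are kept as strings here; whether they should be converted to numbers depends on how the collection is keyed.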

Not at all -- I think the REST equivalent is (or at least one solution is) almost exactly that -- a specialized interface designed to accommodate an operation required by the client.
I'm reminded of a pattern mentioned in Crane and Pascarello's book Ajax in Action (an excellent book, by the way -- highly recommended) in which they illustrate implementing a CommandQueue sort of object whose job it is to queue up requests into batches and then post them to the server periodically.
The object, if I remember correctly, essentially just held an array of "commands" -- e.g., to extend your example, each one a record containing a "markAsRead" command, a "messageId" and maybe a reference to a callback/handler function -- and then according to some schedule, or on some user action, the command object would be serialized and posted to the server, and the client would handle the consequent post-processing.
I don't happen to have the details handy, but it sounds like a command queue of this sort would be one way to handle your problem; it'd reduce the overall chattiness substantially, and it'd abstract the server-side interface in a way you might find more flexible down the road.
Update: Aha! I've found a snip from that very book online, complete with code samples (although I still suggest picking up the actual book!). Have a look here, beginning with section 5.5.3:
This is easy to code but can result in a lot of very small bits of traffic to the server, which is inefficient and potentially confusing. If we want to control our traffic, we can capture these updates and queue them locally and then send them to the server in batches at our leisure. A simple update queue implemented in JavaScript is shown in listing 5.13. [...]
The queue maintains two arrays. queued is a numerically indexed array, to which new updates are appended. sent is an associative array, containing those updates that have been sent to the server but that are awaiting a reply.
Here are two pertinent functions -- one responsible for adding commands to the queue (addCommand), and one responsible for serializing and then sending them to the server (fireRequest):
CommandQueue.prototype.addCommand = function(command)
{
    // Only queue objects that look like commands.
    if (this.isCommand(command))
    {
        // queued is the array of pending updates described above.
        this.queued.push(command);
    }
}
CommandQueue.prototype.fireRequest = function()
{
    // Nothing to send.
    if (this.queued.length == 0)
    {
        return;
    }
    var data = "data=";
    for (var i = 0; i < this.queued.length; i++)
    {
        var cmd = this.queued[i];
        if (this.isCommand(cmd))
        {
            data += cmd.toRequestString();
            // Remember the command until the server replies.
            this.sent[cmd.id] = cmd;
        }
    }
    // ... and then send the contents of data in a POST request
}
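For readers without the book at hand, the same idea can be sketched as a small self-contained class. This is not the book's listing: BatchQueue, flush, and acknowledge are names made up for illustration, commands are serialized as JSON rather than via toRequestString, and send is a hypothetical transport function supplied by the caller.

```javascript
// Sketch of a batching queue. `send` is a hypothetical transport function
// supplied by the caller (for example a wrapper around fetch or XMLHttpRequest).
class BatchQueue {
  constructor(send) {
    this.send = send;
    this.queued = []; // commands waiting to be sent
    this.sent = {};   // commands sent but not yet acknowledged, keyed by id
  }

  addCommand(command) {
    this.queued.push(command);
  }

  // Serializes every queued command into one payload and hands it to `send`.
  // Returns the payload, or null when there was nothing to send.
  flush() {
    if (this.queued.length === 0) {
      return null;
    }
    const batch = this.queued;
    this.queued = [];
    for (const cmd of batch) {
      this.sent[cmd.id] = cmd;
    }
    const payload = JSON.stringify(batch);
    this.send(payload);
    return payload;
  }

  // Called when the server confirms a command.
  acknowledge(id) {
    delete this.sent[id];
  }
}

// Usage: two "markAsRead" commands leave the client as a single payload.
const outbox = [];
const batchQueue = new BatchQueue((payload) => outbox.push(payload));
batchQueue.addCommand({ id: 1, type: "markAsRead", messageId: 1 });
batchQueue.addCommand({ id: 2, type: "markAsRead", messageId: 2 });
batchQueue.flush();
```

In a browser, flush would typically be called from a timer or on a user action, as described above.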
That ought to get you going. Good luck!

While I think @Alex is along the right path, conceptually I think it should be the reverse of what is suggested.
The URL is in effect "the resources we are targeting" hence:
[GET] mail/1
means get the record from mail with id 1 and
[PATCH] mail/1 data: mail[markAsRead]=true
means patch the mail record with id 1. The querystring is a "filter", filtering the data returned from the URL.
[GET] mail?markAsRead=true
So here we are requesting all the mail already marked as read. So to [PATCH] to this path would be saying "patch the records already marked as true"... which isn't what we are trying to achieve.
So a batch method, following this thinking should be:
[PATCH] mail/?id=1,2,3 <the records we are targeting> data: mail[markAsRead]=true
Of course, I'm not saying this is true REST (which doesn't permit batch record manipulation); rather, it follows the logic already existing and in use by REST.
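To make the "querystring selects, body patches" reading concrete, here is a minimal server-side sketch. The in-memory Map and the patchMail function are assumptions for illustration only; a real handler would sit behind a router and a database.

```javascript
// Hypothetical in-memory stand-in for the mail collection.
const mailStore = new Map([
  [1, { id: 1, markAsRead: false }],
  [2, { id: 2, markAsRead: false }],
  [3, { id: 3, markAsRead: false }],
  [4, { id: 4, markAsRead: false }],
]);

// Applies `changes` to every record selected by the `id` filter in
// `queryString` (e.g. "id=1,2,3") and returns the ids that were updated.
function patchMail(store, queryString, changes) {
  const idParam = new URLSearchParams(queryString).get("id") || "";
  const ids = idParam.split(",").filter((s) => s !== "").map(Number);
  const updated = [];
  for (const id of ids) {
    const record = store.get(id);
    if (record) {
      Object.assign(record, changes);
      updated.push(id);
    }
  }
  return updated;
}

// Equivalent of: [PATCH] mail/?id=1,2,3  data: mail[markAsRead]=true
const updatedIds = patchMail(mailStore, "id=1,2,3", { markAsRead: true });
```

Record 4 is untouched because the filter did not select it, which is the point of treating the querystring as the target set.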

Your language, "It seems very wasteful...", indicates to me an attempt at premature optimization. Unless it can be shown that sending the entire representation of objects is a major performance hit (we're talking unacceptable to users, i.e. > 150 ms), there's no point in creating a new non-standard API behaviour. Remember, the simpler the API, the easier it is to use.
For deletes send the following as the server doesn't need to know anything about the state of the object before the delete occurs.
DELETE /emails
POSTDATA: [{id:1},{id:2}]
The next thought is that if an application is running into performance issues with bulk updates of objects, you should consider breaking each object up into multiple objects. That way the JSON payload is a fraction of the size.
As an example, when sending a request to update the "read" and "archived" statuses of two separate emails, you would have to send the following:
PUT /emails
POSTDATA: [
{
id:1,
to:"someone@bratwurst.com",
from:"someguy@frommyville.com",
subject:"Try this recipe!",
text:"1LB Pork Sausage, 1 Onion, 1T Black Pepper, 1t Salt, 1t Mustard Powder",
read:true,
archived:true,
importance:2,
labels:["Someone","Mustard"]
},
{
id:2,
to:"someone@bratwurst.com",
from:"someguy@frommyville.com",
subject:"Try this recipe (With Fix)",
text:"1LB Pork Sausage, 1 Onion, 1T Black Pepper, 1t Salt, 1T Mustard Powder, 1t Garlic Powder",
read:true,
archived:false,
importance:1,
labels:["Someone","Mustard"]
}
]
I would split out the mutable components of the email (read, archived, importance, labels) into a separate object as the others (to, from, subject, text) would never be updated.
PUT /email-statuses
POSTDATA: [
{id:15,read:true,archived:true,importance:2,labels:["Someone","Mustard"]},
{id:27,read:true,archived:false,importance:1,labels:["Someone","Mustard"]}
]
Another approach is to use PATCH to explicitly indicate which properties you intend to update, with all others left untouched.
PATCH /emails
POSTDATA: [
{
id:1,
read:true,
archived:true
},
{
id:2,
read:true,
archived:false
}
]
People state that PATCH should be implemented by providing an array of changes containing: action (CRUD), path (URL), and value change. This may be considered a standard implementation but if you look at the entirety of a REST API it is a non-intuitive one-off. Also, the above implementation is how GitHub has implemented PATCH.
To sum it up, it is possible to adhere to RESTful principles with batch actions and still have acceptable performance.
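The array-of-partial-objects PATCH described above could be handled on the server roughly as follows. This is a sketch under assumptions: the emails store, the applyBatchPatch name, and the per-item status codes in the result are illustrative and not part of the answer.

```javascript
// Hypothetical in-memory store keyed by email id.
const emails = {
  1: { id: 1, subject: "Try this recipe!", read: false, archived: false },
  2: { id: 2, subject: "Try this recipe (With Fix)", read: false, archived: true },
};

// Merges each patch into the stored email; fields absent from the patch are
// left alone. Returns one result per patch so the client can see partial failures.
function applyBatchPatch(store, patches) {
  return patches.map(({ id, ...fields }) => {
    if (!(id in store)) {
      return { id, status: 404 };
    }
    Object.assign(store[id], fields);
    return { id, status: 200 };
  });
}

// Equivalent of the PATCH /emails body above, plus one unknown id.
const patchResults = applyBatchPatch(emails, [
  { id: 1, read: true, archived: true },
  { id: 2, read: true, archived: false },
  { id: 99, read: true },
]);
```

Reporting a status per item is one possible design; an all-or-nothing transaction is the other obvious choice and may be simpler for clients.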

The Google Drive API has a really interesting system to solve this problem (see here).
What they do is basically group different requests into one Content-Type: multipart/mixed request, with each individual complete request separated by a defined delimiter. Headers and query parameters of the batch request are inherited by the individual requests (e.g. Authorization: Bearer some_token) unless they are overridden in the individual request.
Example: (taken from their docs)
Request:
POST https://www.googleapis.com/batch
Accept-Encoding: gzip
User-Agent: Google-HTTP-Java-Client/1.20.0 (gzip)
Content-Type: multipart/mixed; boundary=END_OF_PART
Content-Length: 963
--END_OF_PART
Content-Length: 337
Content-Type: application/http
content-id: 1
content-transfer-encoding: binary
POST https://www.googleapis.com/drive/v3/files/fileId/permissions?fields=id
Authorization: Bearer authorization_token
Content-Length: 70
Content-Type: application/json; charset=UTF-8
{
"emailAddress":"example@appsrocks.com",
"role":"writer",
"type":"user"
}
--END_OF_PART
Content-Length: 353
Content-Type: application/http
content-id: 2
content-transfer-encoding: binary
POST https://www.googleapis.com/drive/v3/files/fileId/permissions?fields=id&sendNotificationEmail=false
Authorization: Bearer authorization_token
Content-Length: 58
Content-Type: application/json; charset=UTF-8
{
"domain":"appsrocks.com",
"role":"reader",
"type":"domain"
}
--END_OF_PART--
Response:
HTTP/1.1 200 OK
Alt-Svc: quic=":443"; p="1"; ma=604800
Server: GSE
Alternate-Protocol: 443:quic,p=1
X-Frame-Options: SAMEORIGIN
Content-Encoding: gzip
X-XSS-Protection: 1; mode=block
Content-Type: multipart/mixed; boundary=batch_6VIxXCQbJoQ_AATxy_GgFUk
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
Date: Fri, 13 Nov 2015 19:28:59 GMT
Cache-Control: private, max-age=0
Vary: X-Origin
Vary: Origin
Expires: Fri, 13 Nov 2015 19:28:59 GMT
--batch_6VIxXCQbJoQ_AATxy_GgFUk
Content-Type: application/http
Content-ID: response-1
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Date: Fri, 13 Nov 2015 19:28:59 GMT
Expires: Fri, 13 Nov 2015 19:28:59 GMT
Cache-Control: private, max-age=0
Content-Length: 35
{
"id": "12218244892818058021i"
}
--batch_6VIxXCQbJoQ_AATxy_GgFUk
Content-Type: application/http
Content-ID: response-2
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Date: Fri, 13 Nov 2015 19:28:59 GMT
Expires: Fri, 13 Nov 2015 19:28:59 GMT
Cache-Control: private, max-age=0
Content-Length: 35
{
"id": "04109509152946699072k"
}
--batch_6VIxXCQbJoQ_AATxy_GgFUk--
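A client can assemble such a body with plain string handling. The sketch below mirrors the part layout of the request above in simplified form (no per-part Content-Length or Authorization headers); buildBatchBody is a hypothetical helper, not part of Google's client libraries.

```javascript
// Builds a multipart/mixed body from sub-requests. Each part wraps one
// complete HTTP request, as in the example above. Sketch only.
function buildBatchBody(requests, boundary) {
  const parts = requests.map((req, index) => {
    const json = JSON.stringify(req.body);
    return [
      "--" + boundary,
      "Content-Type: application/http",
      "content-id: " + (index + 1),
      "", // blank line ends the part headers
      req.method + " " + req.url,
      "Content-Type: application/json; charset=UTF-8",
      "", // blank line ends the inner request headers
      json,
    ].join("\r\n");
  });
  // The closing delimiter carries two trailing dashes.
  return parts.join("\r\n") + "\r\n--" + boundary + "--\r\n";
}

const batchBody = buildBatchBody(
  [
    {
      method: "POST",
      url: "https://www.googleapis.com/drive/v3/files/fileId/permissions?fields=id",
      body: { role: "writer", type: "user" },
    },
    {
      method: "POST",
      url: "https://www.googleapis.com/drive/v3/files/fileId/permissions?fields=id",
      body: { role: "reader", type: "domain" },
    },
  ],
  "END_OF_PART"
);
```

The outer request would then be sent with Content-Type: multipart/mixed; boundary=END_OF_PART, and the boundary must not occur inside any part.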

From my point of view, Facebook has the best implementation.
A single HTTP request is made with a batch parameter and a token parameter.
The batch parameter carries a JSON document containing a collection of "requests".
Each request has a method property (get / post / put / delete / etc.) and a relative_url property (the URI of the endpoint). Additionally, the post and put methods allow a "body" property in which the fields to be updated are sent.
More info at: Facebook batch API
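The value of the batch parameter described above could be built along these lines. The property names method, relative_url, and body come from the description; the helper name buildBatchParam, the mail/... paths, and the choice to URL-encode the body fields are assumptions for illustration, so check the Facebook documentation for the exact body format.

```javascript
// Builds the JSON string for a "batch" parameter. Sketch only.
function buildBatchParam(requests) {
  return JSON.stringify(
    requests.map(({ method, relativeUrl, fields }) => {
      const entry = { method, relative_url: relativeUrl };
      // Only POST and PUT carry a body with the fields to update.
      if (fields && (method === "POST" || method === "PUT")) {
        entry.body = new URLSearchParams(fields).toString();
      }
      return entry;
    })
  );
}

const batchParam = buildBatchParam([
  { method: "GET", relativeUrl: "mail/1" },
  { method: "POST", relativeUrl: "mail/2", fields: { markAsRead: "true" } },
]);
```

The resulting string is sent as one form field alongside the token, so the whole batch costs a single HTTP round trip.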

I would be tempted in an operation like the one in your example to write a range parser.
It's not a lot of bother to make a parser that can read "messageIds=1-3,7-9,11,12-15". It would certainly increase efficiency for blanket operations covering all messages and is more scalable.
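A range parser of that kind is indeed short. The sketch below expands a spec such as "1-3,7-9,11,12-15" into individual ids; parseIdRanges is a made-up name, and rejecting descending ranges is a design choice of this sketch.

```javascript
// Expands a comma-separated list of ids and inclusive ranges into an array.
function parseIdRanges(spec) {
  const ids = [];
  for (const token of spec.split(",")) {
    const match = /^(\d+)(?:-(\d+))?$/.exec(token.trim());
    if (!match) {
      throw new Error("invalid range token: " + token);
    }
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (end < start) {
      throw new Error("descending range: " + token);
    }
    for (let id = start; id <= end; id++) {
      ids.push(id);
    }
  }
  return ids;
}

const messageIds = parseIdRanges("1-3,7-9,11,12-15");
console.log(messageIds); // [1, 2, 3, 7, 8, 9, 11, 12, 13, 14, 15]
```

A production version would probably cap the size of a single range so that a request like "1-999999999" cannot exhaust memory.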

Great post. I've been searching for a solution for a few days. I came up with a solution of passing a query string with a bunch of IDs separated by commas, like:
DELETE /my/uri/to/delete?id=1,2,3,4,5
...then passing that to a WHERE IN clause in my SQL. It works great, but I wonder what others think of this approach.
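One thing to watch with this approach is that the id list comes straight from the URL, so it should be validated and bound as parameters rather than concatenated into the SQL. A minimal sketch follows; the table name items, the ? placeholder style, and the buildDeleteQuery helper are all assumptions, since the post does not say which database or driver is used.

```javascript
// Turns "1,2,3,4,5" into a parameterized DELETE statement.
// `items` is a hypothetical table name; "?" placeholders are assumed.
function buildDeleteQuery(idParam) {
  const ids = idParam.split(",").map((s) => s.trim());
  // Reject anything that is not a plain non-negative integer.
  if (!ids.every((s) => /^\d+$/.test(s))) {
    throw new Error("id must be a comma-separated list of integers");
  }
  const placeholders = ids.map(() => "?").join(", ");
  return {
    text: "DELETE FROM items WHERE id IN (" + placeholders + ")",
    values: ids.map(Number),
  };
}

const deleteQuery = buildDeleteQuery("1,2,3,4,5");
console.log(deleteQuery.text);
```

The text and values pair can then be handed to whatever driver is in use, which keeps the ids out of the SQL string itself.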

Related

How to get message part of email request after using slurp on ring request map?

I am trying to parse an email getting from ring request map.
I get a body object in request map and then I use slurp to read the object. So When I do (slurp (:body req)) I am getting this output:
Date: Fri, 25 Nov 2016 09:06:30 +0630
From: divya.nagar@juspay.in
To: localhost:8080
Cc: cccc
Subject: okay this is subject
Content-Type: multipart/alternative; boundary=T0IDt0S-7950392
--T0IDt0S-7950392
Content-Type: text/plain; charset=UTF-8
kjhdjkdshjk sivya nagar
--T0IDt0S-7950392
Content-Type: text/html; charset=UTF-8
kjhdjkdshjk sivya nagar
--T0IDt0S-7950392--
Now, how do I get specific parts of this message, such as only the content or the subject? Is there any way to parse it other than slurp, or do I have to traverse the plain string using the boundary parameter?
Presumably the request contains more keys than just :body. As you say you are looking for 'content', 'subject' etc. So use:
(keys req)
to find all the keywords that are available in the message. Next time you can use one or more of them rather than :body.
Assuming you found that the keys were :sender and :receiver, and you were interested in them, you could parse the message by destructuring their values:
(let [{:keys [sender receiver]} req]
(println "sender is" sender))
If the keys apart from :body are not what you want then you will have to look more closely at the value of :body. It is a Jetty input stream with a readLine Java method that you can repeatedly call (via Clojure interop) to read in the contents. The Jetty Web Server is serving up typed Java data objects rather than Clojure data structures, thus you need to read the data using the API.
Usually, ring middleware is used to extract the body into params that are added to the request map. I'm not aware of any such middleware for RFC822 messages, so you might have to roll your own.
You could do the parsing by hand, but I would suggest using a library to do it for you. I imagine most Java email libraries would be capable of taking an RFC822 stream and turning it into some sort of Message object. There's also postal, a Clojure library that wraps the JavaMail API, although I'm not sure it exposes a function for parsing messages.

Specify allowed content types when answering with HTTP 415

I'm looking into adding (more) precise responses to REST API client (4xx) errors. The direction seems quite clear, as seen here:
406 [sic] when you can't send what they want, 415 when they send what you don't want.
The difference seems to be that you can include allowed methods via the Allow header:
< PUT /api/articles/
> HTTP 405 Method Not Allowed
> Allow: POST
But there isn't any equivalent response header for:
< POST /api/images/
< Content-Type: text/html
> HTTP 415 Unsupported Media Type
The way I see it, I have the following options:
Sending Accept which is exactly for this, but a request-only header
Sending Warning which doesn't seem right at all.
Did I miss something obvious?
This proposal:
https://datatracker.ietf.org/doc/html/draft-wilde-accept-post-02
might be of interest.
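The linked draft proposes an Accept-Post response header that lists the media types a resource accepts in POST requests. A server following that idea might build its 415 response roughly like this; checkContentType is a hypothetical helper and the accepted types are made up for the /api/images/ example.

```javascript
// Returns null when the request's media type is accepted, otherwise a
// 415 response description advertising the accepted types via Accept-Post.
function checkContentType(contentType, acceptedTypes) {
  // Strip parameters such as "; charset=utf-8" and normalize case.
  const mediaType = (contentType || "").split(";")[0].trim().toLowerCase();
  if (acceptedTypes.includes(mediaType)) {
    return null;
  }
  return {
    status: 415,
    headers: { "Accept-Post": acceptedTypes.join(", ") },
  };
}

const rejection = checkContentType("text/html", ["image/png", "image/jpeg"]);
console.log(rejection);
```

Clients that do not know the header simply ignore it, so adding it does not break existing consumers.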

Analyzing post request

I'm trying to make a POST request using C++ and Qt.
The target site is http://www.artlebedev.ru/tools/decoder/advanced/
I inspected the request in the browser.
One thing looks strange to me: a random number in the headers.
So I'm not sure whether I'm sending the data for the POST request correctly.
Why did they add it?
I build my request like this (as the browser does):
postdata.append("accept:*/*&");
postdata.append("accept-charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3&");
postdata.append("Accept-Encoding:gzip,deflate,sdch&");
postdata.append("Accept-Language:en-US,en;q=0.8&");
postdata.append("Connection:keep-alive&");
postdata.append("Content-Length:36&");
postdata.append("Content-Type:application/x-www-form-urlencoded&");
postdata.append("Cookie:__utma=1.904416008.1352897318.1352905816.1352909441.3; __utmc=1; __utmz=1.1352897318.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __atuvc=7%7C46&");
postdata.append("Host:www.artlebedev.ru&");
postdata.append("Origin:http://www.artlebedev.ru&");
postdata.append("Referer:http://www.artlebedev.ru/tools/decoder/advanced/&");
postdata.append("User-Agent:Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11&");
postdata.append("X-Requested-With:XMLHttpRequest&");
postdata.append("random:0.9632773566991091&"); // I have no idea about this number
postdata.append("Form Dataview URL encoded&");
postdata.append("csin:0&");
postdata.append("csout:0&");
postdata.append("text:fvddas&");
postdata.append("Decode:go");
I get a web page in response, but it doesn't contain the decoded string, only empty strings.
This is my first attempt at making a POST request; please help me find a way out.
The random value looks like some kind of Cross-site request forgery token to prevent people from doing what you are trying to do, but it is actually not being used. If I re-issue the request using Fiddler without any cookies or the random value, the request still succeeds.
In fact, this request also does:
POST http://www.artlebedev.ru/tools/decoder/advanced/ HTTP/1.1
Host: www.artlebedev.ru
Content-Type: application/x-www-form-urlencoded
Content-Length: 33
csin=0&csout=0&text=foo&Decode=go
So there must be something wrong with your request, and I guess you shouldn't have just copypasted the request from an external viewer into code, but looked at what you are doing:
postdata.append("Form Dataview URL encoded&");
This is not an HTTP header. It's even nice of the server to not respond with a 400 Bad Request. What should be in the place of that line is a single CRLF, to separate the headers from the entity ('request body').
It could prove useful if you output the contents of postdata just before you send it, to look if you can see something wrong.
Perhaps if you like the encoding translation that site can do (or whatever it is it does), you can ask the creators of the site if they have a publicly available API that you can address, or perhaps they'll even share some code or point you towards valuable resources to recreate such a conversion for yourself.
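The key point above is that headers and form fields are separate things: only the form fields are joined with & and sent as the body. The idea is sketched here in JavaScript rather than Qt; buildFormRequest is a hypothetical helper, and in Qt the HTTP classes would set the headers on the request object instead.

```javascript
// Builds the body and the two headers that matter for a form POST.
function buildFormRequest(fields) {
  // URLSearchParams percent-encodes names and values and joins them with "&".
  const body = new URLSearchParams(fields).toString();
  return {
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      // Content-Length counts bytes of the encoded body, not characters.
      "Content-Length": String(new TextEncoder().encode(body).length),
    },
    body,
  };
}

const formRequest = buildFormRequest({ csin: "0", csout: "0", text: "foo", Decode: "go" });
console.log(formRequest.body); // csin=0&csout=0&text=foo&Decode=go
```

This reproduces the body and the Content-Length of 33 from the working request shown above.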

how to use html templates in CouchDB

I've been searching everywhere trying to figure this one out. I'm trying to generate html pages from couchdb show and list functions. I'd like to leverage underscore.js's template solution. The part I'm getting stuck on is how to include html templates in my show and list functions.
Where do I store them? As attachments? And then how do I reference them in my show and list functions. I assume !json and !code macros are not being used, and I can't figure out how to use require() from common js to do it.
Any help would rock!
Thanks!
Extra Info: I'm using Kanso to push my apps, not CouchApp.
CouchDB attachments are, by definition, not accessible in show and list functions.
Show and list functions support CommonJS. So you simply need to include any libraries in the design doc.
{ "_id": "_design/example"
, "say_hi": "module.exports = function(person) { return 'Hello, ' + person }"
, "shows":
{ "hello": "function(doc, req) { var hi = require('say_hi'); return hi(req.query.me) }"
}
}
A request to this show function would look like this:
GET /my_db/_design/example/_show/hello?me=Jason
HTTP/1.1 200 OK
Server: CouchDB/1.2.0 (Erlang OTP/R15B)
Date: Fri, 06 Apr 2012 11:02:33 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 12
Hello, Jason
I'm unfamiliar with Kanso, but before CouchDB 1.1, view/show etc. functions in CouchDB could not include anything. (The CouchApp tool had its own !include workarounds to solve this.) These are not necessary anymore. CouchDB 1.1 added CommonJS support.
All the templates and libraries must be part of the design document. You can access the raw values (as a string) by referencing this.some_key; or load them via CommonJS by executing require("some_key").
For example:
exports.example_view = {
map: function (doc) {
// this must be placed *inside* the map function
var example = require('views/lib/example');
if (doc.num) {
emit(doc._id, example.fn());
}
}
};
(Sharing code between views)
To render templates server-side, you'll need to encode them as strings and require them like you require other JavaScript libraries. (For browser-side rendering, fetching attachments via AJAX works.)

Is there a way to force the browser to not cache?

That's pretty much it. The problem I am having depends on if a browser is caching or not. I need to force the browser to not cache.
Use alreadyExpired (see the Yesod Haddock docs).
Also, if you have control over the request (like an AJAX call) you can just add a random get param like ?sdasd=klfjlwkfj to be absolutely sure but I think that may be considered poor form.
You can set the following headers:
Cache-Control: no-cache, must-revalidate
Expires: Sat, 26 Jul 1997 05:00:00 GMT
The second can be any date in the past.
Edit: To do this in Yesod, have a look here: http://hackage.haskell.org/packages/archive/yesod-core/0.9.3.3/doc/html/Yesod-Handler.html#g:8