How best to design a RESTful API for initiating an action - web-services

I'm building a RESTful web service that has the usual flavor of CRUD operations for a set of data types. The HTTP verb mappings for these APIs are obvious.
The interesting part comes in where the client can request that a long-running (i.e., hours) operation against one of the data objects be initialized; the status of the operation is reported by querying the data type itself.
For example, assume an object with the following characteristics:
SomeDataType
{
Name: "Some name",
CurrentOperation: "LongOperationA",
CurrentOperationPercent: 0.75,
CurrentOperationEtaSeconds: 3600
}
My question, then, is what the best RESTful approach should be for starting LongOperationA?
The most obvious approach would seem to be making the operation itself the identifier, perhaps something along the lines of POST https://my-web-service.com/api/StartLongOperationA?DataID=xxxx, but that seems a bit clunky, even if I don't specify the data identifier as a query parameter.
It's also pretty trivial to implement this as an idempotent action, so using POST seems like a waste; on the other hand, PUT is awkward, since no data is actually being written to the service.
Has anybody else faced this type of scenario in their services? What have you done to expose an API for initializing actions that honors RESTful principles?
TIA,
-Mark

You could do,
POST /LongRunningOperations?DataId=xxxx
to create a new LongRunningOperation. The URI of the long running operation would be returned in the Location header along with a 201 status code.
Or if you want to keep the long running operations associated to the DataId you could do
POST /Data/xxx/LongRunningOperations
Both these options will give you the opportunity to inquire if there are long running operations still executing. If you need information after the operation has completed you can create things like
GET /CompletedLongRunningOperations
GET /Data/xxx/CompletedLongRunningOperations
GET /Data/xxx/LastCompletedLongRunningOperation
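A minimal in-memory sketch of the second option, with no real HTTP framework involved. The `OperationStore` class, its method names, and the `percent` field are hypothetical; each method just returns the status code, headers, and body a handler might send.

```python
import itertools


class OperationStore:
    """Hypothetical in-memory store for long-running operations."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._operations = {}

    def post_operation(self, data_id, name):
        """Handle POST /Data/{data_id}/LongRunningOperations."""
        op_id = next(self._ids)
        operation = {"dataId": data_id, "name": name, "percent": 0.0}
        self._operations[op_id] = operation
        location = f"/Data/{data_id}/LongRunningOperations/{op_id}"
        # 201 Created plus a Location header pointing at the new operation.
        return 201, {"Location": location}, operation

    def get_operations(self, data_id, completed=False):
        """List the running (or, if requested, completed) operations of one data object."""
        return [
            op for op in self._operations.values()
            if op["dataId"] == data_id and (op["percent"] >= 1.0) == completed
        ]
```

A client would call `post_operation("xxx", "LongOperationA")`, read the `Location` header, and poll that URI (or the list resources) for progress.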

Related

Abort or terminate the Request which takes long time to respond back

I have an application developed using SmartGWT, JAX-RS, EJB, and JPA.
In one scenario the user extracts data (on the Search screen) by entering a first name, last name, middle name, SSN, email, etc.
The database contains millions of records, so a search can take a long time to respond.
For example, when a search by first name takes a long time to respond, the user wants to cancel/terminate/abort the request.
Is it possible, in either SmartGWT or JAX-RS (the web API), to terminate the request,
so that the user can abandon it and move on?
PS: I tried a lot of options, but I didn't find a proper solution.
One solution is to put the business logic in a stateful bean and keep the bean in the HTTP session. You then have access to the persistence context and the open transaction currently in use, so you can call Session.cancelQuery(). This method has a limitation: it works only if the result set has not been returned yet. If this limitation harms you, check this answer please.
There are other workarounds for synchronizing the web client with the business method, but this is the one I like most.
One more thing to consider for your use case is introducing a lexical search engine such as Solr or Elasticsearch, which can be updated frequently with data from the database. It fits lexical search perfectly, tolerates typos, and returns results very quickly.

REST API for data processing and method chaining

I apologize in advance if the quality of the question is bad. I am still beginning to learn the concepts of REST API. I am trying to implement a scalable REST API for data processing. Here is what I could think of so far.
Consider some numerical data that can be retrieved using a GET call:
GET http://my.api/data/123/
Users can apply a sequence of arithmetic operations such as add and multiply. A non-RESTful way to do that is:
GET http://my.api/data/123?add=10&multiply=5
Assumptions:
The original data in the DB is not changed. Only an altered version of it is returned to the user.
The data is large in size (say a large multi-dimensional array), so we can't afford to return the whole data with every operation call. Instead, we want to apply operations as a batch and return the final modified data in the end.
There are 2 RESTful ways I am currently considering:
1. Model arithmetic operations as subresources of data.
If we consider add and multiply as subresources of data (as here), we can use:
GET http://my.api/data/123/add/10/
which would be safe and idempotent, given that the original data is never changed. However, we need to chain multiple operations. Can we do that?
GET http://my.api/data/123/add/10/multiply/5/
Where multiply is creating a subresource of add/10/ which itself is a subresource of data/123
Pros:
Statelessness: The server doesn't keep any information about the modified data.
Easy access to modified data: It is just a simple GET call.
Cons:
Chaining: I don't know if it can be easily implemented.
Long URIs: with each operation applied, the URI gets longer and longer.
2. Create an editable data object:
In this case, a user creates an editable version of the original data:
POST http://my.api/data/123/
will return
201 Created
Location: http://my.api/data/123/edit/{uniqueid}
Users can then PATCH this editable data
PATCH http://my.api/data/123/edit/{uniqueid}
{add:10, multiply:5}
And finally, GET the edited data
GET http://my.api/data/123/edit/{uniqueid}
Pros:
Clean URIs.
Cons:
The server has to save the state of edited data.
Editing is no longer idempotent.
Getting edited data requires users to make at least 3 calls.
Is there a cleaner, more semantic way to implement data processing RESTfully?
Edit:
If you are wondering what is the real world problem behind this, I am dealing with digital signal processing.
As a simple example, you can think of applying visual filters to images. Following this example, a RESTful web service can do:
GET http://my.api/image/123/blur/5px/rotate/90deg/?size=small&format=png
A couple of things worth reviewing in your question.
REST-based APIs are resource based
So looking at your first example, trying to chain transformation properties into the URL path following a resource identifier..
GET http://my.api/data/123/add/10/multiply/5/
..does not fit well (as well as being complicated to implement dynamically, as you already guessed)
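To make the implementation concern concrete, here is a rough sketch of what handling such a chained path could look like. The function name `apply_chain`, the operation table, and the use of a single number in place of the large array are all assumptions for illustration; a real router would also have to deal with URL length limits and operations that take more than one argument.

```python
def apply_chain(value, path):
    """Apply the operations encoded in a path such as 'add/10/multiply/5/'."""
    # Assumed operation table; only the two operations from the question.
    operations = {
        "add": lambda x, n: x + n,
        "multiply": lambda x, n: x * n,
    }
    segments = [s for s in path.split("/") if s]
    if len(segments) % 2 != 0:
        raise ValueError("expected operation/argument pairs")
    # Walk the path as (operation, argument) pairs, left to right.
    for name, raw in zip(segments[0::2], segments[1::2]):
        if name not in operations:
            raise ValueError(f"unknown operation: {name}")
        value = operations[name](value, float(raw))
    return value
```

With a stored value of 2, `apply_chain(2, "add/10/multiply/5/")` computes (2 + 10) * 5.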
Statelessness
The idea of statelessness in REST is built around a single HTTP call containing enough information to process the request and provide a result without going back to the client for more information. Storing the result of an HTTP call on the server is not state, it’s cache.
Now, given that a REST based API is probably not the best fit for your usage, if you do still want to use it here are your options:
1. Use the Querystring with a common URL operation
You could use the Querystring but simplify the resource path to accept all transformations upon a single URI. Given your examples and reluctance to store transformed results this is probably your best option.
GET http://my.api/data/123/transform?add=10&multiply=5
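One detail worth checking with this option is that the order of the query parameters is preserved, since add-then-multiply differs from multiply-then-add. A small sketch using Python's standard `urllib.parse`; the `ALLOWED` set and the function name `parse_transform` are made up for the example.

```python
from urllib.parse import parse_qsl, urlsplit

ALLOWED = {"add", "multiply"}  # assumed set of supported operations


def parse_transform(url):
    """Return the ordered (operation, number) steps encoded in the query string."""
    steps = []
    # parse_qsl returns a list of pairs, so order and repeated keys survive.
    for name, raw in parse_qsl(urlsplit(url).query):
        if name not in ALLOWED:
            raise ValueError(f"unknown operation: {name}")
        steps.append((name, float(raw)))
    return steps
```

Because the result is a list rather than a dict, a request such as `?multiply=5&add=10&multiply=2` keeps all three steps in the order given.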
2. Use POST non-RESTfully
You could use POST requests, and leverage the HTTP body to send in the transformation parameters. This will ensure that you don’t ever run out of space on the query string if you ever decide to do a lot of processing and it will also keep your communication tidier. This isn’t considered RESTful if the POST returns the image data.
3. Use POST RESTfully
Finally, if you decide that you do want to cache things, your POST can in fact store the transformed object (note that REST doesn’t dictate how this is stored, in memory or DB etc.) which can be re-fetched by Id using a GET.
Option A
POSTing to the URI creates a subordinate resource.
POST http://my.api/data/123
{add:10, multiply:5}
returns
201 Created
Location: http://my.api/data/123/edit/{uniqueid}
then GET the edited data
GET http://my.api/data/123/edit/{uniqueid}
Option B
Remove the resource identifier from the URL to make it clear that you're creating a new item, not changing the existing one. The resulting URL is also at the same level as the original one since it's assumed it's the same type of result.
POST http://my.api/data
{original: 123, add:10, multiply:5}
returns
201 Created
Location: http://my.api/data/{uniqueid}
then GET the edited data
GET http://my.api/data/{uniqueid}
There are multiple ways this can be done. In the end it should be clean, regardless of what label you want to give it (REST or non-REST). REST is not a protocol with an RFC, so don't worry too much about whether you pass information as URL paths or URL params. The underlying web service should be able to get you the data regardless of how it is passed. For example, Java Jersey will give you your params no matter whether they are query params or URL path segments; it's just an annotation difference.
Going back to your specific problem I think that the resource in this REST type call is not so much the data that is being used to do the numerical operations on but the actual response. In that case, a POST where the data ID and the operations are fields might suffice.
POST http://my.api/operations/
{
"dataId": "123",
"operations": [
{
"type": "add",
"value": 10
},
{
"type": "multiply",
"value": 5
}
]
}
The response would have to point to the location where the result can be retrieved, as you have pointed out. The result, referenced by the location (and ID) in the response, is essentially an immutable object. So that is in fact the resource being created by the POST, not the data used to calculate that result. It's just a different way of viewing it.
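A rough sketch of how a handler for this POST might behave, using plain dictionaries instead of a database. The `DATA` contents, the `RESULTS` store, the `/operations/{id}` result location, and the 400 response for unknown operations are assumptions for illustration only.

```python
import json
import uuid

DATA = {"123": [1.0, 2.0, 3.0]}  # made-up sample data
RESULTS = {}                     # hypothetical store of computed results


def post_operations(body):
    """Handle POST /operations/ with the JSON body shown above."""
    request = json.loads(body)
    values = list(DATA[request["dataId"]])  # work on a copy, never the original
    for op in request["operations"]:
        if op["type"] == "add":
            values = [v + op["value"] for v in values]
        elif op["type"] == "multiply":
            values = [v * op["value"] for v in values]
        else:
            return 400, {}, {"error": f"unknown operation: {op['type']}"}
    result_id = uuid.uuid4().hex
    RESULTS[result_id] = tuple(values)  # stored once and never modified
    return 201, {"Location": f"/operations/{result_id}"}, None
```

The client then issues a GET on the returned `Location` to fetch the result.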
EDIT: In response to your comment about not wanting to store the outcome of the operations: you can use a callback to transmit the results of the operation to the caller. You can easily add a field in the JSON input for the host or URL of the callback. If the callback URL is present, then you can POST the results of the operation to that URL.
{
"dataId": "123",
"operations": [
{
"type": "add",
"value": 10
},
{
"type": "multiply",
"value": 5
}
],
"callBack": "<HOST or URL>"
}
Please don't view this as me answering my own question, but rather as a contribution to the discussion.
I have given a lot of thought into this. The main problem with the currently suggested architectures is scalability, since the server creates copies of data each time it is operated on.
The only way to avoid this is to model operations and data separately. So, similar to Jose's answer, we create a resource:
POST http://my.api/operations/
{add:10, multiply:5}
Note here, I didn't specify the data at all. The created resource represents a series of operations only. The POST returns:
201 Created
Location: http://my.api/operations/{uniqueid}
The next step is to apply the operations on the data:
GET http://my.api/data/123/operations/{uniqueid}
This separate modeling approach has several advantages:
Data is not replicated each time a user applies a different set of operations.
Users create only operations resources, and since their size is tiny, we don't have to worry about scalability.
Users create a new resource only when they need a new set of operations. Going to the image example: if I am designing a greyscale website, and I want all images to be converted to greyscale, I can do
POST http://my.api/operations/
{greyscale: "50%"}
And then apply this operation on all my images by:
GET http://my.api/image/{image_id}/operations/{greyscale_id}
As long as I don't want to change the operation set, I can use GET only.
Common operations can be created and stored on the server, so users don't have to create them. For example:
GET http://my.api/image/{image_id}/operations/flip
Where operations/flip is already an available operation set.
It is easy to apply the same set of operations to different data, and vice versa.
GET http://my.api/data/{id1},{id2}/operations/{some_operation}
Enables you to compare two datasets that are processed similarly. Alternatively:
GET http://my.api/data/{id1}/operations/{some_operation},{another_operation}
Allows you to see how different processing procedures affect the result.
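To illustrate the idea, a small sketch in which operation sets are stored as their own resources and applied on the fly at GET time, so no processed copy of the data is kept. The stores, the sample data, and the function names are hypothetical.

```python
import itertools

OPERATION_SETS = {}                    # hypothetical store of operation sets
DATA = {"1": [1, 2], "2": [3, 4]}      # made-up sample data
_set_ids = itertools.count(1)


def post_operation_set(steps):
    """POST /operations/ : store an ordered list of (name, argument) steps."""
    set_id = str(next(_set_ids))
    OPERATION_SETS[set_id] = list(steps)
    return 201, {"Location": f"/operations/{set_id}"}


def get_processed(data_id, set_id):
    """GET /data/{data_id}/operations/{set_id} : compute the result per request."""
    values = list(DATA[data_id])  # copy; the stored data is left untouched
    for name, arg in OPERATION_SETS[set_id]:
        if name == "add":
            values = [v + arg for v in values]
        elif name == "multiply":
            values = [v * arg for v in values]
        else:
            raise ValueError(f"unknown operation: {name}")
    return values
```

The same operation set can then be applied to any number of data IDs, and only the small list of steps is ever stored.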
I wouldn't try to describe your math function using the URI or request body. We have a more or less standard language to describe math, so you could use some kind of template.
GET http://my.api/data/123?transform="5*(data+10)"
POST http://my.api/data/123 {"transform": "5*({data}+10)"}
You need code on the client side that can build these kinds of templates, and code on the server side that can verify, parse, etc. the templates built by the client.
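As an illustration of the server-side part, here is a sketch that verifies and evaluates such a template by walking Python's `ast` tree instead of calling `eval`. The `evaluate` function, the choice of the bare name `data` as the placeholder, and the restriction to the four basic operators are assumptions for this example.

```python
import ast
import operator

# Only these binary operators are accepted (an assumption for this sketch).
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(template, data):
    """Evaluate an arithmetic template such as '5*(data+10)' for one value."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.Name) and node.id == "data":
            return data
        # Calls, attribute access, other names, etc. are all rejected.
        raise ValueError("unsupported expression")

    return walk(ast.parse(template, mode="eval"))
```

Anything outside plain arithmetic on `data` raises `ValueError`, which a handler could turn into a 400 response.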

What do the RESTful web service method types mean?

I am not that new to web services, but I am not able to understand the use of HTTP method types in RESTful web services.
I was referring to the vogella tutorial here
http://www.vogella.com/tutorials/REST/article.html#rest_httpmethods
They have the following description, which I do not fully understand:
GET defines a reading access of the resource without side-effects. The resource is never changed via a GET request, e.g., the request has no side effects (idempotent).
(This is fine. No query here)
PUT creates a new resource. It must also be idempotent.
(OK, seems logical, but why should I care that it must be idempotent? I am not going to call this service again with the same data.)
DELETE removes the resources. The operations are idempotent. They can get repeated without leading to different results.
(OK, same question again: why would I want to repeat a delete query once the resource is already deleted?)
POST updates an existing resource or creates a new resource.
(This is fine)
I can also delete and put data with the GET and POST methods. I know I am missing something, but why should I use the extra method types DELETE and PUT that are provided in web services, and what exactly are they used for?
The most widely used and "known" HTTP methods are GET and POST. But there are other methods, each of which has different semantics. We need to choose the method which has the most proper meaning for the operation the request is meant to perform, and should not "dumb down" the semantics for those only familiar with GET and POST.
DELETE. The semantics are as follows. Once a DELETE request has been processed for a given resource, that resource can no longer be accessed by clients, no ifs, ands, or buts. Any future request to try and retrieve this resource state's representation with GET or HEAD should result in a 404.
That being said, if the semantics of the operation fit the above description, we must use DELETE. Anything less, such as a "soft" delete or some other state-changing interaction, should not be done with a DELETE, better with POST. So it comes down to the semantics, why to use DELETE.
As far as the question "why do I want to repeat delete query once it is already deleted?", well, we don't, but if we were to try to use the exact same DELETE request again, it would have the same effect. That is the meaning of idempotency. It really has no bearing on the why. It is just guaranteed protocol semantics.
PUT. It's basically used to update a server resource, or create a user resource. In both cases, the URI is known by the user and is the requested URI. For example
PUT /customers/1234
// some body with name to change
The resource URI is known and the client sends a message with a representation to update the resource. If the requested operation meets these requirements, then PUT should be used. In contrast, if we are creating a new server resource (customer), then we would use POST
POST /customers
// customer representation
Notice the URI of the new customer is not known, because it has not been created. If the POST is successful, we should get back a Location header with the new URI
HTTP/1.1 201
Location: /customers/12345
That was going a little off on a tangent, but getting back to PUT: PUT is idempotent because no matter how many times we make the exact PUT request above, the result will be the same. Server state does not change any further after the first request. On the other hand, if we repeatedly make the same POST request, more new customers may be created.
All that being said, we should do our best to follow the protocol semantics. POST is like a wild card for operations that can't be applied to any other method semantics.
And to answer your repeated question, "why should I care?": as noted about DELETE, idempotency is just a matter of guaranteed protocol semantics. It is not a matter of "but I never plan to do this operation again"; it's a matter of "if someone does perform this exact operation again, there is no additional effect".
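A tiny in-memory sketch may make the difference tangible. The `customers` dictionary and the three functions are hypothetical stand-ins for real handlers; the point is only how server state reacts to repeated calls.

```python
import itertools

customers = {}                  # stand-in for server state
_next_id = itertools.count(1)   # server-side ID generator used by POST


def put_customer(customer_id, representation):
    """PUT /customers/{id}: store the representation at a URI the client names."""
    customers[customer_id] = dict(representation)


def post_customer(representation):
    """POST /customers: the server picks the URI of the new customer."""
    customer_id = next(_next_id)
    customers[customer_id] = dict(representation)
    return f"/customers/{customer_id}"


def delete_customer(customer_id):
    """DELETE /customers/{id}: removing an already removed customer changes nothing."""
    customers.pop(customer_id, None)
```

Calling `put_customer(1234, ...)` twice leaves one customer, calling `delete_customer(1234)` twice leaves none, but calling `post_customer(...)` twice creates two customers with two different URIs. That is why a client (or a proxy) can safely retry PUT and DELETE after a timeout, but not POST.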
Four main methods are used in a RESTful API. They are:
GET - Used to get a resource
POST - Used to create a resource
DELETE - Used to remove a resource
PUT - Used to update a resource
As you mentioned, we can use POST or GET for the purpose of PUT and DELETE. But in programming we should use each method where it indicates its purpose, so that others will get an idea of what the code is doing.

Create single and multiple resources using restful HTTP

In my API server I have this route defined:
POST /categories
To create one category you do:
POST /categories {"name": "Books"}
I thought that if you want to create multiple categories, then you could do:
POST /categories [{"name": "Books"}, {"name": "Games"}]
I just wanna confirm that this is a good practice for Restful HTTP API.
Or should one have a
POST /bulk
for allowing them to do whatever operations at once (Creating, Reading, Updating and Deleting)?
In true REST, you should probably POST this in multiple separate calls. The reason is that each one will result in a new representation. How would you expect to get that back otherwise?
Each post should return the resultant resource location:
POST -> New Resource Location
POST -> New Resource Location
...
However, if you need a bulk, then create a bulk. Be dogmatic where possible, but if not, pragmatism gets the job done. If you get too hung up on dogmatism, then you never get anything done.
Here is a similar question
Here is one that suggests HTTP Pipelining to make this more efficient
There's nothing particularly wrong with having a bulk operation that you POST to, to activate (it'll be non-idempotent so POST is the right verb) but there are some caveats:
You're making multiple resources, so you need to respond with multiple URLs. This means you can't use the redirect pattern: you'll have to send a list of URLs back in some form.
You have a problem in that bulk operations are often not very discoverable. Discoverability is one of the most important things about RESTfulness, as it means that someone can come along and figure out how to write a client without lots of help from the server author.
Dealing with partial failures when you've got bulk operations remains problematic. It's a problem with any other paradigm too (I've watched people tie themselves in knots over this when working with extensions to SOAP) so it isn't a surprise, but unless you can guarantee that all the creations will work, you're going to have to work out what happens when you make one resource and fail to make the second. (Also, if the bulk request wanted a third one done, would you go on and try that?)
The simplest approach is just to support one create per request; that's a much easier pattern to get right and is better understood all round.
There's nothing wrong with creating multiple resources at once with POST (just don't try it with PUT). It's not "un-REST-ful", especially if you create a representation for the bulk operation itself. I suggest you create an index resource at the same time you create the individual resources, and return a "303 See Other" to it. That index representation would then contain links to all of the created resources (and possibly error information if any of them failed).
POST /categories/uploads/
[{"name": "Books"}, {"name": "Games"}]
303 See Other
Location: /categories/uploads/321/
(actually, now that I think about it, 201 might be better than 303)
GET /categories/uploads/321/
200 OK
Content-Type: application/json
[{"name": "Books", "link": "/categories/Books/"},
{"name": "Games", "error": "The 'Games' category already exists."}]
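A minimal sketch of this index-resource idea, without any web framework. The pre-existing "Games" category, the starting upload ID of 321, and the function names are assumptions chosen to mirror the exchange above; the sketch answers with 201 rather than 303, following the afterthought.

```python
import itertools

categories = {"Games": {"name": "Games"}}  # made-up existing state
uploads = {}                               # index resources, keyed by upload ID
_upload_ids = itertools.count(321)


def post_upload(items):
    """POST /categories/uploads/ : create each category and record the outcome."""
    index = []
    for item in items:
        name = item["name"]
        if name in categories:
            # A failure is recorded per item instead of failing the whole request.
            index.append({"name": name, "error": f"The '{name}' category already exists."})
        else:
            categories[name] = dict(item)
            index.append({"name": name, "link": f"/categories/{name}/"})
    upload_id = next(_upload_ids)
    uploads[upload_id] = index
    return 201, {"Location": f"/categories/uploads/{upload_id}/"}


def get_upload(upload_id):
    """GET /categories/uploads/{id}/ : return the per-item links and errors."""
    return 200, uploads[upload_id]
```

Partial failure is handled by recording it: the upload itself succeeds, and the index tells the client which items were created and which were not.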
In your case I would also go the /bulk resource way. But the pattern I would suggest is the following and from my understanding the most natural: Work with the 202 Accepted status code.
The idea of a bulk request is that the server should not be forced to answer immediately, as this would mean the client needs to wait until its bulk request has completed.
Here is the pattern:
POST /bulk [{"name": "Books"}, {"name": "Games"}]
202 Accepted | Location: /bulk/processing/status/resourceId
GET /bulk/processing/status/resourceId
entry = "REST in peace" | completed | 0 errors | /categories/category/resourceId
entry = "Walking dead" | processing | 0 errors ->
So, the client POSTs the bulk information to the server. The server just accepts them with a 202 which gives no guarantee about the processing state at the time of response.
But the server also provides the link to a status resource. Here the client can have a look on each of the created resources and the processing state. When finished the client can access the resource via the given link.
Error cases can be identified by the client, and erroneous data might be resent by a PUT on the completed resource.
Finally, a piece of advice I usually follow: whenever you hit a resource in your design that cannot be mapped onto an HTTP feature, it is probably because of a missing resource.
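The 202 pattern above can be sketched roughly like this. The `jobs` store, the state names, and `process_one` (a stand-in for whatever background worker does the real work) are hypothetical.

```python
import itertools

jobs = {}                      # status resources, keyed by job ID
_job_ids = itertools.count(1)


def post_bulk(entries):
    """POST /bulk : queue the entries and answer immediately with 202."""
    job_id = next(_job_ids)
    jobs[job_id] = [
        {"entry": e["name"], "state": "processing", "link": None} for e in entries
    ]
    return 202, {"Location": f"/bulk/processing/status/{job_id}"}


def process_one(job_id, index):
    """Stand-in for the background worker finishing one entry."""
    item = jobs[job_id][index]
    item["state"] = "completed"
    item["link"] = f"/categories/category/{item['entry']}"


def get_status(job_id):
    """GET /bulk/processing/status/{job_id} : report the state of every entry."""
    return 200, jobs[job_id]
```

The POST returns before any entry is processed; the client polls the status resource and follows each `link` once the corresponding entry is completed.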
Actually, this is still a hot topic today. To simplify things, I say most of the time that there is always a better-suited scenario for each practice.
Eg:
1. If you are receiving likes on a post, you don't need bulk, since there is only one like per comment.
2. If you are receiving favorite comments, bulk can fit well: consider someone reviewing the comments they read, ticking the check boxes of all their favorites, and sending them at once.
Again, this is based on my experience working with RESTful APIs. Currently, for the sake of multitasking and other things, my colleague and I find ourselves using bulk all the time in most of the MIS (Management Information System) work we do. This is because modern web and mobile apps can do a lot of work and send the final results to the back end, so the back end has little to do as long as the data received doesn't violate the business logic.

Designing a REST service for media conversion

My current task is to design a REST service that can be used to convert from one media type to another (e.g. from video/x-msvideo to video/x-flv). It's not supposed to be usable via a browser.
Generally, I'll let clients POST media files and return them some URL for further reference (like http://www.example.com/Media/12345).
Interesting thing is - and that's where questions arise - that the conversion process could be interpreted in two different ways:
1) A converted media is simply a different representation of the original one, so to request a media in a new format, you could just GET http://example.com/Media/12345, and tell the service in the Accept header what format you need. Since converting, for example, a big video takes time, the service would respond with a 202 Accepted until conversion has finished. But what should happen if the conversion fails for any reason?
2) Since conversion takes such a long time, one could represent the process as its own resource. In this case, one would have to POST some form of job description (probably XML) to http://example.com/Media/12345 and the service would respond with a new URI for the requested conversion (like http://example.com/Media/12345/jobs/1). But wouldn't this kind of design be quite non-REST-like?
What I currently have is this:
1.) POST media file to http://example.com/Media
2.) Response: 201 Created / Location: http://example.com/Media/12345
3.) GET http://example.com/Media/12345
4.) Response: 200 Ok and xml like this:
<media id="123457">
<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://example.com/Media/12345/video/x-flv">video/x-flv</link>
<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://example.com/Media/12345/video/mpeg">video/mpeg</link>
</media>
The links in the xml send you to conversion targets available for this media.
5.) Select from the links in the xml to start a conversion / get the result by GETting http://example.com/Media/12345/video/mpeg
6.) Response: 202 Accepted / Location: http://example.com/Media/12345/video/mpeg/Status
7.) Repeat step 5 until conversion is done or have a look at the http://example.com/Media/12345/video/mpeg/Status to see what currently happens.
So, thanks a lot for reading all this stuff :)
What do you think about my approach? What would you do differently?
I am quite new to this stuff, so any suggestions are highly appreciated.
kind regards: Bill
In step 4 I would consider using a 300 response code. You are doing something very similar to client driven content negotiation. See how it's done here http://www.w3.org/TR/wd-xptr
Your idea to create a "job" resource to represent the creation of the new media file is a perfectly valid and very common approach to handling long running operations in RESTful systems.
The only other comment is that in step 5, I was initially concerned about using GET to do that, but having thought about it a bit more it does seem reasonable. It's nice because the final converted video can be made available at the same URL. Then all the client has to do is be aware of the fact that if they request a video and they get a 202, they just have to wait a bit before retrying. If they want, they can check the ./status resource to know if it is done. I guess you just have to make sure that if you are already in the process of converting, you return another 202 but don't start a new conversion process :-)
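That last point can be sketched as follows. The `conversions` table, the `started` list (which only exists to show how many conversions were really kicked off), and the `finish` helper are assumptions for illustration; a real service would enqueue a job where the comment indicates.

```python
conversions = {}  # (media_id, media_type) -> "running" or "done"
started = []      # records every conversion that was actually started


def get_converted(media_id, media_type):
    """GET /Media/{media_id}/{media_type} : 200 when ready, otherwise 202."""
    key = (media_id, media_type)
    state = conversions.get(key)
    if state == "done":
        return 200, {"Content-Type": media_type}
    if state is None:
        # First request for this format: start exactly one conversion.
        conversions[key] = "running"
        started.append(key)  # a real service would enqueue the job here
    # Already running (or just started): answer 202 without starting another.
    return 202, {"Location": f"/Media/{media_id}/{media_type}/Status"}


def finish(media_id, media_type):
    """Stand-in for the converter reporting that it is done."""
    conversions[(media_id, media_type)] = "done"
```

Repeated GETs while the conversion is running all get a 202, yet only one conversion is started, which keeps the GET safe to retry.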
Yes, the redirect (presumably) to http://example.com/Media/12345/jobs/1 doesn’t sound very RESTful. It sounds like you are trying to implement an asynchronous service through a synchronous interface. Couldn’t you POST a ‘conversion request’ resource to kick off the process, which returns a session, i.e. a bit like:
Class ConversionRequest
{
Guid sessionid
Int status
…
}
Then use a GET/sessionId to check the status of the conversion? In my experience, if a restful interface starts to feel complex it generally means the resource isn’t right for the task in hand.
Your approach seems fine. You can encode any concept in your URIs, which obviously includes the jobs concept. It all depends on how you want to design your application interface (resources).
Here is one way I would attack it, and it might give you some ideas. (It depends on your clients and application protocol / interface.)
/media
GET - List of media + status etc.
POST - Add media + returns Location: /media/{number}/jobs/{number}
/media/{number}
GET - Shows media status (Valid,In Progress), formats etc. Links to default/current jobs
/media/{number}/jobs
GET - List of jobs
POST - Do extra/special conversion
/media/{number}/formats/{name}
GET - Download
PUT - Start specific conversion, redirects to a job.
/media/{number}/jobs/{number} - Job status etc.
GET - Status etc,
DELETE - Cancel job
Remember that PUT and DELETE are idempotent and POST is not.
The way you make use of hypermedia and links looks good. The client should discover the next step or related information via links and not rely on out of band information such as URI structure.