Resource processing and process monitoring in RESTful Web Services

Consider a RESTful Web Service that processes large documents on the server side. It could be a document converter that accepts multi-page or single-page digital images and converts them to PDF. The user can compose the final PDF from several images by inserting them into the virtual document via REST. This means that API users will make several requests before the conversion can be started.
Now my question:
I need to signal the Web Service to start document processing. Because such processing can take some time (consider a video converter, for example), some kind of monitoring is required so that the front-end can display progress information.
How is this done in modern RESTful Web Services? Or, in other words, is it possible to implement this nicely in the RESTful world (i.e. without resorting to some sort of RPC)?
I'd appreciate real examples and useful links.

202 Accepted
The 202 (Accepted) status code indicates that the request has been accepted for processing, but the processing has not been completed.
The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
In short, there is a "report on the progress of this instance of the process" resource, which the client can monitor.
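A minimal sketch of that idea, written as plain Python functions rather than a real web framework. The URL layout, the field names and the in-memory job store are assumptions made for illustration; pointing at the monitor through a Location header is a common convention, not something the 202 definition requires.

```python
import itertools

# Hypothetical in-memory job store; a real service would use a database or queue.
_jobs = {}
_ids = itertools.count(1)


def start_conversion(document_id):
    """Handle POST /documents/{id}/conversions: accept the work without running it."""
    job_id = next(_ids)
    _jobs[job_id] = {"document": document_id, "state": "pending", "progress": 0}
    status_url = f"/conversions/{job_id}"
    # 202 Accepted: request taken, processing not complete; the body points at the monitor.
    return 202, {"Location": status_url}, {"state": "pending", "monitor": status_url}


def report_progress(job_id, percent):
    """Called by the background worker as the conversion advances."""
    job = _jobs[job_id]
    job["progress"] = percent
    job["state"] = "done" if percent >= 100 else "running"


def get_status(job_id):
    """Handle GET /conversions/{id}: the status monitor resource."""
    job = _jobs.get(job_id)
    if job is None:
        return 404, {}, {"error": "unknown conversion"}
    return 200, {}, dict(job)
```

The front-end would call the equivalent of start_conversion once, then poll the monitor URL it received and render the progress field.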

Related

Architecture Design for API of Cloud Service

Background:
I have a local application that processes the user's input for approximately 3 seconds and then returns an answer (output) to the user.
(I don't want to go into details about my application, so as not to complicate the question and to keep it purely architectural.)
My Goal:
I want to turn my application into a cloud service and expose an API
(for the upcoming website and for clients that will connect to the service without installing the software locally).
Possible Solutions:
1. Deploy WCF in the cloud and run my application there, so clients can invoke the service and use my application in the cloud. (RPC style)
2. Use a Web API that inserts the request into a queue; a worker role then dequeues requests and posts the results to a DB. The client sends one request to create an entry in the queue and another request to get the result (which the Web API reads from the DB).
The Problems:
If I go with the WCF solution (#1), I can't handle heavy loads of requests, maybe only 10-20 simultaneously.
If I go with the WebAPI-Queue-WorkerRole solution (#2), the client will sometimes need to request the result multiple times, which can be a problem.
If I go with the WebAPI-Queue-WorkerRole solution (#2), the process isn't synchronous: the client does not get the result as soon as the processing of their request is done, but has to request it.
Questions:
In the WebAPI-Queue-WorkerRole solution (#2), can I somehow alert the client once their request has been processed, so that I can spare the client multiple requests for the result?
Isn't asking multiple times for the result an outdated approach? I remember that 10-15 years ago it was accepted, but now? I know that the VirusTotal API uses this kind of design.
Is there a better solution, one that will handle heavy loads and be either synchronous or asynchronous (returning the result to the client once it is done)?
Thank you.
If you're using Azure, why not simply fire up more servers and use load balancing to handle more load? In that way, as your load increases, you have more servers to handle the requests.
Microsoft recently made available the Azure Service Fabric, which gives you a lot of control over spinning up and shutting down these services.
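For comparison, solution #2 from the question can be sketched with the standard library alone. This is only an in-process illustration under stated assumptions: the Queue stands in for the cloud queue, the dict for the results DB, and process() for the roughly 3-second computation; all function names are hypothetical.

```python
import queue
import threading
import uuid

requests_q = queue.Queue()   # stands in for the cloud queue
results = {}                 # stands in for the results DB
results_lock = threading.Lock()


def process(user_input):
    # Placeholder for the roughly 3-second computation from the question.
    return user_input.upper()


def submit(user_input):
    """First API call: enqueue the request and return a ticket ID."""
    ticket = str(uuid.uuid4())
    requests_q.put((ticket, user_input))
    return ticket


def fetch_result(ticket):
    """Second API call: return (ready, result)."""
    with results_lock:
        if ticket in results:
            return True, results[ticket]
    return False, None


def worker():
    """Worker role: dequeue requests and store results until told to stop."""
    while True:
        item = requests_q.get()
        if item is None:          # sentinel: shut down
            requests_q.task_done()
            break
        ticket, user_input = item
        output = process(user_input)
        with results_lock:
            results[ticket] = output
        requests_q.task_done()
```

Starting several worker threads (or worker roles) against the same queue is what lets this design absorb bursts; the client still has to call fetch_result until the first element of the pair is True.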

Java EE 6 server push

Background
I'm well into building a central appointment booking system for various service providers. This is being built on Java EE 6 on GlassFish 3.1.1, making use of JSF 2, EJB 3.1 and other JEE6 standards. Each service provider has their own appointment booking facility running on their own server that is connected to the Internet, but currently they don't have any interface to allow their patients to make their own appointments.
My system will synchronously book appointments on service providers' systems (in response to user requests) via an API I have defined which involves streaming XML over HTTP. These appointment booking requests are synchronous because service providers will continue to book appointments directly in their system the old fashioned way (over the phone/counter) and, for whatever reasons, their system reserves the right to reject such requests (to prevent double bookings and for other reasons). So, their systems retain the status of being the source of truth.
For obvious reasons (mainly security), API connections are established by the service providers' systems. This means requests are being sent by the server (my system) and responses by the client (their system).
Problem
I need some suggestions regarding how I can build a server-push XML over HTTP API using Java EE 6 on GlassFish 3.1.1. A number of less than ideal options come to mind. One of them involves a singleton bean that contains a map of my appointment IDs to their appointment booking responses. In this scenario, my system polls the map for a limited time (up to 10 seconds, for example) until it finds a matching response, then returns the response which is then handled eventually in the JSF UI. Meanwhile, the API servlet (or perhaps JAX-RS web service) polls the singleton bean for requests, converts them to XML and streams them to the service provider's output stream.
I'm sure there must be a number of better ways of doing this not involving thread-per-connection, blocking, polling, etc.
Ideas?
Update
I was leaning towards Atmosphere/Jersey, but now I realize that blocking I/O is quite scalable under NPTL, so I'm flexible in that regard.
Here's how I did it:
Start with a singleton EJB containing:
A Map<Long, BlockingDeque<OutboundApiMessage>>, where the key is the API client ID;
A Map<Long, Exchanger<AppointmentExchange>>, where the key is the web-side appointment ID and AppointmentExchange contains that ID and the API client side appointment ID.
I set bean-managed concurrency control on the singleton EJB and defined methods to enqueue, dequeue and requeue (the latter for the case where server push of the OutboundApiMessage failed and the message needed to be added back to the front of the queue, where it would be the next item to be dequeued). I then wired it to the JAX-RS web services (one for upstream, one for downstream).
The singleton EJB has a method to synchronously book an appointment. It enqueues a message to be picked up by the client, creates an Exchanger containing a new AppointmentExchange instance then blocks waiting for the exchange. When an answer comes back on the inbound web service, that web service notifies the singleton EJB via another method which performs the exchange via the Exchanger.
It all works quite well now. Of course, there is a bit more to it than that, but that's the gist of it.
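The rendezvous described above can be sketched in Python for illustration. This is not the author's Java code: Python has no direct counterpart to java.util.concurrent.Exchanger, so a per-appointment queue.Queue is swapped in as the reply slot, dequeue is non-blocking where the original used a BlockingDeque, and the class and method names are hypothetical.

```python
import collections
import queue
import threading


class BookingBroker:
    """Hypothetical stand-in for the singleton EJB described above."""

    def __init__(self):
        self._lock = threading.Lock()
        self._outbound = collections.defaultdict(collections.deque)  # client ID -> messages
        self._replies = {}                                           # appointment ID -> reply slot

    def book(self, client_id, appointment_id, timeout=10.0):
        """Enqueue a request for the provider, then block until its answer arrives."""
        slot = queue.Queue()
        with self._lock:
            self._replies[appointment_id] = slot
            self._outbound[client_id].append(appointment_id)
        try:
            return slot.get(timeout=timeout)   # raises queue.Empty on timeout
        finally:
            with self._lock:
                self._replies.pop(appointment_id, None)

    def dequeue(self, client_id):
        """Downstream web service: next message to push to this provider, or None."""
        with self._lock:
            pending = self._outbound[client_id]
            return pending.popleft() if pending else None

    def requeue(self, client_id, appointment_id):
        """Push failed: put the message back at the front of the queue."""
        with self._lock:
            self._outbound[client_id].appendleft(appointment_id)

    def answer(self, appointment_id, response):
        """Upstream web service: hand the provider's response to the waiting caller."""
        with self._lock:
            slot = self._replies.get(appointment_id)
        if slot is not None:
            slot.put(response)
```

The web tier calls book() and blocks; the downstream service drains dequeue() toward the provider; the upstream service calls answer() when the provider replies, which releases the blocked caller.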

How to redirect a web service?

I have a web service which performs the submission of a small amount of data. It provides a synchronous request response service for my clients. This is working well. I have a new requirement to also support the submission of a much larger amount of the same data; about 10,000 times more data volume. Naturally the larger data will be an asynchronous service for my clients.
The infrastructure I use for the small amount of data cannot support both types of service; the large volume submissions will kill the responsiveness of my small volume submissions.
What I would like to do is be flexible with my deployment and make life simple for the people developing the client software which submits the data. I have been looking for a standards based way to do this:
- client calls my data submission web service
- server determines the amount of data being submitted
- if the data is too big, the server responds to the client with a different URI, which the client should use for the submission, i.e. the server redirects the client to bigger infrastructure
- client calls the different uri and gets service
I've done some searching and the general response is that this isn't something that is done in web services. I don't understand why. This seems like a reasonable requirement that probably also applies to clustered server scenarios.
Does anyone know if there are standards which cover this? If not, is there a better way?
A subtlety in my case is that I want all the traffic to flow differently for the large submission, so I can't simply front my infrastructure with a content-aware web service proxy server. I need to push the web service call to a totally different place, much like an HTTP redirect.
Any help is appreciated.
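For services that run over plain HTTP, the flow in the list above maps closely onto a 307 Temporary Redirect, which tells the client to repeat the same method and body at the URI given in the Location header. The sketch below only shows the routing decision; the size threshold and the bulk endpoint URL are hypothetical, and whether a given web-service client library follows such redirects for POST automatically varies, so that would need checking.

```python
SMALL_LIMIT = 1_000_000                                   # bytes; hypothetical threshold
BULK_ENDPOINT = "https://bulk.example.com/submissions"    # hypothetical URL


def route_submission(headers):
    """Decide from the request headers whether to accept or redirect a submission."""
    length = headers.get("Content-Length")
    if length is None:
        return 411, {}                      # Length Required: cannot size the request
    if int(length) > SMALL_LIMIT:
        # 307 asks the client to repeat the same method and body at the new URI.
        return 307, {"Location": BULK_ENDPOINT}
    return 200, {}
```

To keep the large body from reaching the small infrastructure at all, the decision has to be made from the headers before the body is read; a client that sends "Expect: 100-continue" gives the server the chance to answer before the body is transmitted.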

Ideal way/architecture to deliver large data over Web Services

We are trying to design 6 web services, which will serve another client component. The client component requires data from the web service we are implementing.
Now, the problem is that we are not implementing a single Web Service. There is one Web Service that the client component hits; it initiates a series of 5 more Web Services, which gather data from their respective data stores and pass the data back to the original Web Service, which then delivers it to the client component.
So if the requested data becomes huge, this will be a serious problem for our internal communication channel.
So, what do you suggest? What can be done to avoid overloading the communication channel between the internal Web Services while still delivering the data to the client component?
Update 1
Using 5 web services, where each one knows only about the next one in the chain, is a business requirement. In fact, the "small services" of 5 companies are being integrated.
We use Java and Axis2.
We've had a similar problem. Apart from trying to avoid it (e.g. for internal communication, go directly to the DB instead of through a web service), you can mitigate it by at least not performing the 5 or so tasks in series. Start new threads to collect the data in parallel and process it all at the end to reduce latency (except where the tasks might contend for the same resource and create a bottleneck).
But before doing anything, load test it to see if it is even an issue, and get some baseline stats so you can see what improvement each change makes. Also, sometimes you might be better off tweaking network settings or the actual network rather than trying to optimise the code, but again, test and see.
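The "collect in parallel" suggestion can be sketched as follows. The thread uses Java and Axis2; Python is used here only to keep the illustration short, and gather_parallel and its parameters are hypothetical names. Each fetcher is assumed to be an independent, zero-argument callable, which may not hold if the services must be called in a chain.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_parallel(fetchers, timeout=30.0):
    """Run each zero-argument fetcher in its own thread; return results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = [pool.submit(fetch) for fetch in fetchers]
        # result() re-raises any exception from the fetcher, so failures are not silent.
        return [future.result(timeout=timeout) for future in futures]
```

With five services that each take about the same time, the total latency approaches that of the slowest call instead of the sum of all five.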
Put all the data in a temporary compressed file and return the FTP URL of the file.
The client fetches the big data chunk, uncompresses it and reads it (perhaps with some authentication mechanism on the FTP server).

Message Queue vs. Web Services? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Under what conditions would one favor apps talking via a message queue instead of via web services (I just mean XML or JSON or YAML or whatever over HTTP here, not any particular type)?
I have to talk between two apps on a local network. One will be a web app and have to request commands on another app (running on different hardware). The requests are things like creating users, moving files around, and creating directories. Under what conditions would I prefer XML Web Services (or straight TCP or something) to using a Message queue?
The web app is Ruby on Rails, but I think the question is broader than that.
When you use a web service you have a client and a server:
If the server fails, the client must take responsibility for handling the error.
When the server is working again, the client is responsible for resending the request.
If the server gives a response to the call and the client fails, the operation is lost.
You have no control over contention: if millions of clients call a web service on one server within a second, your server will most probably go down.
You can expect an immediate response from the server, but you can handle asynchronous calls too.
When you use a message queue like RabbitMQ, Beanstalkd, ActiveMQ, IBM MQ Series or Tuxedo, you expect different and more fault-tolerant results:
If the server fails, the queue persists the message (optionally, even if the machine shuts down).
When the server is working again, it receives the pending message.
If the server gives a response to the call and the client fails, the message is persisted as long as the client has not acknowledged the response.
You have control over contention: you can decide how many requests are handled by the server (call it a worker instead).
You don't expect an immediate synchronous response, but you can implement or simulate synchronous calls.
Message queues have many more features, but these are some rules of thumb for deciding whether you want to handle error conditions yourself or leave them to the message queue.
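The acknowledgement behaviour in the list above can be illustrated with a toy in-memory queue. This is not how RabbitMQ or any of the other products named is implemented, and it persists nothing to disk; the class and method names are hypothetical.

```python
import collections


class AckQueue:
    """Toy in-memory queue: a message stays tracked until the consumer acknowledges it."""

    def __init__(self):
        self._ready = collections.deque()
        self._unacked = {}
        self._next_tag = 0

    def publish(self, body):
        self._ready.append(body)

    def receive(self):
        """Hand out the next message with a delivery tag, or None if the queue is empty."""
        if not self._ready:
            return None
        self._next_tag += 1
        body = self._ready.popleft()
        self._unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        """Consumer finished successfully: forget the message."""
        del self._unacked[tag]

    def nack(self, tag):
        """Consumer failed or disconnected: make the message available again."""
        self._ready.appendleft(self._unacked.pop(tag))
```

A worker that crashes between receive() and ack() leaves the message in the unacknowledged set, so it can be redelivered instead of being lost, which is the difference from the plain request/response case described earlier.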
There's been a fair amount of recent research in considering how REST HTTP calls could replace the message queue concept.
If you introduce the concept of a process and a task as a resource, the need for middle messaging layer starts to evaporate.
Ex:
POST /task/name
- Returns a 202 accepted status immediately
- Returns a resource url for the created task: /task/name/X
- Returns a resource url for the started process: /process/Y
GET /process/Y
- Returns status of ongoing process
A task can have multiple steps for initialization, and a process can return status when polled or POST to a callback URL when complete.
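On the client side, the polling half of this can be sketched as a small loop. The transport is injected as a callable so that the sketch does not depend on any HTTP library; the "state" field and its values are assumptions, since the answer does not specify the shape of the status representation.

```python
import time


def wait_for_process(http_get, process_url, interval=0.5, max_interval=8.0, timeout=60.0):
    """Poll a process resource until it reports a terminal state.

    http_get is any callable taking a URL and returning a dict with a "state" key;
    the "state" values used here are assumptions, not part of the answer above.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = http_get(process_url)
        if status["state"] in ("done", "failed"):
            return status
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"{process_url} still {status['state']} after {timeout}s")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)   # back off to spare the server
```

Doubling the interval keeps short jobs responsive while avoiding a flood of requests for long ones; a callback URL removes the loop entirely but requires the client to be reachable from the server.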
This is dead simple, and becomes quite powerful when you realize that you can now subscribe to an rss/atom feed of all running processes and tasks without any middle layer. Any queuing system is going to require some sort of web front end anyway, and this concept has it built in without another layer of custom code.
Your resources exist until you delete them, which means you can view historical information long after the process and task complete.
You have built in service discovery, even for a task that has multiple steps, without any extra complicated protocols.
GET /task/name
- returns form with required fields
POST (to the URL provided in the form's "action" attribute)
Your service discovery is an HTML form - a universal and human readable format.
The entire flow can be used programmatically or by a human, using universally accepted tools. It's client-driven, and therefore RESTful. Every tool created for the web can drive your business processes. You still have alternate message channels by POSTing asynchronously to a separate array of log servers.
After you consider it for a while, you sit back and start to realize that REST may just eliminate the need for a messaging queue and an ESB altogether.
http://www.infoq.com/presentations/BPM-with-REST
Message queues are ideal for requests which may take a long time to process. Requests are queued and can be processed offline without blocking the client. If the client needs to be notified of completion, you can provide a way for the client to periodically check the status of the request.
Message queues also allow you to scale better across time. They improve your ability to handle bursts of heavy activity, because the actual processing can be distributed across time.
Note that message queues and web services are orthogonal concepts, i.e. they are not mutually exclusive. For example, you can have an XML-based web service which acts as an interface to a message queue. I think the distinction you're looking for is message queues versus request/response; the latter is when the request is processed synchronously.
Message queues are asynchronous and can retry a number of times if delivery fails. Use a message queue if the requester doesn't need to wait for a response.
The phrase "web services" makes me think of synchronous calls to a distributed component over HTTP. Use web services if the requester needs a response back.
I think in general, you'd want a web service for a blocking task (this task needs to be completed before we execute more code), and a message queue for a non-blocking task (it could take quite a while, but we don't need to wait for it).