How do you integrate applications via web services and deal with technical errors like connectivity errors for web service calls which change state?
E.g. when the network connection gets interrupted during a web service call, how does the client know whether the web service has processed its action or not?
Can this issue be solved at the business layer only (e.g. to query a previous call state) or are you aware of some nice frameworks/best practices which can help wrapping transactional guarantees around a web service?
Implementing it all by yourself with some kind of transactional context tracked in the business layer is always an option. You can use some compensation mechanisms to ensure transactions are rolled back if needed, but you'll need to:
have the information on transactions persisted somewhere
use transaction correlation IDs, so you can query the outcome when the response has been lost (having correlation IDs is a good idea anyway)
implement the operations needed to read/write/rollback, etc, so it might make your services a bit more complex
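A minimal sketch of the correlation-ID idea from the list above, assuming an in-memory service. `PaymentService`, `debit`, `status` and `call_with_retry` are hypothetical names, and a real service would persist the `processed` map rather than keep it in memory. The client generates one ID per logical operation and reuses it on every retry, so a lost response cannot cause a second state change.

```python
import uuid


class PaymentService:
    """Hypothetical state-changing service that remembers each correlation ID."""

    def __init__(self):
        self.balance = 100
        self.processed = {}  # correlation ID -> stored result (persist this in practice)

    def debit(self, correlation_id, amount):
        # A repeated call with the same ID returns the stored result
        # instead of changing state a second time.
        if correlation_id in self.processed:
            return self.processed[correlation_id]
        self.balance -= amount
        result = {"status": "done", "balance": self.balance}
        self.processed[correlation_id] = result
        return result

    def status(self, correlation_id):
        # Lets a client ask what happened after a lost response.
        return self.processed.get(correlation_id)


def call_with_retry(service, amount, attempts=3, lose_first_response=False):
    """Retry the same logical operation, always with the same correlation ID."""
    correlation_id = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            result = service.debit(correlation_id, amount)
            if lose_first_response and attempt == 0:
                # Simulate the network dropping the response after the
                # service has already processed the call.
                raise ConnectionError("response lost")
            return result
        except ConnectionError:
            continue
    raise RuntimeError("gave up after %d attempts" % attempts)


service = PaymentService()
result = call_with_retry(service, 30, lose_first_response=True)
```

Even though the first response is lost here, the retry finds the stored result and the balance is debited only once.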
Another option I can think of: if you're using SOAP, you can go for asynchronous communication and look for a stack implementing the WS-Coordination, WS-AtomicTransaction and WS-BusinessActivity specifications, then decide for yourself whether it is a good idea in your context. For example, I think Axis2 supports these, but in the end it depends on the technologies and stack you use.
From the article above:
WS-AtomicTransaction defines a coordination type that is most useful
for handling system-generated exceptions, such as an incomplete write
operation or a process terminating abnormally.
Below are the types of 2-Phase Commit that it implements.
Hope this helps!
I am doing some research on SOAP, for a personal project, and I came across a website with a list of pros and cons for using SOAP, and I understood what most of them meant, except for this one under disadvantages:
SOAP is typically limited to pooling, and not event notifications, when leveraging HTTP for transport. What's more, only one client can use the services of one server in typical situations.
From my understanding of pooling, there should be no issue pooling a SOAP object for reusability. Pooling is simply a way to use the same resources over and over again, like a connection to a database. I am also not entirely certain what event notifications mean in this context.
So my two questions here are, what does the above block quoted text actually mean, and is this information correct?
Website: http://searchsoa.techtarget.com/definition/SOAP
SOAP is RPC, and in RPC some local client invokes a method on some remote target and receives a result. That's how it works, so SOAP works that way too. A client invokes a service asking for something and the service just responds.
If you want "events" in this type of communication the most simple approach is to invoke the service more often (i.e. polling). This has the advantage that nothing changes for the server or the client. It's the same RPC call but done more frequently.
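As a rough illustration of polling, here is a sketch in which a client repeats the same call until the service has something to report. The `poll` helper and the simulated responses are assumptions made for this example, not part of any SOAP stack.

```python
import time


def poll(fetch, interval_seconds, max_polls):
    """Call fetch() repeatedly; return the first non-None result and the call count."""
    for calls in range(1, max_polls + 1):
        result = fetch()
        if result is not None:
            return result, calls
        time.sleep(interval_seconds)
    return None, max_polls


# Simulated service: nothing to report for the first four calls.
responses = iter([None, None, None, None, "order shipped"])
event, calls = poll(lambda: next(responses), interval_seconds=0, max_polls=10)
print(event, calls)  # prints "order shipped 5"
```

Four of the five calls return nothing useful, which is the cost you pay for keeping both sides plain request/response.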
These days everyone is connected to the web and everyone is subscribed to all sorts of services. They want to get notified as soon as something happens in the world around them. Polling becomes inefficient in this sea of users and services because you are wasting resources. You might poll a service a hundred times just to get back one notification. For this reason technology is evolving so that resource use is minimized, and the direction it is moving in is push services.
Now almost everything happens in the browser. Every browser manufacturer rushes to implement the latest technology changes and HTML5 spec. This means actual pages that push notifications to users instead of faking it with Ajax, comet, etc.
SOAP has been around since 1998 and it's not moving as fast as the rest of the web, mainly because SOAP is mostly an enterprise player and because it's a protocol. Because it's a protocol you have to make new technology available to it without breaking that protocol. Things move slower so people have abandoned SOAP in favor of other ways of doing server-client communication.
SOAP is typically limited to pooling, and not event notifications...
That is correct. But be aware that "typically" does not mean "always".
You can have events, but it's harder. It involves using WS-* specifications like WS-Eventing and WS-Addressing. This is a change in the way SOAP clients operate because a client now becomes some sort of a service too because it needs to receive calls too, not just initiate them. If your technology stack implements these specifications then good for you, but if it doesn't, then you have to build it yourself and it's a real pain.
So for these reasons, if you don't have blocking performance or resource usage issues, you "typically" choose polling with SOAP and not event notifications.
I have heard often enough that a RESTful service should be stateless: all state info should be stored in the client, and each request should contain all the necessary state info.
But why? What's the benefit of doing that? Only when I know its benefit/motivation can I use it properly.
What if my client has a huge amount of state? Suppose there's an online document editing application. Does the client have to send the full text he/she is editing when calling the server's RESTful API? Or is this scenario simply not suitable for a RESTful approach?
When talking about REST services (or rather RESTful ones, since not many people adhere 100% to the paper I will quote here), I always think it's best to start with the source, meaning Fielding's dissertation, which says in section 5.1.3, Stateless:
This constraint induces the properties of visibility, reliability, and scalability. Visibility
is improved because a monitoring system does not have to look beyond a single request
datum in order to determine the full nature of the request. Reliability is improved because
it eases the task of recovering from partial failures [133]. Scalability is improved because
not having to store state between requests allows the server component to quickly free
resources, and further simplifies implementation because the server doesn’t have to
manage resource usage across requests.
It goes even further talking about its trade-offs:
Like most architectural choices, the stateless constraint reflects a design trade-off. The
disadvantage is that it may decrease network performance by increasing the repetitive data
(per-interaction overhead) sent in a series of requests, since that data cannot be left on the
server in a shared context. In addition, placing the application state on the client-side
reduces the server’s control over consistent application behavior, since the application
becomes dependent on the correct implementation of semantics across multiple client
versions.
But Fielding doesn't stop even there, he talks about caching to overcome some of the problems.
I highly recommend you go through that PDF, since (from what I remember) that was the original paper that introduced REST.
The use case you provided is a tough one and, as many have said, it depends on your exact scenario. RESTful services are called RESTful and not REST because people found the original paper too limiting and decided to loosen the rules a bit (for instance, the original paper doesn't say anything about batch operations).
The primary benefit is scalability -- by not needing to fetch additional context for each request, you minimize the amount of work done by the server, which may need to service many requests at the same time.
Additionally, it helps provide greater clarity to consumers of your API. By having the user send everything related to the operation being done, they can more clearly see what is actually being done, and the error messages they get can often be more direct as a result; an error can say what value is wrong and why, rather than trying to communicate that something the consumer can't see went wrong on the server.
From the same chapter of Fielding's dissertation:
Like most architectural choices, the stateless constraint reflects a
design trade-off. The disadvantage is that it may decrease network
performance by increasing the repetitive data (per-interaction
overhead) sent in a series of requests, since that data cannot be left
on the server in a shared context.
Advantages are explained as follows:
This constraint induces the properties of visibility, reliability, and
scalability. Visibility is improved because a monitoring system does
not have to look beyond a single request datum in order to determine
the full nature of the request. Reliability is improved because it
eases the task of recovering from partial failures [133]. Scalability
is improved because not having to store state between requests allows
the server component to quickly free resources, and further simplifies
implementation because the server doesn't have to manage resource
usage across requests.
Regarding your specific case, yes and no. This is how the Web works: when we edit something online, we send the entire request to the server. How we implement partial updates, though, is a design choice.
Software can be designed to accomplish this goal by sending PUT/POST requests to sub-resources. For example:
PUT /book/chapter1 HTTP/1.1
PUT /book/chapter2 HTTP/1.1
PUT /book/chapter3 HTTP/1.1
instead of updating whole resource:
PUT /book HTTP/1.1
Content-Type: text/xyz
Content-Length: ...
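To make the difference concrete, here is a small sketch of a handler that accepts both styles of PUT. The in-memory `book` dictionary and the `put` function are hypothetical; a real service would sit behind an HTTP framework. Either way, each request carries everything the server needs, so no session state is kept between calls.

```python
book = {"chapter1": "old text 1", "chapter2": "old text 2", "chapter3": "old text 3"}


def put(store, path, body):
    """Handle PUT for /book (whole resource) or /book/<chapter> (sub-resource)."""
    parts = path.strip("/").split("/")
    if parts == ["book"]:
        # Whole-resource update: body is the full mapping of chapters.
        store.clear()
        store.update(body)
    elif len(parts) == 2 and parts[0] == "book":
        # Sub-resource update: body is the text of a single chapter.
        store[parts[1]] = body
    else:
        raise KeyError(path)
    return store


# Only the edited chapter travels over the wire.
put(book, "/book/chapter2", "new text 2")
```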
Let's say I have these web applications/services:
API
Set of Applications
API is used for managing some resources (simple CRUD operations). Now what I need is to subscribe Applications for changes of different API resources. Applications would do some background work on a change.
I came up with the idea of callbacks, so that Applications can authorise via OAuth and post a callback config to the API.
I think that this config should look like this:
{
  "callback_url": "http://3rdpartyservice.com/callback",
  "resources": ["foo1", "foo2"],
  "ref_data": { "token": "abcd1234" }
}
resources is array of the resources that 3rd party service is interested in
ref_data is custom json for 3rd party usage (e.g. for auth)
This way on specified resource change the API would send a request to callback_url. This request would contain resource data, action(create/update/delete) and ref_data.
The intention here is to make this generic enough to allow 3rd party clients configure such callbacks.
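A sketch of how the API side might turn a resource change into callback requests, assuming subscriptions are stored in the shape shown above. `build_callbacks` is a hypothetical helper; it only builds the requests and leaves the actual HTTP POST, retries and error handling out.

```python
import json

subscriptions = [
    {
        "callback_url": "http://3rdpartyservice.com/callback",
        "resources": ["foo1", "foo2"],
        "ref_data": {"token": "abcd1234"},
    }
]


def build_callbacks(subscriptions, resource, action, data):
    """Return (url, JSON body) pairs for every subscription interested in `resource`."""
    requests = []
    for sub in subscriptions:
        if resource in sub["resources"]:
            body = json.dumps(
                {
                    "resource": resource,
                    "action": action,  # create / update / delete
                    "data": data,
                    "ref_data": sub["ref_data"],  # echoed back untouched
                },
                sort_keys=True,
            )
            requests.append((sub["callback_url"], body))
    return requests


pending = build_callbacks(subscriptions, "foo1", "update", {"id": 7})
```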
So the questions are:
Are there any best practices?
What about potential security issues?
Are there any real world examples on the web?
Tx
Sounds very similar to WebHooks or Service Hooks.
Check out the Web Hooks on GitHub to get a good idea of what they are and how they work. See also the last paragraph, Service Hooks, as it explains how GitHub handles these WebHooks. This would be similar for your application. The OAuth explains why and how it is done.
See also Webhooks, REST and the Open Web, from API User Experience.
There is even RestHooks.
The general solution to this requirement is usually called "publish/subscribe". There are dozens of solutions to this - google "publish subscribe REST" for some examples. You can also read "Enterprise Integration Patterns".
The key challenge in this kind of solution is "real-time versus queue".
For instance, if you have an API with a million clients who are all interested in the same event, you cannot guarantee that you can reach all of those clients in real time, within whatever timeframe their application demands. You also have to worry about the network going away, or clients being temporarily down. In this case, your application might define an event queue, and clients look in that queue for events they're interested in. Once you go down that route, you're probably going to use some off-the-shelf software rather than building your own. Apache Camel is a good open source implementation.
In your example, for instance, what happens if you cannot reach 3rdpartyservice.com? Or if http://3rdpartyservice.com/callback throws an error when posting an update to foo1, but not to foo2? Or if http://3rdpartyservice.com/ uses a different flavour of OAuth than you're used to? How do you guarantee to http://3rdpartyservice.com/ that it's you who is posting an update, not a hacker?
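One common way to address the last question is to sign each callback body with a secret shared between the API and the subscriber. The sketch below assumes such a secret was exchanged when the callback was registered, which is not something the question specifies. `sign` and `verify` are hypothetical helper names; the `hmac` and `hashlib` calls are from the Python standard library.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Sender side: compute an HMAC-SHA256 signature to send along with the body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(secret, body), signature)


secret = b"shared-at-subscription-time"  # assumption: agreed out of band
body = b'{"resource": "foo1", "action": "update"}'
signature = sign(secret, body)
```

The receiver rejects any callback whose signature does not verify. This proves who sent the body and that it was not altered, but it does not encrypt it, so HTTPS is still needed for confidentiality.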
Your choices really tend to come down to your non-functional requirements, rather than the functional ones - things like uptime, guarantee of notification, guarantee of delivery, etc. are more important than the specifics of how you pass across the parameters, and whether it's "resource-based" or some other protocol.
I've been tasked with creating an intermediate layer which needs to exchange data (over HTTP) between two independent systems (e.g. Receiver <=> Intermediate Layer (IL) <=> Sender). Receiver and Sender both expose a set of APIs via Web Services. Every time a transaction occurs in the Sender system, the IL should know about it (I'm thinking of creating a Windows Service which constantly pings the Sender), massage the data, then deliver it to the Receiver. The IL can temporarily store the data in a SQL database until it is transferred to the Receiver. I have the following questions -
Can WCF (haven't used it a lot) be used to talk to the Sender and Receiver (both expose web services)?
How do I ensure guaranteed delivery?
How do I ensure security of the messages over the Internet?
What are best practices for handling concurrency issues?
What are best practices for error handling?
How do I ensure reliability of the data (data is not tampered with along the way)?
How do I ensure the receipt of the data back to the Sender?
What are the constraints that I need to be aware of?
I need to implement this on MS platform using a custom .NET solution. I was told not to use any middleware like BizTalk. The receiver is an SDFC instance, if that matters.
Any pointers are greatly appreciated. Thank you.
A Windows Service that orchestrates the exchange sounds fine.
Yes WCF can deal with traditional Web Services.
How do I ensure guaranteed delivery?
To ensure delivery you can use TransactionScope to handle the passing of data between the Receiver <=> Intermediate Layer and between the Intermediate Layer <=> Sender, but I wouldn't try to do both in a single transaction.
You might want to consider some sort of queuing mechanism to send the data to the receiver; I guess I'm thinking more of a logical queue rather than an actual queuing component. A workflow framework could also be an option.
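A minimal sketch of such a logical queue, written in Python with SQLite purely for illustration (the question targets .NET and a SQL database, where the same table-based approach applies). The `outbox` table and the `enqueue`, `deliver_pending` and `pending_count` helpers are hypothetical names. A row is marked delivered only after the send succeeds, so a failed send is retried on the next pass.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE outbox (
           id INTEGER PRIMARY KEY,
           payload TEXT NOT NULL,
           delivered INTEGER NOT NULL DEFAULT 0,
           attempts INTEGER NOT NULL DEFAULT 0
       )"""
)


def enqueue(payload):
    # Store the message before any delivery is attempted.
    with db:
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def deliver_pending(send):
    """Try to send every undelivered row; mark it delivered only if send() succeeds."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        try:
            send(payload)
        except ConnectionError:
            # Leave the row pending and count the attempt for later escalation.
            with db:
                db.execute(
                    "UPDATE outbox SET attempts = attempts + 1 WHERE id = ?", (row_id,)
                )
            continue
        with db:
            db.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))


def pending_count():
    return db.execute("SELECT COUNT(*) FROM outbox WHERE delivered = 0").fetchone()[0]
```

This gives at-least-once delivery: if the process dies after `send` succeeds but before the row is marked, the message is sent again, so the receiver should tolerate duplicates.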
Make sure you have good logging / auditing in place; make sure it's rock solid, has the right information and is easy to read. Assuming you write a service, it will execute without supervision, so the operational / support aspects are more demanding.
Think about scenarios:
How do you manage failed deliveries?
What happens if the receiver (or sender) is unavailable for periods of time (and how long is that?); for example: do you need to "escalate" to an operator via email?
How do I ensure security of the messages over the Internet?
HTTPS. Assuming other existing clients make calls to the Web Services how do they ensure security? (I'm thinking encryption).
What are best practices for handling concurrency issues?
Hmm, probably a separate question. You should be able to find information on that easily enough. How much data are we talking about? At what sort of frequency? How many instances of the Windows Service were you thinking of having? If one is enough, why would concurrency be an issue?
What are best practices for error handling?
Same as for concurrency, but I can offer some pointers:
Use an established logging framework, I quite like MS EntLibs but there are others (re-using whatever's currently used is probably going to make more sense - if there is anything).
Remember that execution is unattended so ensure information is complete, clear and unambiguous. I'd be tempted to log more and dial it down once a level of comfort is reached.
Use a top-level handler to ensure nothing gets lost, but don't be afraid to log deep in the application where you can still get useful context (like the metadata of the data being sent / received).
How do I ensure the receipt of the data back to the Sender?
Include it (sending the receipt) as a step that is part of the transaction.
On a different angle - have a look on CodePlex for ESB type libraries, you might find something useful: http://www.codeplex.com/site/search?query=ESB&ac=8
For example ESBasic which seems to be a class library which you could reuse.
Being a beginner, how should I go about deciding whether a particular process has to be implemented with an ESB or with BPEL?
What are the various parameters that one should use for deciding if either should be used for implementation?
First of all ESB is just a concept while BPEL is an OASIS standard based on XML and Web Services. A BPEL file is actually XML.
You use an ESB when you need to connect 2 or more applications together, to avoid direct point-to-point integration. This offers various benefits, such as translating messages from one format to another, or introducing other message exchange patterns. An ESB's communication is typically stateless, i.e. a message goes through, gets routed to its destination(s), and it ends there. An ESB is a very broad term, interpreted and misinterpreted by vendors to market their products.
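As a toy illustration of the stateless translate-and-route behaviour described above, here is a sketch. The `type|id|amount` input format, the `ROUTES` table and the destination names are all made up for this example. Nothing is remembered between messages.

```python
import json

ROUTES = {"order": "warehouse", "invoice": "accounting"}  # hypothetical destinations


def route(message: str):
    """Translate a 'type|id|amount' message to JSON and pick its destination."""
    kind, item_id, amount = message.split("|")
    translated = json.dumps(
        {"type": kind, "id": int(item_id), "amount": float(amount)}, sort_keys=True
    )
    return ROUTES[kind], translated


destination, body = route("order|42|9.5")
```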
A Business Process Management system implementing BPEL and similar technologies, on the other hand, is concerned with keeping track of the progress of various activities and their relationships. A BPEL process is very similar to a flow chart. It preserves state, keeps track of its progress and flow, and is typically used (although not necessarily) in long-running transactions which could also involve manual human tasks.
A textbook example of a BPEL process is a loan processing application. A request for a customer loan comes in, and the process first performs some automated checks using web service calls on some systems and if the credit rating is too low, the system informs a manager to evaluate the form manually (via some workflow system). The process then waits for a callback from the human workflow system, uses some correlation method (some ID) to match it with the right BPEL process instance (so that the right customer is serviced), and resumes the process accordingly.
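The correlation step in that example can be sketched as follows. This is plain Python rather than BPEL, and the `LoanProcesses` class, the state names and the rating threshold of 600 are assumptions made for illustration. The point is that the process instance keeps its state while it waits, and the callback is matched to it by ID.

```python
import uuid


class LoanProcesses:
    """Hypothetical store of long-running process instances keyed by correlation ID."""

    def __init__(self):
        self.instances = {}

    def start(self, customer, credit_rating, minimum_rating=600):
        correlation_id = str(uuid.uuid4())
        # Automated check: a low rating parks the process until a manager decides.
        state = "approved" if credit_rating >= minimum_rating else "waiting_for_manager"
        self.instances[correlation_id] = {"customer": customer, "state": state}
        return correlation_id

    def on_manager_decision(self, correlation_id, approved):
        # The callback from the human workflow system carries the ID, which
        # selects the right process instance (and therefore the right customer).
        instance = self.instances[correlation_id]
        if instance["state"] != "waiting_for_manager":
            raise ValueError("process is not waiting for a decision")
        instance["state"] = "approved" if approved else "rejected"
        return instance


processes = LoanProcesses()
alice = processes.start("alice", credit_rating=700)
bob = processes.start("bob", credit_rating=500)
processes.on_manager_decision(bob, approved=True)
```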
In my experience, ESBs are for processes that do not include a wait state. When you are just going through a list of services and will get from point A to point B without any pause states, I would use an ESB. ESBs can also handle higher volumes of message requests.
Any time human interaction is involved (entering values, reviewing a submission), I lean towards implementing this in a BPM. These tend to have more robust handling of long periods of waiting.
There are several questions you need to ask yourself when making the choice between ESB and BPEL. Among the most important:
- am I dealing with a stateless process (then I choose ESB) or a stateful one (so I choose BPEL)
- do I need to handle a large volume of short messages - in this case I choose ESB
- do I need orchestration of business processes - then I use BPEL
Here you have a good resource for your question:
http://www.ibm.com/developerworks/websphere/library/techarticles/0803_fasbinder2/0803_fasbinder2.html