Designing an architecture for exchanging data between two systems - web-services

I've been tasked with creating an intermediate layer which needs to exchange data (over HTTP) between two independent systems (e.g. Receiver <=> Intermediate Layer (IL) <=> Sender). Receiver and Sender both expose a set of API's via Web Services. Everytime a transaction occurs in the Sender system, the IL should know about it (I'm thinking of creating a Windows Service which constantly pings the Sender), massage the data, then deliver it to the Receiver. The IL can temporarily store the data in a SQL database until it is transferred to the Receiver. I have the following questions -
Can WCF (haven't used it a lot) be used to talk to the Sender and Receiver (both expose web services)?
How do I ensure guaranteed delivery?
How do I ensure security of the messages over the Internet?
What are best practices for handling concurrency issues?
What are best practices for error handling?
How do I ensure reliability of the data (data is not tampered along the way)
How do I ensure the receipt of the data back to the Sender?
What are the constraints that I need to be aware of?
I need to implement this on MS platform using a custom .NET solution. I was told not to use any middleware like BizTalk. The receiver is an SDFC instance, if that matters.
Any pointers are greatly appreciated. Thank you.

A Windows Service that orchestras the exchange sounds fine.
Yes WCF can deal with traditional Web Services.
How do I ensure guaranteed delivery?
To ensure delivery you can use TransactionScope to handle the passing of data between the
Receiver <=> Intermediate Layer and Intermediate Layer <=> Sender but I wouldn't try and do them together.
You might want to consider some sort of queuing mechanism to send the data to the receiver; I guess I'm thinking more of a logical queue rather than an actual queuing component. A workflow framework could also be an option.
make sure you have good logging / auditing in place; make sure it's rock solid, has the right information and is easy to read. Assuming you write a service it will execute without supervision so the operational / support aspects are more demanding.
Think about scenarios:
How do you manage failed deliveries?
What happens if the reciever (or sender) is unavailbale for periods of time (and how long is that?); for example: do you need to "escalate" to an operator via email?
How do I ensure security of the messages over the Internet?
HTTPS. Assuming other existing clients make calls to the Web Services how do they ensure security? (I'm thinking encryption).
What are best practices for handling concurrency issues?
Hmm probably a separate question. You should be able to find information on that easily enough. How much data are we taking? what sort of frequency? How many instances of the Windows Service were you thinking of having - if one is enough why would concurrency be an issue?
What are best practices for error handling?
Same as for concurrency, but I can offer some pointers:
Use an established logging framework, I quite like MS EntLibs but there are others (re-using whatever's currently used is probably going to make more sense - if there is anything).
Remember that execution is unattended so ensure information is complete, clear and unambiguous. I'd be tempted to log more and dial it down once a level of comfort is reached.
use a top level handler to ensure nothing get's lost; but don;t be afraid to log deep in the application where you can still get useful context (like the metadata of the data being sent / recieved).
How do I ensure the receipt of the data back to the Sender?
Include it (sending the receipt) as a step that is part of the transaction.
On a different angle - have a look on CodePlex for ESB type libraries, you might find something useful: http://www.codeplex.com/site/search?query=ESB&ac=8
For example ESBasic which seems to be a class library which you could reuse.

Related

Web service and transactional guarantees

How do you integrate applications via web services and deal with technical errors like connectivity errors for web service calls which change state?
E.g. when the network connection gets interrupted during a web service call, how does the client know whether the web services has processed its action or not?
Can this issue be solved at the business layer only (e.g. to query a previous call state) or are you aware of some nice frameworks/best practices which can help wrapping transactional guarantees around a web service?
Implementing it all by yourself with some kind of transactional context tracked in the business layer is always an option. You can use some compensation mechanisms to ensure transactions are rolled back if needed, but you'll need to:
have the information on transactions persisted somewhere
use transaction correlation IDs, so you can query when the response has
been lost (having correlation IDs is good idea anyway)
implement the operations needed to read/write/rollback, etc, so it might make your services a bit more complex
Another option I can think of is If you're using SOAP you can go for asynchronous communication and look for some stack implementing WS-Coordination, WS-AtomicTransaction and WS-BusinessActivity specifications, then decide for yourself if it is a good idea in your context or not. For example, I think Axis2 supports these, but of course eventually it depends on technologies and stack you use.
From the article above:
WS-AtomicTransaction defines a coordination type that is most useful
for handling system-generated exceptions, such as an incomplete write
operation or a process terminating abnormally.
Below are the types of 2-Phase Commit that it implements.
Hope this helps!

SOAP Pooling Advantages / Disadvantages

I am doing some research on SOAP, for a personal project, and I came across a website with a list of pros and cons for using SOAP, and I understood what most of them meant, except for this one under disadvantages:
SOAP is typically limited to pooling, and not event notifications, when leveraging HTTP for transport. What's more, only one client can use the services of one server in typical situations.
From my understanding of pooling, there should be no issue pooling a SOAP Object for re usability. Pooling is simply a way to use the same resources over and over again, like a connection to a database. Also not entirely certain on the context of Event Notifications.
So my two questions here are, what does the above block quoted text actually mean, and is this information correct?
Website: http://searchsoa.techtarget.com/definition/SOAP
SOAP is RPC, and in RPC some local client invokes a method on some remote target and receives a result. That's how it works, so SOAP works that way too. A client invokes a service asking for something and the service just responds.
If you want "events" in this type of communication the most simple approach is to invoke the service more often (i.e. polling). This has the advantage that nothing changes for the server or the client. It's the same RPC call but done more frequently.
These days everyone is connected to the web and everyone is subscribed to all sorts of services. They want to get notified as soon as something happens to the world around them. Pooling becomes inefficient in this sea of users and services because you are wasting resources. You might poll a service a hundred times just to get back one notification. For this reason technology is evolving so that resource use is minimized. And the direction this is moving to is push services.
Now almost everything happens in the browser. Every browser manufacturer rushes to implement the latest technology changes and HTML5 spec. This means actual pages that push notifications to users instead of faking it with Ajax, comet, etc.
SOAP has been around since 1998 and it's not moving as fast as the rest of the web, mainly because SOAP is mostly an enterprise player and because it's a protocol. Because it's a protocol you have to make new technology available to it without breaking that protocol. Things move slower so people have abandoned SOAP in favor of other ways of doing server-client communication.
SOAP is typically limited to pooling, and not event notifications...
That is correct. But be aware that "typically" does not mean "always".
You can have events, but it's harder. It involves using WS-* specifications like WS-Eventing and WS-Addressing. This is a change in the way SOAP clients operate because a client now becomes some sort of a service too because it needs to receive calls too, not just initiate them. If your technology stack implements these specifications then good for you, but if it doesn't, then you have to build it yourself and it's a real pain.
So for these reasons, if you don't have blocking performance or resource usage issues, you "typically" chose doing polling with SOAP and not event notifications.

Consultant designed a system that uses email as a webservice

I am looking for some solid arguements against a solution supplied where a public facing webserver hosts an aspx form and based on user input places the content of the form in XML in an email body and sends it to an email address only used for this solution. Then an internal system behind the company firewall reads the XML after retrieving the email from the email server and processes from there. I dont think this will be a robust solution and concerned about maintaining it so would just rather replace it now but there is pressure to keep solution.
Thanks
You mostly can't judge an architectural solution without knowing the specific constraints.
Under certain constraints, this may be very well be the best solution.
Let's take a look at the weak points first:
Messages may be lost due to the mail service not available.
Messages may be too big for the mail service. (In my corp we have a limit of 10Mb for instance.)
Messages may be corrupted on transfer. (Mail service may apply virus scanners and boast with this fact, add footers, rename attachments etc.)
Mail system may not cope with the additional burden, if the message traffic is too big.
Order of delivery is not guaranteed.
The solution is somewhat non-conventional.
Security and other non-func may be not fulfilled.
On the other hand:
This is probably a (very pragmatic) implementation of asynchronous messaging. Asynchronous messaging is often much more powerful and reliably than synchronous solutions.
This solution uses an already existing infrastructure.
Mail system does normally not lose messages "just so". So we basically have a reliable persistent message storage here.
Mail systems are often considered to be "mission critical" so they're often built highly reliable and redundant. So using the mail service may be actually more reliable than introducing a new software/hardware component.
And cheaper.
Can be tested with very pragmatic means.
E-mail has good library support.
You don't need an expensive professional for implementation.
So imagine the following constraints:
Build an asynchronous message processing.
Losing small percentage of messages is not a big deal.
Do it fast.
Do it cheap.
Quick and dirty is OK ("we'll throw it away in three month anyway").
Under these constraints this might be a very good and pragmatic solution.
To address the point by #techtrek:
"far more robust" - see above, mail system may actually be more reliable than an internal ESB infrastructure. At least this is my experience.
Agree, but not THAT risky. Attachments are normally NOT damaged in anyway. Otherwise management would scream every time their PowerPoint slides get corrupted.
Emails service down - well, ESB or any internal service may go down as well.
I don't quite understand why e-mail traceability is more complicated. I send an email, it either arrives or not. If not then this is a problem of the mail service. "Complicating" compared to what?
Well, of course mail service administration is separate, why is this a maintenance headache? We actually have all the platform services (databases, servers, ESB, etc.) administered and maintained by separate teams. This is a normal practice, I don't see why it should be a problem here. On the contrary, with the mail service you probably have a professional team specifically dedicated to the reliability of that transport channel.
Frankly, I saw quite a few ESB/MQ-solutions where I really thought that it would be MUCH cheaper, easier and in the fact more reliable if a few distinct apps would just send each other e-mails.
The issue with using email as the relay agent is that:
Creating a webservice that allows your company's internal system to directly intercept and parse XML system seems far more robust to me.
Encapsulating the XML mime type within a transmission protocol (Email) is risky in and of itself.
As a result of (2), there are two points of failure (corruption in the xml conversion process) and also the risk of the email service going down.
Beyond just points of failure, you're also complicating traceability by an order of magnitude.
Email administration is often separate to the administration of the web-service. Unless there are real, legitimate reaosns for this, it just sounds like more maintenance headache?
I agree with what has been said, especially on point 4. Application and eMail Maintenance might be disconnected entities.
Another aspect to consider is the possibility ti send mail from anywhere to the backend and flood it this way

What is the modern programming standard for synchronizing data between a web service and a client?

The question is a little general, so to help narrow the focus, I'll share my current setup that is motivating this question. I have a LAMP web service running a RESTful API. We have two client implementations: one browser-based javascript client (local storage store) and one iOS-based client (core data store). Obviously these two clients store data very differently, but the data itself needs to be kept in two-way sync with the remote server as often as possible.
Currently, our "sync" process is a little dumb (as in, non-smart). Conceptually, it looks like:
Client periodically asks the server for ALL of the most-recent data.
Server sends down the remote data, which overwrites the current set of local data in the client's store.
Any local creates/updates/deletes after this point are treated as gold, and immediately sent to the server.
The data itself is stored relationally, and updated occasionally by client users. The clients in my specific case don't care too much about the relationships themselves (which is why we can get away with local storage in the browser client for now).
Obviously this isn't true synchronization. I want to move to a system where, conceptually, a "diff" of the most recent changes are sent to the server periodically, and the server sends back a "diff" of the most recent changes it knows about. It seems very difficult to get to this point, but maybe I just don't understand the problem very well.
REST feels like a good start, but REST only talks about the way two data stores talk to each other, not how the data itself is synchronized between them. (This sync process is left up to the implementer of each store.) What is the best way to implement this process? Is there a modern set of programming design patterns that apply to inform a specific solution to this problem? I'm mostly interested in a general (technology agnostic) approach if possible... but specific frameworks would be useful to look at too, if they exist.
Multi-master replication is always (and will always be) difficult and bespoke, because how conflicts are handled will be specific to your application.
IMO A more robust approach is to use Master-slave replication, with your web service as the master and the clients as slaves. To keep the clients in sync, use an archived atom feed of the changes (see event sourcing) as per RFC5005. This is the closest you'll get to a modern standard for this type of replication and it's RESTful.
When the clients are online, they do not update their replica directly, instead they send commands to the server and have their replica updated via the atom feed.
When the clients are offline things get difficult. Your clients will need to have a model of how your web service behaves. It will need to have an offline copy of your replica, which should be copied on write from the online replica (the online replica is the one that is updated by the atom feed). When the client executes commands that modify the data, it should store the command (for later replay against the web service), the expected result (for verification during replay) and update the offline replica.
When the client goes back online, it should replay the commands, compare the result with the expected result and notify the client of any variances. How these variances are handled will vary based on your application. The offline replica can then be discarded.
CouchDB replication works over HTTP and does what you are looking to do. Once databases are synced on either end it will send diffs for adds/updates/deletes.
Couch can do this with other Couch machines or with a mobile framework like TouchDB.
https://github.com/couchbaselabs/TouchDB-iOS
I've done a fair amount of it, but you can always set up CouchDB on one machine, set up TouchDB on a mobile device and then watch the HTTP traffic go back and forth to get an idea of how they do it.
Or read this: http://guide.couchdb.org/draft/replication.html
Maybe something from the link above will help you get an idea of how to do your own diffs for your REST service. (Since they are both over HTTP thought it could be useful.)
You may want to look into the Dropbox Datastore API:
https://www.dropbox.com/developers/datastore
It sounds like it might be a very good fit for your purposes. They have iOS and javascript clients.
Lately, I've been interested in Meteor.
The platform sets up Mongo on the server and minimongo in the browser. The client subscribes to some data and when that data changes, the platform automatically sends down the new data to the client.
It's a clever solution to the syncing problem, and it solves several other problems as well. It will be interesting to see if more platforms do this in the future.

Best messaging medium for real-time SOA applications?

I'm working on a real time application implemented using in a SOA-style (read loosely coupled components connected via some messaging protocol - JMS, MQ or HTTP).
The architect who designed this system opted to use JMS to connect the components. This system is real time so there no need to queue up messages should one component fail (the transaction will simply time out). Further, there is no need for guaranteed delivery or rollback.
In this instance, is there any benefit to using JMS over something like an HTTP web service (speed, resource footprint, etc)?
One thing that I'm thinking is since the JMS approach requires us to set a thread pool size (the number of components listening to a JMS topic/queue), wouldn't a HTTP service be a better fit since this additional configuration is not needed (a new thread is created for each HTTP request making the application scalable to an "unlimited" number of requests until the server runs out of resources).
Am I missing something?
I don't disagree with the points made by S.Lott at all, but here are a couple of points to consider regarding HTTP web services:
Your clients only need to know how to communicate via HTTP - a protocol well supported by just about every modern langauge in one form or another. JMS, though popular, is more specialist than HTTP, and so restricts the languages your interconnected systems can use. Perhaps not an issue for your system at the moment, but will you need to plug in other systems later that might struggle to support JMS connectivity?
Standards like WSDL and SOAP which you could levarage for your services are well supported by many langauges and there are plenty of tools around that will generate code to implement both ends of the pipeline (client and server) for you from a WSDL file, reducing the amount of dev you'll have to do. These standards also make it relatively simple to define and publish the specification of the data you'll be passing between your systems, something you'll presumably have to do by hand using a queueing technology like JMS.
On the downside, as pointed out by S.Lott, JMS gives you functionality that you throw away using the (stateless) HTTP protocol: guaranteed ordering & reliability; monitoring; scalability; etc. Are you sure you don't need these, and won't need these going forward?
Great question, btw.
I think it's really dependent on the situation. Where I work, we support Remoting, JMS, MQ, HTTP, and sFTP. We are implementing a middleware appliance that speaks Remoting, JMS, MQ, and HTTP, and a software middleware component that speaks JMS, MQ, and HTTP.
As sgreeve alluded to above, standards help us become flexible, but proprietary formats allow more functionality.
In a nutshell, I'd say use HTTP for stateless calls (which could end up meeting almost all of your needs), and whatever proprietary formats you need for stateful calls. If you work in a big enterprise, a hardware appliance is usually a great fit as middleware: Lightning fast compression, encryption, transformation, and translation, with very low total cost of ownership.
I don't know enough about your requirements, but you may be overlooking Manageability, Flexibility and Performance.
JMS allows you to monitor and manage the queue. These are features HTTP lacks, and you'd have to build rather than buy from a vendor.
Also, There are queues and topics in JMS, allowing multiple subscribers to a single publisher. Not possible in HTTP.
While you may not need those things in release 1.0, you might want them in the future.
Also, JMS may be able to use other transport mechanisms like named sockets, which reduces the overheads if there isn't all that socket negotiation going on with (almost) every request.
If you go down the HTTP route and you want to support more than one machine or some kind of reliability - you are going to need a load balancer capable of discovering the available web servers and loading requests across them - then failing over to another web server if a particular box/process dies. Clients making HTTP requests are also going to have to deal with servers failing and retrying operations in some loop.
This is one of the main features of a message queue - reliable load balancing with failover and loose coupling among the producers and consumers without them having to include retry logic - so your client or server code doesn't have to worry about this kinda thing. This is totally separate to whether or not you want message persistence or want to use ACID transactions to produce/consume messages (which can be very handy BTW).
If you focus just on the server side using Java - whether Servlets or MessageListener/MDBs they are kinda similar either way really. The difference is the load balancer.
So maybe the question should really be - is a JMS broker easier to setup & work with than setting up your DNS/NAT/IP/HTTP load balancer infrastructure?
I suppose it depends on what you mean by real-time... Neither JMS nor HTTP in my opinion support "real-time" applications well, meaning they cannot offer predictable/deterministic performance nor properly prioritize flows in the presence of contention.
Part of it is that these technologies are built on top of TCP which serializes all traffic into a single FIFO meaning that different traffic flows cannot be easily prioritized. Moreover TCP timers are not easily controlled resulting unpredictable blocking and timeouts... For this reason many streaming applications use UDP instead of TCP as an underlying protocol.
Another problem with JMS is that typical implementations use a broker that centralizes message dispatch. This is not the best architecture to get deterministic performance.
If you are looking for a middleware that can offer you the kind of reliability guarantees and publish-subscribe semantics you get with JMS, but was developed to fit the real-time application domain I recommend you take a look at the OMG Data-Distribution Service (DDS). See dds.omg.org and this article I wrote arguing why DDS is the best middleware to implement a real-time SOA. http://soa.sys-con.com/node/467488