JAX-WS RI does not stream data from client - web-services

I have written a simple upload service with JAX-WS RI. The classes are generated by XJC with a DataHandler type for the binary content, and the service implementation is annotated with @MTOM and @StreamingAttachment. When a client (a .NET client) sends the upload as MTOM multipart/related, the data arrives, but the data source inside the DataHandler is a com.sun.istack.ByteArrayDataSource wrapping a java.io.ByteArrayInputStream. This means the stream from the client is consumed fully into memory, and with larger files memory usage explodes.
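The implementation looks roughly like this (names are simplified, the /tmp path is only an example, and the real parameter types come from the generated classes):

```java
import java.io.File;
import java.io.IOException;

import javax.activation.DataHandler;
import javax.jws.WebService;
import javax.xml.ws.soap.MTOM;

import com.sun.xml.ws.developer.StreamingAttachment;
import com.sun.xml.ws.developer.StreamingDataHandler;

// Simplified sketch of the endpoint; the real types are generated from the WSDL.
@MTOM
@StreamingAttachment(parseEagerly = true, memoryThreshold = 4000000L)
@WebService
public class UploadServiceImpl {

    public void upload(String fileName, DataHandler data) throws IOException {
        // With streaming working as advertised, the attachment could be moved
        // to disk without ever being buffered completely in memory:
        StreamingDataHandler sdh = (StreamingDataHandler) data;
        try {
            sdh.moveTo(new File("/tmp/" + fileName));   // target path is just an example
        } finally {
            sdh.close();
        }
        // In practice data.getDataSource() turns out to be a
        // com.sun.istack.ByteArrayDataSource, i.e. the whole file is already in memory
        // by the time this method runs.
    }
}
```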
A few rounds of googling turned up nothing but some SO questions about SOAP handlers, which I do not use. No success yet.
I started a debugging session and found that internally the data is held in a StreamingDataHandler, which is the recommended way, but the MtomCodec queries the length of the attachment, and that forces the underlying data object to consume the whole input stream into memory. End of story.
This seems pretty brain-dead to me because the entire optimization from MTOM is completely gone.
Does anyone know a solution to this? Otherwise the whole approach in JAX-WS RI seems useless and I will have to resort to REST.
This might be a duplicate of MTOM not working when using SOAPHandler. The outcome is the same.
For what it's worth, I am on:
Tomcat 6.0.41
Java 7
JAXB 2.2.11
JAX-WS 2.2.10

After further debugging sessions and searches on the net, I have found several JIRA issues and blog posts confirming that this is simply broken in JAX-WS RI. As soon as a handler is involved, you're lost. This applies to outgoing streams, but especially to incoming ones.
In the end, JAX-WS RI is unusable for streaming file transfers. I will have to evaluate Apache CXF at some point.

Related

SOAP Pooling Advantages / Disadvantages

I am doing some research on SOAP for a personal project, and I came across a website with a list of pros and cons of using SOAP. I understood what most of them meant, except for this one under disadvantages:
SOAP is typically limited to pooling, and not event notifications, when leveraging HTTP for transport. What's more, only one client can use the services of one server in typical situations.
From my understanding of pooling, there should be no issue pooling a SOAP object for reusability. Pooling is simply a way to reuse the same resources over and over again, like a connection to a database. I'm also not entirely certain what is meant by event notifications in this context.
So my two questions are: what does the block-quoted text above actually mean, and is it correct?
Website: http://searchsoa.techtarget.com/definition/SOAP
SOAP is RPC, and in RPC some local client invokes a method on some remote target and receives a result. That's how it works, so SOAP works that way too. A client invokes a service asking for something and the service just responds.
If you want "events" in this type of communication, the simplest approach is to invoke the service more often, i.e. polling. This has the advantage that nothing changes for the server or the client; it's the same RPC call, just made more frequently.
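To make "invoke the service more often" concrete, polling boils down to putting the same RPC call on a timer. A minimal sketch, where the port type and operation are made up and stand in for whatever your WSDL generates:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OrderStatusPoller {

    /** Stand-in for a WSDL-generated port type; the operation name is made up. */
    interface OrderServicePort {
        String getOrderStatus(String orderId);
    }

    public static void poll(final OrderServicePort port, final String orderId) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                // The same RPC call as always, just repeated on a timer; the
                // server does not know (or care) that it is being polled.
                String status = port.getOrderStatus(orderId);
                if (!"PENDING".equals(status)) {
                    System.out.println("Something happened: " + status);
                }
            }
        }, 0, 30, TimeUnit.SECONDS);
    }
}
```

Nothing on the server changes; it simply sees the same call arriving every 30 seconds.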
These days everyone is connected to the web and subscribed to all sorts of services, and they want to be notified as soon as something happens in the world around them. Polling becomes inefficient in this sea of users and services because it wastes resources: you might poll a service a hundred times just to get back one notification. For this reason the technology is evolving to minimize resource use, and the direction it is moving in is push services.
Now almost everything happens in the browser. Every browser manufacturer rushes to implement the latest technology changes and HTML5 spec. This means actual pages that push notifications to users instead of faking it with Ajax, comet, etc.
SOAP has been around since 1998 and it's not moving as fast as the rest of the web, mainly because SOAP is mostly an enterprise player and because it's a protocol. Because it's a protocol you have to make new technology available to it without breaking that protocol. Things move slower so people have abandoned SOAP in favor of other ways of doing server-client communication.
SOAP is typically limited to pooling, and not event notifications...
That is correct. But be aware that "typically" does not mean "always".
You can have events, but it's harder. It involves WS-* specifications like WS-Eventing and WS-Addressing. This changes the way SOAP clients operate, because a client now becomes a sort of service itself: it needs to receive calls, not just initiate them. If your technology stack implements these specifications, good for you; if it doesn't, you have to build it yourself, and that's a real pain.
So for these reasons, if you don't have blocking performance or resource-usage issues, you "typically" choose polling with SOAP rather than event notifications.

Creating middleware to make camera ONVIF compatible

My company is trying to figure out how to turn our current camera line into ONVIF compliant cameras.
What I've found is the specification documents and a bunch of WSDL files. But everything I've seen so far appears to set up "the client side" of things.
I'm trying to create a middleware service so that our existing cameras can become ONVIF supported.
Are the WSDL files used for both a client and a device?
How do companies program ONVIF compliant cameras? Ours are PTZs; would the PTZ WSDL be what I'm looking for?
How does one start the service on the device side? The specification covers everything, but it isn't written well for developers new to the standard.
Please help me figure out how I would turn my embedded Linux camera (C++) into an ONVIF compliant camera. Do developers use the WSDLs to achieve this?
Thank you!
Well, one of the most common ways to implement ONVIF is via the gSoap library, which has a very extensive guide covering both client and server use cases. You should go through the server-side documentation to get a glimpse of how it works.
From a very generalized point of view: gSoap has a wsdl2h tool that takes a set of WSDL files and generates a header describing the services, and another gSoap tool called soapcpp2 then generates the C/C++ client/server objects from it (mostly parsing and I/O code that takes care of turning the request data into structure representations). I've only worked with the client side, so I guess the guide mentioned above is the best way to understand how to build a server from the generated objects.
You can then host a service and handle the incoming requests through this C/C++ object abstraction, which should be quite easy: every request XML is deserialized into an object instance, you read the fields you need, create an instance of the corresponding response object and send it back. At least I've been using gSoap so far for client requests to ONVIF cameras and I'm quite satisfied. The maintainers of gSoap also have a small tutorial on how to deploy a simple service.
That being said, I've seen cameras that don't use gSoap or any other high-level framework and instead parse the request content with a common XML parser and keep response string templates that are filled in with the needed values and sent back. If your camera is not very complex this might work, but it depends on your needs. Feel free to ask any follow-up questions; ONVIF was quite a spider's web for me too when I started.

Is using Mirth Connect or any other interface engine overkill in this situation?

I've been assigned a small project and directed to use Mirth Connect as part of the solution. We currently do not use Mirth but because we have an upcoming project that will require an interface engine, I was asked to use it for this project so I can gain experience with it. However, I think it's a poor suggestion for this project; I also know my boss would not want me to implement something that adds unnecessary complexity just for the sake of learning.
With that said, I want to make sure I have valid reasons for suggesting that Mirth Connect should not be used for this project. Neither of us knows much about it, but I think he's been convinced it is the end-all solution for anything interface/web-service related. I appreciate any input I can get from those of you who have more experience with the product than I have.
This is a very simple project in that we have a client needing to make a handful of requests into our system from theirs in order to retrieve and update data. For example, they will make a request to get patient demographics, a request to add an admission for a patient, a request to get a list of possible care settings from our application, etc. For this project we will not use HL7 but a set of predefined XML messages.
Both the client's application and our application reside on the client's network.
They do not want to build any services of their own, so the services we build need to handle all of the work. The results returned in response to their calls to the services will be returned as XML.
There are no plans to integrate any other applications with theirs or ours in the foreseeable future.
It seems to me the best option would be for us to build a standalone web service that would take their request and send back an XML response. I just don't see any reason to include Mirth Connect in the picture (other than for learning but that can be gained in other ways).
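Roughly, I'm picturing something as small as this on our side (the names are made up, and the real operations would follow the predefined XML messages we agreed on with the client):

```java
import java.util.Arrays;
import java.util.List;

import javax.jws.WebMethod;
import javax.jws.WebService;

// Hypothetical sketch of the kind of standalone service I have in mind; the real
// operations (GetPatientDemographics, AddAdmission, GetCareSettings, ...) would
// return our predefined XML messages rather than hard-coded values.
@WebService
public class CareSettingsService {

    @WebMethod
    public List<String> getCareSettings() {
        // In reality this would query our application for its care settings.
        return Arrays.asList("Inpatient", "Outpatient", "Emergency");
    }
}
```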
What are your thoughts? Is it true that the interface engine is not a good choice if the client wants to receive data from our system without having a receiving mechanism on their end? In other words, they want to make a web service call such as GetCareSettings and to get a response back with an XML representation of all the possible care settings in our system. It seems to me they would need a web service on their end for Mirth to use as a destination to send the results. All Mirth is going to send back is an ACK message, correct? (Unless of course it wrote the data to another webservice on the client end, which they have said they do not want to do.)
Thanks for taking the time to read this. I hope my lack of knowledge and understanding of Mirth Connect and the use of interface engines hasn't made this question difficult to answer.
From what I understand, your client appears to be either a lab or a third-party service vendor who will take inputs from your application such as patient demographics, appointments, provider details, etc. Basically, they want to query your application.
A) HL7: it can handle query requests and responses involving demographics. I am assuming you are familiar with QRY messages.
B) XML/web services/SOAP: still a viable solution, a little more concrete, and it can be extended to handle custom requests like GetCareSettings or anything else. The vendor is not just interested in fetching patient-related data but also in other inputs for which HL7 might not be enough.
As far as the approach goes, my professional advice is to use an interface engine. You are not limited to Mirth Connect; you can also use Iguana if you want. One good reason that comes instantly to mind is that an engine gives you an advantage during troubleshooting, support and maintenance.
Your web service responses can be handled easily by the HTTP Sender connector type and through RESTful web services.
The engine is also capable of handling large volumes of requests and responses at the same time, which may not be required right now but will likely be the case later on. The source connector of your channel would then be a Web Service Listener.
Another good approach is to do away with XML and use JSON for requests and responses; it is much more lightweight than XML and saves network overhead. We are doing similar work, sending requests to a web service as JSON.
Overall, Mirth is there to make your life easier.
Good Luck!

What is the modern programming standard for synchronizing data between a web service and a client?

The question is a little general, so to help narrow the focus, I'll share my current setup that is motivating this question. I have a LAMP web service running a RESTful API. We have two client implementations: one browser-based javascript client (local storage store) and one iOS-based client (core data store). Obviously these two clients store data very differently, but the data itself needs to be kept in two-way sync with the remote server as often as possible.
Currently, our "sync" process is a little dumb (as in, non-smart). Conceptually, it looks like:
Client periodically asks the server for ALL of the most-recent data.
Server sends down the remote data, which overwrites the current set of local data in the client's store.
Any local creates/updates/deletes after this point are treated as gold, and immediately sent to the server.
The data itself is stored relationally, and updated occasionally by client users. The clients in my specific case don't care too much about the relationships themselves (which is why we can get away with local storage in the browser client for now).
Obviously this isn't true synchronization. I want to move to a system where, conceptually, a "diff" of the most recent changes are sent to the server periodically, and the server sends back a "diff" of the most recent changes it knows about. It seems very difficult to get to this point, but maybe I just don't understand the problem very well.
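To make that concrete, the kind of exchange I imagine looks roughly like this (a completely made-up shape, not tied to any framework or to our actual schema):

```java
import java.util.List;

// Hypothetical wire shape for the "diff" exchange I have in mind;
// names and fields are invented purely to illustrate the idea.
public class SyncEnvelope {

    /** One create/update/delete performed on a record since the last sync. */
    public static class Change {
        public String recordId;
        public String operation;     // "create", "update" or "delete"
        public String payload;       // the record itself, absent for deletes
        public long   modifiedAt;    // server clock, used as the next sync cursor
    }

    public long lastSyncedAt;        // the cursor received on the previous sync
    public List<Change> changes;     // everything that happened since then
}
```

Each side would send the changes it has accumulated since the last cursor and receive the other side's changes, plus a new cursor, in return.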
REST feels like a good start, but REST only talks about the way two data stores talk to each other, not how the data itself is synchronized between them. (The sync process is left up to the implementer of each store.) What is the best way to implement this process? Is there a modern set of design patterns that could inform a specific solution to this problem? I'm mostly interested in a general (technology-agnostic) approach if possible, but specific frameworks would be useful to look at too, if they exist.
Multi-master replication is always (and will always be) difficult and bespoke, because how conflicts are handled will be specific to your application.
IMO, a more robust approach is to use master-slave replication, with your web service as the master and the clients as slaves. To keep the clients in sync, use an archived Atom feed of the changes (see event sourcing) as per RFC 5005. This is the closest you'll get to a modern standard for this type of replication, and it's RESTful.
When the clients are online, they do not update their replica directly, instead they send commands to the server and have their replica updated via the atom feed.
When the clients are offline, things get difficult. Each client will need a model of how your web service behaves and an offline copy of the replica, which should be copied on write from the online replica (the online replica being the one that is updated by the Atom feed). When the client executes commands that modify the data, it should store the command (for later replay against the web service) and the expected result (for verification during replay), and update the offline replica.
When the client goes back online, it should replay the commands, compare the result with the expected result and notify the client of any variances. How these variances are handled will vary based on your application. The offline replica can then be discarded.
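A minimal sketch of what that offline command log might look like (all names are hypothetical; the variance handling is the application-specific part):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical offline command log, only to illustrate the
// store-command / replay / compare cycle described above.
public class OfflineCommandLog {

    /** A command executed against the offline replica while disconnected. */
    public static class PendingCommand {
        final String name;            // e.g. "renameFolder"
        final String arguments;       // serialized arguments
        final String expectedResult;  // what the offline replica said would happen

        PendingCommand(String name, String arguments, String expectedResult) {
            this.name = name;
            this.arguments = arguments;
            this.expectedResult = expectedResult;
        }
    }

    private final Deque<PendingCommand> pending = new ArrayDeque<PendingCommand>();

    /** Called while offline: the offline replica is updated and the command remembered. */
    public void record(String name, String arguments, String expectedResult) {
        pending.add(new PendingCommand(name, arguments, expectedResult));
    }

    /** Called when connectivity returns: replay against the real web service. */
    public void replay(WebServiceClient service, ConflictHandler conflicts) {
        while (!pending.isEmpty()) {
            PendingCommand cmd = pending.poll();
            String actual = service.execute(cmd.name, cmd.arguments);
            if (!actual.equals(cmd.expectedResult)) {
                // Handling the variance is application-specific: ask the user, merge, retry...
                conflicts.onVariance(cmd.name, cmd.expectedResult, actual);
            }
        }
        // Afterwards the offline replica is discarded and the client catches up
        // from the archived Atom feed again.
    }

    /** Stand-ins so the sketch is self-contained. */
    public interface WebServiceClient { String execute(String command, String arguments); }
    public interface ConflictHandler  { void onVariance(String command, String expected, String actual); }
}
```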
CouchDB replication works over HTTP and does what you are looking to do. Once databases are synced on either end it will send diffs for adds/updates/deletes.
Couch can do this with other Couch machines or with a mobile framework like TouchDB.
https://github.com/couchbaselabs/TouchDB-iOS
I've done a fair amount of it; you can always set up CouchDB on one machine, set up TouchDB on a mobile device, and then watch the HTTP traffic go back and forth to get an idea of how they do it.
Or read this: http://guide.couchdb.org/draft/replication.html
Maybe something from the link above will help you get an idea of how to do your own diffs for your REST service. (Since they are both over HTTP, I thought it could be useful.)
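For example, kicking off a replication between two Couch databases is just an HTTP POST to _replicate. A minimal sketch (host and database names are placeholders):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal example of triggering a CouchDB replication over plain HTTP.
public class TriggerReplication {

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:5984/_replicate");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        // Couch works out which documents the target is missing and sends only those.
        String body = "{\"source\": \"local-db\", \"target\": \"http://example.com:5984/remote-db\"}";
        OutputStream out = conn.getOutputStream();
        out.write(body.getBytes("UTF-8"));
        out.close();

        System.out.println("Replication response: " + conn.getResponseCode());
    }
}
```

Watching the traffic that follows shows how the replicator figures out which revisions the target is missing and sends only those diffs.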
You may want to look into the Dropbox Datastore API:
https://www.dropbox.com/developers/datastore
It sounds like it might be a very good fit for your purposes. They have iOS and javascript clients.
Lately, I've been interested in Meteor.
The platform sets up Mongo on the server and minimongo in the browser. The client subscribes to some data and when that data changes, the platform automatically sends down the new data to the client.
It's a clever solution to the syncing problem, and it solves several other problems as well. It will be interesting to see if more platforms do this in the future.

Message queuing solutions?

(Edited to try to explain better)
We have an agent, written in C++ for Win32. It needs to periodically post information to a server. It must support disconnected operation. That is: the client doesn't always have a connection to the server.
Note: This is for communication between an agent running on desktop PCs, to communicate with a server running somewhere in the enterprise.
This means that the messages to be sent to the server must be queued (so that they can be sent once the connection is available).
We currently use an in-house system that queues messages as individual files on disk, and uses HTTP POST to send them to the server when it's available.
It's starting to show its age, and I'd like to investigate alternatives before I consider updating it.
It must be available by default on Windows XP SP2, Windows Vista and Windows 7, or must be simple to include in our installer.
This product will be installed (by administrators) on a couple of hundred thousand PCs. They'll probably use something like Microsoft SMS or ConfigMgr. In this scenario, "frivolous" prerequisites are frowned upon. This means that, unless the client-side code (or a redistributable) can be included in our installer, the administrator won't be happy. This makes MSMQ a particularly hard sell, because it's not installed by default with XP.
It must be relatively simple to use from C++ on Win32.
Our client is an unmanaged C++ Win32 application. No .NET or Java on the client.
The transport should be HTTP or HTTPS. That is: it must go through firewalls easily; no RPC or DCOM.
It should be relatively reliable, with retries, etc. Protection against replays is a must-have.
It must be scalable -- there's a lot of traffic. Per-message impact on the server should be minimal.
The server end is C#, currently using ASP.NET to implement a simple HTTP POST mechanism.
(The slightly odd one). It must support client-side in-memory queues, so that we can avoid spinning up the hard disk. It must allow flushing to disk periodically.
It must be suitable for use in a proprietary product (i.e. no GPL, etc.).
How is your current solution showing its age?
I would push the logic on to the back end, and make the clients extremely simple.
Messages are simply stored in the file system. Have the client write to c:/queue/{uuid}.tmp. When the file is written, rename it to c:/queue/{uuid}.msg. This makes writing messages to the queue on the client "atomic".
A C++ thread wakes up, scans c:\queue for *.msg files, and if it finds one it checks for the server and HTTP POSTs the message to it. When it receives a 200 status back from the server (i.e. the server has got the message), it can delete the file. It only scans for *.msg files; the *.tmp files may still be being written to, and you'd have a race condition trying to send a .msg file that was still being written. That's what the rename from .tmp is for. I'd also suggest scanning by creation date so earlier messages go first.
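To make the client side concrete, here is the whole thing sketched out. It's Java purely to keep the listing short; the real client would do the same steps with Win32 file APIs and an HTTP POST, and the directory and server URL are placeholders:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FilenameFilter;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.Comparator;
import java.util.UUID;

// Sketch of the queue-as-files idea described above.
public class FileQueue {

    private final File dir = new File("c:/queue");

    /** Enqueue: write {uuid}.tmp, then rename to {uuid}.msg so the sender never sees a partial file. */
    public void enqueue(byte[] message) throws Exception {
        String id = UUID.randomUUID().toString();
        File tmp = new File(dir, id + ".tmp");
        OutputStream out = new FileOutputStream(tmp);
        try {
            out.write(message);
        } finally {
            out.close();
        }
        if (!tmp.renameTo(new File(dir, id + ".msg"))) {
            throw new IllegalStateException("could not rename " + tmp);
        }
    }

    /** Sender pass: oldest *.msg first; delete a file only after the server answered 200. */
    public void drain(String serverUrl) throws Exception {
        File[] ready = dir.listFiles(new FilenameFilter() {
            public boolean accept(File d, String name) { return name.endsWith(".msg"); }
        });
        if (ready == null) return;
        Arrays.sort(ready, new Comparator<File>() {
            public int compare(File a, File b) { return Long.compare(a.lastModified(), b.lastModified()); }
        });
        for (File msg : ready) {
            if (post(serverUrl, msg) == 200) {
                msg.delete();   // a crash between POST and delete means a duplicate; the server dedupes by uuid
            }
        }
    }

    private int post(String serverUrl, File msg) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(serverUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        FileInputStream in = new FileInputStream(msg);
        OutputStream out = conn.getOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) > 0) out.write(buf, 0, n);
        in.close();
        out.close();
        return conn.getResponseCode();
    }
}
```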
Your server receives the message, and here it can do any necessary dupe checking. Push this burden onto the server to centralize it. You could simply record the uuid of every message to do duplicate elimination. If that list gets too long (I don't know your traffic volume), perhaps you can cull items older than 30 days (I also don't know how long your clients can remain offline).
This system is simple, but pretty robust. If the file sending thread gets an error, it will simply try to send the file next time. The only time you should be getting a duplicate message is in the window between when the client gets the 200 ack from the server and when it deletes the file. If the client shuts down or crashes at that point, you will have a file that has been sent but not removed from the queue.
If your clients are stable, this is a pretty low risk. With dupe checking based on the message ID you can mitigate it at the cost of some bookkeeping; maintaining a list of uuids isn't spectacularly daunting, but again it depends on your message volume and other performance requirements.
The fact that you are allowed to work "offline" suggests you have some "slack" in your absolute messaging performance.
To be honest, the requirements listed don't make a lot of sense and show you have a long way to go in your MQ learning. Given that, if you don't want to use MSMQ (probably the easiest overall on Windows -- but with [IMO severe] limitations), then you should look into:
qpid - Decent use of AMQP standard
zeromq - (the best, IMO, technically but also requires the most familiarity with MQ technologies)
I'd recommend rabbitmq too, but that's an Erlang server and last I looked it didn't have usable C or C++ libraries. Still, if you are shopping for an MQ, take a look at it...
[EDIT]
I've gone back and reread your reqs as well as some of your comments and think, for you, that perhaps client MQ -> server is not your best option. I would maybe consider letting your client -> server operations be HTTP POST or SOAP and allow the HTTP endpoint in turn queue messages on your MQ backend. IOW, abstract away the MQ client into an architecture you have more control over. Then your C++ client would simply be HTTP (easy), and your HTTP service (likely C# / .Net from reading your comments) can interact with any MQ backend of your choice. If all your HTTP endpoint does is spawn MQ messages, it'll be pretty darned lightweight and can scale through all the traditional load balancing techniques.
The last time I wanted to do any messaging I used C# and MSMQ. There are MSMQ libraries available that make using it very easy. It's free to install on your servers and it has never lost a message to this day. It handles reboots etc. all by itself. It's a thing of beauty, and hundreds of thousands of messages are processed daily.
I'm not sure why you ruled out MSMQ and I didn't get point 2.
Quite often for queues we just dump record data into a database table and another process lifts rows out of the table periodically.
How about using the Asynchronous Agents Library that ships with Visual C++ 2010 (part of the Concurrency Runtime)? It is still in beta though.
http://msdn.microsoft.com/en-us/library/dd492627(VS.100).aspx