(Java) How can I pass Schema-validated XML documents as parameters between distributed components (e.g. web services or sockets)? - web-services

Here is a description of the scenario and I would appreciate also any comments on the approach used
The core of my application is a set of web services backed by a P2P database. One service accepts a simple XML-based record (I have designed a generic schema for it). The service processes this data (mainly creating keys based on certain criteria) and pass the original data along with the created keys to a listening SocketServer in one of the listening P2P nodes. This key,data pair is routed to the proper node, which stores the data (associated with the key as an ID) in an XML database.
A second service accepts a query document that is structured based on the same schema, but with optional values that would be used for searching and matching from the previously stored ones. So the second service would pass this query (with the proper keys) to the P2P part, get back the results and pass them back to the service client.
E.g. if the original record submitted to the first service was < attr1 >value1 < /attr1 > < attr2 > value2 < / attr2 > (attribute list along with some other metadata mandated by the schema) then the second service should retrieve that record if the query received was < attr2 >value2 < / attr2 >
(I could later think about using more complex XPath or XQuery queries as the underlying XML database allows instead of exact matches for values here but that is not important at this stage. there is also a third service I am working on but it depends on getting the first two in proper shape first)
So my questions are:
1) What data type should I use as the parameters of the web services? How to utilize my schema for this usage? I was considering various XML binding frameworks (especially JAXB and SDO) for this but didn't know how to proceed.
2) How can I enhance the two services (call them store and search) to use dynamically created templates based on the original generic schema? The service would still accept documents of the main schema type but has the inner attribute list based on a template say template1 only requires whose values are ints while template2 require (float) and (string). The current JSP-based prototype manually creates this template but as an XML document that is assembled by hand (<>tags dispersed in text) and there is no type checking what so ever so I thought I could do better!
3) Is it possible to generate a quick web app prototype for simple access to this system (again by using the schema (&templates) to edit the appropriate XML message structures? What I am looking for is for the (human) user to choose a template and then just "fill in the blanks" and submit, no need for any fancy look and feel.
4) Can I or how can I also use this XML message type for communicating across sockets?
5) Does it matter if I deploy the services as stateless EJBs or not? Do I need them to be EJBs or servlets would be more than enough?
I currently have a rudimentary implementation (from previous developers) that were meant for a subset of my current requirements (I am improving on the services and adding new derived ones) but there was no schema nor validation and the data is passed all along as basic strings, thus providing weak typing and difficult to update manual parsing. The reason I want to update this to a stronger bound typing is to introduce changes in the data schema that would be passed along the whole system easily. Basically I want the system to be as less coupled to the data format/schema used as possible; the current prototype is too coupled to the data that I am finding it extremely difficult to change the data without breaking the system.
My initial investigation led me to consider JAXB but it supports only static typing (cannot create a schema/types dynamically at runtime that I want to persist for later usage). So I came across SDO which has both dynamic and static typing. The problem is just that there is not enough community and/or examples of using this approach so it seems risky (the examples of Apache Tuscany and Eclipselink implementations are very scarce and I could not find complete examples that are not 5+ years old (like this http://www.ibm.com/developerworks/java/library/j-sdo/) and also tackles the XML use case of SDO (most seem to focus on the relational usage of SDO).
This is my first time asking for programming help (here and elsewhere) so please bear with me. I searched a lot on the net but I could not find anything useful but pieces here and there that did not add up.
Any comment or hint is really appreciated.
trfndr
EDIT
I forgot one thing: how would the search service get back the results? Since it is opening a client socket connection, there is no way to get back any results synchronously. The current implementation tackles this by having the service client opening a listening socket on a random port and putting this contact info in the query document. After the search web service sends the query to the p2p part it finishes. The p2p sends the results as a WS call to another service which sends them back to the service client socket. I don't like this approach much, is there any more elegant solution?

I lead the EclipseLink JAXB & SDO implementations and represent Oracle on those specifications so hopefully I can help you out. This question is very similar to talk I'm giving at JavaOne in September.
1) What data type should I use as the
parameters of the web services? How to
utilize my schema for this usage? I
was considering various XML binding
frameworks (especially JAXB and SDO)
for this but didn't know how to
proceed.
This depend's on what web service framework you are using. JAXB is much easier to use with JAX-WS, and while JAXB is still easier to use with JAX-RS SDO, is a possible alternative.
2) How can I enhance the two services
(call them store and search) to use
dynamically created templates based on
the original generic schema? The
service would still accept documents
of the main schema type but has the
inner attribute list based on a
template say template1 only requires
whose values are ints while template2
require (float) and (string). The
current JSP-based prototype manually
creates this template but as an XML
document that is assembled by hand
(<>tags dispersed in text) and there
is no type checking what so ever so I
thought I could do better!
I'm not 100% what you mean here, but the following may be helpful:
Using #XmlAnyElement to Build a Generic Message
3) Is it possible to generate a quick
web app prototype for simple access to
this system (again by using the schema
(&templates) to edit the appropriate
XML message structures? What I am
looking for is for the (human) user to
choose a template and then just "fill
in the blanks" and submit, no need for
any fancy look and feel.
JAX-RS is a nice framework for creating quick prototypes. Below is an example I created:
Part 1 - The Database
Part 2 - Mapping the Database to Objects
Part 3 - Mapping the Objects to XML
Part 4 - The RESTful Service
Part 5 - The Client
4) Can I or how can I also use this
XML message type for communicating
across sockets?
I prefer frameworks like JAX-RS that communicate over the HTTP protocol.
5) Does it matter if I deploy the
services as stateless EJBs or not? Do
I need them to be EJBs or servlets
would be more than enough?
My preference is to use an EJB session bean for the service. If you are interacting with a database then you can leverage the Java Transaction API (JTA) to manage your database transactions.
Part 4 - The RESTful Service
SDO
EclipseLink is the SDO 2.1.1 (JSR-235) reference implementation. We have some examples posted below. If you are looking how to do something specific, I will try to post a relevant example.
http://wiki.eclipse.org/EclipseLink/Examples/SDO
JAXB
JAXB is static. It is also more popular than SDO. Recognizing this in EclipseLink we have implemented a dynamic JAXB feature. It gives you the dynamic aspect of SDO with a JAXB slant.
http://wiki.eclipse.org/EclipseLink/Examples/MOXy/Dynamic
EDIT #1
Since you are dealing with JAX-WS and your model is almost entirely dynamic, I think you should skip the JAXB binding altogether. In the following link see the section "Switching Off Data Binding"
http://java.sun.com/developer/technicalArticles/xml/jaxrpcpatterns3/
This will give us the body of the message as a javax.xml.transform.Source object. We will need to process the XML based on the dynamic templates. SDO would be a good choice here. You can constantly add new types to the HelperContext using XML schemas.
helperContext.getXSDHelper().define(schema1, null);
helperContext.getXSDHelper().define(schema2, null);
You wil be able to unmarshal the Source from the web service as follows:
XMLDocument doc = helperContext.getXMLHelper().load(source, null, null);
DataObject rootDataObject = doc.getRootObject();
String someValue = rootDataObject.getString("attr3/childAttr/anotherChildAttr");
You will also be able to use the XMLHelper to marshal your objects to XML when calling another service.

Related

How to add an element to WS-operation in consumer end for 1 provider when there's other providers?

I have authored a WSDL and the consumer/client that implements operation AddCar that has data for model and colour. Now one WS producer/server wants to also have data for length. I assume that other producers have difficulties to adapt to this change due to implementation outsourcing. My options include:
Make new operation AddCarWithLength
Make 2 versions of WSDL and consumer code with same operation
Just update the WSDL with optional length and include it operation data only for producer that wants it.
Just update the WSDL with 0-N name-vaue pair elements and include it operation data only for producer that wants it.
Demand customers that they get the company that implemented the WS producer to update it.
Options:
is out of the question
I have generated C# classes in consumer/client so there would be two code sets. i still would have to know (maybe with config parameter or smthn) which version producer/server uses
Means that I only have to know which producer/server i talk with.
Same as 3 but would allow future extensibility
Can be problematic
Question:
What is the correct / best way to do this when demanding all producers to be updated can be unrealistic?
WSDL are known for the precious definitions. At first services should always designed with a clear picture of usage and future in mind. Anyhow, now my understanding is adding a attribute (data element - length) to your existing WCF service. My suggestion would be,
Analyse and add a custom class of yours and name it as a data contract and add to your WCF operation and expose it as a new interface / operation contract.
Eg.
with in Class car , have a data member as Properties.
With in properties define all your analysis result elements like, length, width, color, weight etc.
Also add a Dictionary<string,string> CustomAttributes; so in future you can use it.
Similar to above, but if you don't have much time.
with out any analysis just add a Dictionary<string,string> Parameters; and expose a new contract and utilize that.

RESTful search. Return actual resources or URIs?

Pretty new to all this REST stuff.
I'm designing my API, and am not sure what I'm supposed to return from a search query. I was assuming I would just return all objects that match the query in their entirety, but after reading up a bit about HATEOAS I am thinking I should be returning a list of URI's instead?
I can see that this could help with caching of items, but I'm worried that there will be a lot of overhead generated by the subsequent multiple HTTP requests required to get the actual object info.
Am I misunderstanding? Is it acceptable to return object instances instead or URIs?
I would return a list of resources with links to more details on those resources.
From RESTFull Web Services Cookbook 2010 - Subbu Allamaraju
Design the response of a query as a representation of a collection
resource. Set the appropriate expiration caching headers. If the query
does not match any resources, return an empty collection.
IMHO it is important to always remember that "pure REST" and "real world REST" are two quite different beasts.
How are you returning the list of URIs from your query in the first place? If you return e.g. application/json, this certainly does not tell the client how it is supposed to interpret the content; therefore, the interaction is already being driven by out-of-band information (the client magically already knows where to look for the data it needs) in conflict with HATEOAS.
So, to answer your question: I find it quite acceptable to return object instances instead of URIs -- but be careful because in the general case this means you are generating all this data without knowing if the client is even going to use it. That's why you will see a hybrid approach quite often: the object instances are not full objects (i.e. a portion of the information the server has is not returned), but they do contain a unique identifier that allows the client to fetch the full representation of selected objects if it chooses to do so.

Can a .NET oData DataService force filtering child records?

This should be a simple scenario - I have a data model with a parent/child relationship. For example's sake, let's say it's Orders and OrderDetails - 1 Order -> many OrderDetails.
I'd like to expose the model via oData using a standard DataService, but with a few limitations.
First, I should only see my Orders. That's simple enough using EntitySetRights.ReadSingle and a QueryInterceptor to make sure the order is in fact mine.
So far, so good! But how can the associated OrderDetail records be exposed in the oData feed in a way where I can read OrderDetails for a specific (read single) Order without giving access to the entire OrderDetails table?
In other words, I want to allow reading my details
myUrl.com/OrderService.svc/Orders(5)/OrderDetails <-- Good! My order is #5
but not everyone's details
myUrl.com/OrderService.svc/OrderDetails <-- Danger, Scarry, Keep Out!
Thanks for the help!
This is so called "containment" - your sample exactly described here: http://data.uservoice.com/forums/72027-wcf-data-services-feature-suggestions/suggestions/1012615-support-containment-hierarchical-models-in-odata?ref=title
WCF Data Services doesn't support this out of the box yet.
It is theoretically possible to implement such restriction with a custom LINQ provider. In your LINQ implementation you could detect the expansion (not that hard) and in that case allow it. But you could prevent queries to the entity set itself (also rather easy to recognize). For more details of how the LINQ expressions look plese refere to this series: http://blogs.msdn.com/b/vitek/archive/2010/02/25/data-services-expressions-part-1-intro.aspx
It depends on what provider you wanted to use originally. If you had a custom provider already this is not that hard. If you had a reflection based provider, it is possible to layer this on top. If you had EF, this might be rather tricky (not sure if it's even possible).

How to solve two REST problems: the interface document; loss of privacy in descriptive URLs

Coming from a lot of frustrating times with WSDL/Soap, I very much like the REST paradigm, but am trying to solve two basic problems in our application, before moving over to REST. The first problem relates to the lack of an interface document. I think I finally see how to handle this situation: One can query his way down from a top-level "/resources" resource using various requests of GET, HEAD, and OPTIONS to find the one needed resource in the correct hypermedia format. Is this the idea? If so, the client need only be provided with a top-level resource URI: http://www.mywebservicesite.com/mywebservice/resources. He will then have to do some searching and possible keep track of what he is discovering, so that he can use the URIs again efficiently in future to do GETs, POSTs, PUTs, and DELETEs. Are there any thoughts on what should happen here?
The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber. We do have an implementation of opaque URLs we use in the context of a session, and I'm wondering how opaque URLs might be applied to REST. The general problem is how to keep domain-specific details out of URLs, and still benefit from what REST has to offer.
The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber.
I think you've misunderstood the point of opaque URIs. The notion of opaque URIs is with respect to clients: A client shall not decipher a URI to guess anything of semantic meaning from it. So a service may well have URIs like /resources/.../customer/Madonna/phonenumber, and that's quite a good idea. The URIs should be treated as opaque by clients: not infer from the URI that it represents Madonna's phone number, and that Madonna is a customer of some sort. That knowledge can only be obtained by looking inside the URI itself, or perhaps by remembering where the URI was discovered.
Edit:
A consequence of this is that navigation should happen by links, not by deconstructing the URI. So if you see /resouces/customer/Madonna/phonenumber (and it actually represents Customer Madonna's phone number) you should have links in that resource to point to the Madonna resource: e.g.
{
"phone_number" : "01-234-56",
"customer_URI": "/resources/customer/Madonna"
}
That's the only way to navigate from a phone number resource to a customer resource. An important aspect is that the server implementation might or might not have domain specific information in the URI, The Madonna record might just as well live somewhere else: /resources/customers/byid/81496237. This is why clients should treat URIs as opaque.
Edit 2:
Another question you have (in the comments) is then how a client, with the required no knowledge of the server's URIs is supposed to be able to find anything. Clients have the following possibilities to find resources:
Provide a search interface. This could be done by providing an OpenSearch description document, which tells clients how to search for items. An OpenSearch template can include several variables, and several endpoints, depending on what you're looking for. So if you have a "customer ID" that's unique, you could have the following template: /customers/byid/{proprietary:customerid}", the customerid element needs to be documented somewhere, inside the proprietary namespace. A client can then know how to use such a template.
Provide a custom form. This implies making a custom media type in which you explicitly define how (based on an instance of the document) a URI to a customer can be forged. <customers template="/customers/byid/{id}"/>. The documentation (for the media type) would have to state that the template attribute must be interpreted as a relative URI after the string substitution "{id}" to an actual customer ID.
Provide links to all resources. Some resources aren't innumerable, so you can simply make a link to each and every one of them, optionally including identifying information along with the links. This could also be done in a custom media type: <customer id="12345" href="/customer/byid/12345"/>.
It should be noted that #1 and #2 are two ways of saying the same thing: Clients are allowed to create URIs if they
haven't got the URI structure a priori
a media type exists for which the documentation states that URIs should be created
This is much the same way as a web browser has no idea of any URI structure on the web, except for the rules laid out in the definition of HTML forms, to add a ? and then all the query parameters separated by &.
In theory, if you have a customer with id 12345, then you could actually dispense with the href, since you could plug the customer id 12345 into #1 or #2. It's more common to actually provide real links between resources, rather than always relying on lookup or search techniques.
I haven't really used web RPC systems (WSDL/Soap), but i think the 'interface document' is there mostly to allow client libraries to create the service API, right? if so, REST shouldn't need it, because the verbs are already defined and don't really need to be documented again.
AFAIUI, the REST way is to document the structure of each resource (usually encoded in XML or JSON). In that document, you'll also have to document the relationship between those resources. In my case, a resource is often a container of other resources (sometimes more than one type), therefore the structure doc specifies what field holds a list of URLs pointing to the contained resources. Ideally, only one unique resource will need a single, fixed (documented) URL. everithing else follows from there.
The URL 'style' is meaningless to the client, since it shouldn't 'construct' an URL. Every URL it needs should be already constructed on a resource field. That let's you change the URL structure without changing the client (that has saved tons of time to me). Your URLs can be as opaque or as descriptive as you like. (personally, i don't like text keys or slugs; my keys are all BIGINTs or UUIDs)
I am currently building a REST "agent" that addresses the first part of your question. The agent offers a temporary bookmarking service. The client code that is interacting with the agent can request that an URL be bookmarked using some identifier. If the client code needs to retrieve that representation again, it simply asks the agent for the url that corresponds to the saved bookmark and then navigates to that bookmark. Currently those bookmarks are not persisted so they only last for the lifetime of the client application, but I have found it a useful mechanism for accessing commonly used resources. E.g. The root representation provides a login link. I bookmark that link and if the client ever receives a 401 then I can redirect to the "login" bookmark.
To address an issue you mentioned in a comment, the agent also has the ability to store retrieved representations in a dictionary. If it becomes necessary to aggregate and manipulate multiple representations at the same time then I can simply request that the agent store the current representation in a dictionary associated to a key and then continue navigating to the next resource. Once the client has accumulated all the necessary representation it can do what it needs to do.

Save data through a web service using NHibernate?

We currently have an application that retrieves data from the server through a web service and populates a DataSet. Then the users of the API manipulate it through the objects which in turn change the dataset. The changes are then serialized, compressed and sent back to the server to get updated.
However, I have begin using NHibernate within projects and I really like the disconnected nature of the POCO objects. The problem we have now is that our objects are so tied to the internal DataSet that they cannot be used in many situations and we end up making duplicate POCO objects to pass back and forth.
Batch.GetBatch() -> calls to web server and populates an internal dataset
Batch.SaveBatch() -> send changes to web server from dataset
Is there a way to achieve a similar model that we are using which all database access occurs through a web service but use NHibernate?
Edit 1
I have a partial solution that is working and persisting through a web service but it has two problems.
I have to serialize and send my whole collection and not just changed items
If I try to repopulate the collection upon return my objects then any references I had are lost.
Here is my example solution.
Client Side
public IList<Job> GetAll()
{
return coreWebService
.GetJobs()
.BinaryDeserialize<IList<Job>>();
}
public IList<Job> Save(IList<Job> Jobs)
{
return coreWebService
.Save(Jobs.BinarySerialize())
.BinaryDeserialize<IList<Job>>();
}
Server Side
[WebMethod]
public byte[] GetJobs()
{
using (ISession session = NHibernateHelper.OpenSession())
{
return (from j in session.Linq<Job>()
select j).ToList().BinarySerialize();
}
}
[WebMethod]
public byte[] Save(byte[] JobBytes)
{
var Jobs = JobBytes.BinaryDeserialize<IList<Job>>();
using (ISession session = NHibernateHelper.OpenSession())
using (ITransaction transaction = session.BeginTransaction())
{
foreach (var job in Jobs)
{
session.SaveOrUpdate(job);
}
transaction.Commit();
}
return Jobs.BinarySerialize();
}
As you can see I am sending the whole collection to the server each time and then returning the whole collection. But I'm getting a replaced collection instead of a merged/updated collection. Not to mention the fact that it seems highly inefficient to send all the data back and forth when only part of it could be changed.
Edit 2
I have seen several references on the web for almost a transparent persistent mechanism. I'm not exactly sure if these will work and most of them look highly experimental.
ADO.NET Data Services w/NHibernate (Ayende)
ADO.NET Data Services w/NHibernate (Wildermuth)
Custom Lazy-loadable Business Collections with NHibernate
NHibernate and WCF is Not a Perfect Match
Spring.NET, NHibernate, WCF Services and Lazy Initialization
How to use NHibernate Lazy Initializing Proxies with Web Services or WCF
I'm having a hard time finding a replacement for the DataSet model we are using today. The reason I want to get away from that model is because it takes a lot of work to tie every property of every class to a row/cell of a dataset. Then it also tightly couples all of my classes together.
I've only taken a cursory look at your question, so forgive me if my response is shortsighted but here goes:
I don't think you can logically get away from doing a mapping from domain object to DTO.
By using the domain objects over the wire you are tightly coupling your client and service, part of the reason to have a service in the first place is to promote loose coupling. So that's an immediate issue.
On top of that you're going to end up with a brittle domain logic interface where you can't make changes on the service side without breaking your client.
I suspect your best bet would be to implement a loosely coupled service which implements a REST / or some other loosely coupled interface. You could use a product such as automapper to make the conversions simpler and easier and also flatten data structures as necessary.
At this point I don't know of any way to really cut down the verbosity involved in doing the interface layers but having worked on large projects that didn't make the effort I can honestly tell you the savings wasn't worth it.
I think your issue revolves around this issue:
http://thatextramile.be/blog/2010/05/why-you-shouldnt-expose-your-entities-through-your-services/
Are you or are you not going to send ORM-Entities over the wire?
Since you have a Services-Oriented architecture.. I (like the author) do not recommend this practice.
I use NHibernate. I call those ORM-Entities. They are THE POCO model. But they have "virtual" properties that allow for lazy-loading.
However, I also have some DTO-Objects. These are also POCO's. These do not have lazy'loading friendly properties.
So I do alot of "converting". I hydrate ORM-Entities (with NHibernate)...and then I end up converting them to Domain-DTO-Objects. Yes, it stinks in the beginning.
The server sends out the Domain-DTO-Objects's. There is NO lazy loading. I have to populate them with the "Goldie Locks" "just right" model. Aka, if I need Parent(s) with one level of children, I have to know that up front and send the Domain-DTO-Objects over that way, with just the right amount of hydration.
WHen I send back Domain-DTO-Objects's (from client to the server), I have to reverse the process. I convert the Domain-DTO-Objects into ORM-Entities. And allow NHibernate to work with the ORM-Entities.
Because the architecture is "disconnected", I do alot of (NHiberntae) ".Merge()" calls.
// ormItem is any NHibernate poco
using (ISession session = ISessionCreator.OpenSession())
{
using (ITransaction transaction = session.BeginTransaction())
{
session.BeginTransaction();
ParkingAreaNHEntity mergedItem = session.Merge(ormItem);
transaction.Commit();
}
}
.Merge is a wonderful thing. Entity Framework does not have it. Boo.
Is this alot of setup? Yes.
Do I think it is perfect? No.
However. Because I send very basic DTO's(Poco's) that are not "flavored" to the ORM, I have the ability to switch ORM's without killing my contracts to the outside world.
My datalayer can be ADO.NET, EF, NHibernate, or anything. I have to write the "Converters" if I switch, and the ORM code, but everything else is isolated.
Many people argue with me. They said I'm doing too much, and the ORM-Entities are fine.
Again, I like to "now allow any lazy loading" appearances. And I prefer to have my data-layer isolated. My clients should not know or care about my data-layer/orm of choice.
There are just enough subtle differences (or some not so subtle ones) between EF and NHibernate to screwball the game plan.
Do my Domain-DTO-Objects's look 95% like my ORM-Entities? Yep. But its the 5% that can screwball you.
Moving from DataSets, especially if they are populated from stored-procedures with alot of biz-logic in the TSQL, isn't trivial. But now that I do object model, and I NEVER write a stored procedure that isn't simple CRUD functions, I'd never go back.
And I hate maintenance projects with voodoo TSQL in the stored procedures. It ain't 1999 anymore. Well, most places.
Good luck.
PS Without .Merge(in EF), here is what you have to do in a disconnected world: (boo microsoft)
http://www.entityframeworktutorial.net/EntityFramework4.3/update-many-to-many-entity-using-dbcontext.aspx