How do you document a tree-to-tree transformation in a human-readable format?

I need to document an application that serves as a facade for a set of web services. The application accepts SOAP requests and transforms them into a format understood by the underlying web service. There are several such services, each with its own interface. Some accept SOAP, some HTTP POST, some... other formats not mentioned in polite society.
I need to document how we map the fields from our SOAP calls to the fields for these other formats. Before everyone cries "XSLT" I must mention that the notation must be human-friendly. Ideally it would be something Excel-able.
Has anyone encountered this sort of problem before? How did you solve it? Is there a human-friendly notation for tree-to-tree transformations that can fit on a spreadsheet?

I've had to do just this. The way I did it was to just start writing, following the hierarchical structure.
Eventually I would find that I was repeating myself. For example, certain elements had a common set of attributes, so I would pull the documentation of that common set up before the sections on the specific elements. The same went for documenting the handling of specific simpleTypes.
Eventually, there was even some high level discussion on the overall flow and "philosophy" of the transformation. But I let it all happen bit by bit, fixing it as I became bored with repetition.
That said, I'm a developer, not a tech writer.

I haven't really found anything so far, but I've found pointers to many libraries that help transform objects of one type to another in Java. For reference, I'm listing the most promising ones here, all doing some kind of JavaBean to JavaBean conversion:
Transmorph
EZMorph
Dozer
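One spreadsheet-friendly notation for the kind of mapping the question describes is a flat table with one row per leaf field: the source path, the target format, the target field, and the rule applied. The sketch below is only an illustration under that assumption; every path and field name in it is hypothetical and none comes from the original application.

```python
import csv
import io

# Hypothetical mapping rows: (source SOAP path, target format, target field, rule).
# These names are invented for illustration only.
MAPPING_ROWS = [
    ("/Order/Customer/Name", "HTTP POST", "cust_name", "copy"),
    ("/Order/Customer/Address/Zip", "HTTP POST", "zip", "copy"),
    ("/Order/Items/Item[]/Sku", "SOAP", "/PlaceOrder/Lines/Line[]/ProductCode",
     "copy, one Line per Item"),
]


def mapping_to_csv(rows):
    """Render the mapping rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source_path", "target_format", "target_field", "rule"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(mapping_to_csv(MAPPING_ROWS))
```

Flattening the tree into paths loses the visual nesting, but it keeps each mapping on one row, which is what makes it sortable and filterable in a spreadsheet.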

Related

How to offer lists of valid values for parameters in a RESTful API?

This is more a conceptual than technical question, I guess. Suppose I have a REST API for dealing with a huge rent-a-car fleet.
The API is modelled around the business entities/resources in a very standard and coherent (even if controversial) way, like this:
/cars/1234 - detailed data about a certain car
/clients/5678 - detailed data about a certain client
/cars - a list of cars and their URIs
/clients - a list of clients
However, the fleet is huge and a list of all cars is not that useful. I would rather have it filtered, like:
GET /cars?type=minivan
To use the "type" parameter properly, I should have a list of valid values such as "minivan", "convertible", "station-wagon", "hatchback", "sedan", etc. OK, there are not that many kinds of cars out there, but let's suppose this list is too big for an enum in the API's Swagger definition.
So... What would be the most consistent and natural way for a REST API to offer a list of valid values for a query parameter like that?
As a subordinate resource like /cars/types? This would break the /cars/{id} URL pattern, wouldn't it?
As a separate resource such as /tables/cars/types? This would break the consistency around the main resources of the business model itself, right?
As part of the body of a response for OPTIONS /cars ? It looks like the "RESTfullest" way to me, but some of my coworkers disagree, and OPTIONS seems to be rarely used for things like that.
Maybe as part of a response to GET /cars?&metadata=values or something similar? The "values" here would seem more semantically related to the returned data than to the query parameters, wouldn't they?
Anything else?
I have googled and searched in SO for some recommendations about this particular subject, but I could not find anything to help me with arguments for such a decision...
Thank you!
Fabricio Rocha
Brasilia, Brasil
"a good REST API is like an ugly website" -- Rickard Öberg
So how would you do it in a website? Well, you'd probably have a link that goes to a form, and the form would have a list control / radio buttons with semantic cues for each option, with the expectation that the user would select a value from the options available, and the user agent would encode that value into the URL of the GET request when the form was submitted.
So in REST, you do the same thing. In the initial response, you would include a link to your "form" resource; and when a user agent gets the form resource you return a hypermedia representation of your form, with the available options encoded within it, and when the form is submitted your resources pick the client choice(s) out of the query part of the identifier.
But you probably aren't doing REST: it's a colossal PITA, and the benefits of the REST architecture constraints probably don't pay off in your context. So you are likely just looking for a reasonable spelling of an identifier for a resource that returns a message with a list of options.
As a subordinate resource like /cars/types? This would break the /cars/{id} URL pattern, wouldn't it?
Assuming your routing implementation can handle the ambiguity, that's a fine choice. You might consider whether there's just one list, or different lists for different contexts, and how to handle that.
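As a rough illustration of "handling the ambiguity", a router can simply try the literal /cars/types route before the /cars/{id} pattern. This is a minimal sketch, not any particular framework's API; the route table and handler names are hypothetical.

```python
import re

# Hypothetical route table: the literal /cars/types route is listed before
# the /cars/{id} pattern so that it wins when both could match.
ROUTES = [
    (re.compile(r"^/cars/types$"), "list_car_types"),
    (re.compile(r"^/cars/(?P<id>[^/]+)$"), "get_car"),
    (re.compile(r"^/cars$"), "list_cars"),
]


def resolve(path):
    """Return (handler_name, params) for the first matching route, or None."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler, match.groupdict()
    return None
```

With this ordering, no car can ever have the id "types", which is the price of the shorter URL.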
As a separate resource such as /tables/cars/types? This would break the consistency around the main resources of the business model itself, right?
Remember OO programming and encapsulation? Decoupling the API from the underlying data model is a good thing.
That said, I'm personally not fond of "tables" as an element in your hierarchy. If you wanted to head in that direction, I'd suggest /dimensions -- it's a spelling you might use if you were designing a data warehouse.
As part of the body of a response for OPTIONS /cars ? It looks like the "RESTfullest" way to me, but some of my coworkers disagree, and OPTIONS seems to be rarely used for things like that.
Yikes! RFC 7231 suggests that's a very confusing idea.
The OPTIONS method requests information about the communication options available for the target resource, at either the origin server or an intervening intermediary.
(emphasis added). When writing APIs for the web, you should always be keeping in mind that a client request may go through intermediaries that you do not control; your ability to provide a good experience in those circumstances depends on not confusing the intermediaries by deviating from the uniform interface.
Maybe as part of a response to GET /cars?&metadata=values or something alike?
For the most part, machines are pretty comfortable with any spelling. URI design guidelines normally focus on the human audience. I think that particular spelling will confuse your human consumers, especially if /cars?... would otherwise identify the resource that is a search result.
Anything else? I still feel that what people expect to find under /cars is... a bunch of cars (their representations, I mean), not a list of values among them...
So let's change up your question a little bit:
What would be the most consistent and natural way for a REST API to document a list of valid values for a query parameter like that?
If there's one thing the web is really good for, it is documenting things. Pick almost any well documented web API, and pay careful attention to where you are reading about the endpoints -- that will give you some good ideas.
For instance, you could look at the StackExchange API where
https://api.stackexchange.com/docs/questions
tells you everything you need to know about the family of resources at
https://api.stackexchange.com/2.2/questions
Types, unsurprisingly, are documented like:
https://api.stackexchange.com/docs/types/flag-option
If you wanted to be sexy about it, you could use the Accept request header to negotiate a redirect to either human-readable documentation or machine-readable documentation.
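A very rough sketch of that negotiation: inspect the Accept header and pick a redirect target. The two documentation locations are hypothetical, and this simplified version ignores q-values, which a real implementation should honour.

```python
# Hypothetical documentation locations, invented for this example.
HUMAN_DOCS = "/docs/types/car-type"
MACHINE_DOCS = "/docs/types/car-type.json"


def pick_docs_location(accept_header):
    """Pick a redirect target from the Accept header.

    Simplification: media types are tried in the order listed and
    q-values are ignored.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type == "application/json":
            return MACHINE_DOCS
        if media_type in ("text/html", "application/xhtml+xml"):
            return HUMAN_DOCS
    # Fall back to the human-readable page for */* or unknown types.
    return HUMAN_DOCS
```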
I'm in a similar situation but I have a large number of fields each of which has a large number of possible values, and in some cases the values come from a hierarchy so my field is an array of strings. (Borrowing from your example: you might want to record the plant that manufactured the car but rather than a 1-dimensional list these are organised by: continent, country, and state).
I think I'll implement a /taxonomies resource to provide all the data to users. I see WordPress uses a similar scheme (http://v2.wp-api.org/reference/taxonomies/) although I haven't studied it closely yet.

Web Service (JAX-WS,SOAP) WSDL Structure Design

This is a bit of an open-ended question and quite a bit of text, but bear with me.
First, a little background. For the last year I have been writing web services. I started with zero knowledge in this field. My job was to prepare the WSDL (I could structure it however I wanted) and to implement it. I had a little influence on the way the XSD schemas were prepared, but not much. I took part in two projects. The first had about 5 web services, while the second has more than 20 and is still growing. Both undergo constant XSD schema modifications.
Now to the point.
I decided to use a different approach in each project in terms of how I structured the WSDL design. In the first project I separated the WSs between WSDLs, one WS per WSDL. This worked quite well, especially since the XSD schemas used were different for each operation (except one basic schema that rarely changed). You did, however, have 5 different WSDLs, which seemed wasteful to me.
The second project had far more WSs and a much larger number of more complex schemas. Here I decided to group the services into only a few WSDLs according to their use. All seemed great until I realized (a big lack of knowledge on my part) that WSs, when called, are recognized by the main element of their body (actually, I still have no idea why that is so). A lot of our WSs have the same element as input, and since they are in the same WSDL, they have the same path. This caused a warning from the application server (WebLogic 10.3), but it worked. I guess it used the SOAPAction header, since we're using SOAP 1.1. A problem arose when we deployed the services on the enterprise bus (OSB). It would not work unless SOAPAction was chosen as the selection algorithm or, as our Oracle advisor suggested, an additional unique element was created for each WS.
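To make the "recognized by the main element of the body" behaviour concrete, here is a small sketch of selecting an operation from the first child of the SOAP 1.1 Body. It is not how WebLogic or OSB are implemented; the urn:example:orders namespace, the element names and the operation names are all hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical dispatch table keyed by the qualified name of the body's
# first child. A dict holds one operation per key, so two operations that
# share the same input element cannot both be registered here.
OPERATIONS = {
    "{urn:example:orders}GetOrderRequest": "getOrder",
    "{urn:example:orders}CancelOrderRequest": "cancelOrder",
}


def body_root_qname(envelope_xml):
    """Return the qualified name of the first child of soap:Body."""
    envelope = ET.fromstring(envelope_xml)
    body = envelope.find(f"{{{SOAP11_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("no SOAP body element found")
    # ElementTree writes qualified names as "{namespace}localname".
    return body[0].tag


def select_operation(envelope_xml):
    """Pick the operation purely from the body's root element."""
    return OPERATIONS[body_root_qname(envelope_xml)]


REQUEST = (
    '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" '
    'xmlns:ord="urn:example:orders">'
    "<soapenv:Body><ord:GetOrderRequest><ord:Id>42</ord:Id></ord:GetOrderRequest>"
    "</soapenv:Body></soapenv:Envelope>"
)

if __name__ == "__main__":
    print(select_operation(REQUEST))
```

When the body element is the only key, giving each operation its own wrapper element is what keeps the lookup unambiguous; selecting on SOAPAction instead moves the key into an HTTP header.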
And here is the problem. How to structure the WSDL. Options I considered are:
Choose the SOAPAction header as the selection algorithm. Personally I liked that solution, but since SOAP 1.2 does not have it anymore, the powers that be (my boss) tell me we can't use it. I know SOAP 1.2 will have an optional "action" header, but since it's optional, I again can't use it (my boss's reasoning: it might not be in the next version at all, and clients might not want to use an optional feature since it's an extra cost. I don't know, it's a business thing).
Use unique elements for each WS. This just looks ugly and unnecessary, but it's the best we have so far.
Go back to the previous setup and do one WS per WSDL. This is definitely the least favored method. It does not feel right, and from what I have read it is just bad design. But it works and gives quite a bit of flexibility.
Now, if anyone has any other solutions I would love to hear them, because this has had me stumped for the last month or so. I just can't find a solution that feels right.
Thanks in advance.

Restful API - handling large amounts of data

I have written my own Restful API and am wondering about the best way to deal with large amounts of records returned from the API.
For example, if I send a GET request to myapi.co.uk/messages/, this will bring back the XML for all message records, which in some cases could be thousands. This makes using the API very sluggish.
Can anyone suggest the best way of dealing with this? Is it standard to return results in batches and to specify batch size in the request?
You can change your API to include additional parameters to limit the scope of data returned by your application.
For instance, you could add limit and offset parameters to fetch just a small part. This is how pagination can be done in accordance with REST. A request like the one below would fetch 10 resources from the messages collection, from the 21st to the 30th. This way you can ask for a specific portion of a huge data set:
myapi.co.uk/messages?limit=10&offset=20
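On the server side, limit and offset map naturally onto a slice. The following is a minimal sketch over an in-memory list; the function name and the defaults are assumptions, and a real API would push the same two numbers into its database query instead.

```python
def paginate(records, limit=10, offset=0):
    """Return at most `limit` records, skipping the first `offset`."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    return records[offset:offset + limit]


if __name__ == "__main__":
    messages = list(range(1, 101))  # stand-in for 100 message records
    print(paginate(messages, limit=10, offset=20))
```

With limit=10 and offset=20 the slice skips the first 20 records and returns the 21st through the 30th, matching the request above.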
Another way to decrease the payload would be to ask only for certain parts of your resources' representation. Here's how Facebook does it:
/joe.smith/friends?fields=id,name,picture
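A sketch of how a server might honour such a fields parameter: split the comma-separated list and keep only those keys. This is not Facebook's implementation; the function name and the sample record are hypothetical.

```python
def select_fields(resource, fields_param):
    """Keep only the keys named in a comma-separated `fields` parameter."""
    wanted = [name.strip() for name in fields_param.split(",") if name.strip()]
    # Unknown field names are silently ignored in this sketch.
    return {name: resource[name] for name in wanted if name in resource}


if __name__ == "__main__":
    friend = {"id": 7, "name": "Ann", "picture": "ann.png", "email": "ann@example.com"}
    print(select_fields(friend, "id,name,picture"))
```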
Remember that while using either of these methods, you have to provide a way for the client to discover each of the resources. You can't assume they'll just look at the parameters and start changing them in search of data. That would be a violation of the REST paradigm. Provide them with the necessary hyperlinks to avoid it.
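One way to provide those hyperlinks is to return "next" and "prev" links with every page, so the client never has to construct offsets itself. The sketch below assumes the limit/offset scheme described earlier; the link names and the base URL are hypothetical.

```python
from urllib.parse import urlencode


def page_links(base_url, total, limit, offset):
    """Build self/next/prev links for one page of a collection."""

    def url(new_offset):
        return f"{base_url}?{urlencode({'limit': limit, 'offset': new_offset})}"

    links = {"self": url(offset)}
    if offset + limit < total:
        # There is at least one more record after this page.
        links["next"] = url(offset + limit)
    if offset > 0:
        links["prev"] = url(max(offset - limit, 0))
    return links


if __name__ == "__main__":
    print(page_links("/messages", total=45, limit=10, offset=20))
```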
I strongly recommend viewing this presentation on RESTful API design by apigee (the screencast is called "Teach a Dog to REST"). Good practices and neat ideas to approach everyday problems are discussed there.
EDIT: The video has been updated a number of times since I posted this answer; you can check out the 3rd edition from January 2013.
There are several general ways to improve API performance, including for large responses. Each of these topics can be explored in depth:
Reduce Size With Pagination
Organizing Using Hypermedia
Exactly What a User Needs With Schema Filtering
Defining Specific Responses Using The Prefer Header
Using Caching To Make Response More Efficient
More Efficiency Through Compression
Breaking Things Down With Chunked Responses
Switch To Providing More Streaming Responses
Moving Forward With HTTP/2
Source: https://apievangelist.com/2018/04/20/delivering-large-api-responses-as-efficiently-as-possible/
If you are using .NET Core, try the Microsoft.AspNetCore.ResponseCompression package.
Add this line to ConfigureServices in the Startup file:
services.AddResponseCompression();
Then, in the Configure method:
app.UseResponseCompression();

Can you help clarify some points regarding RESTful services and Code Generation?

I've been struggling with understanding a few points I keep reading regarding RESTful services. I'm hoping someone can help clarify.
1a) There seems to be a general aversion to generated code when talking about RESTful services.
1b) The argument that if you use a WADL to generate a client for a RESTful service, when the service changes - so does your client code.
Why I don't get it: Whether you are referencing a WADL and using generated code or you have manually extracted data from a RESTful response and mapped them to your UI (or whatever you're doing with them) if something changes in the underlying service it seems just as likely that the code will break in both cases. For instance, if the data returned changes from FirstName and LastName to FullName, in both instances you will have to update your code to grab the new field and perhaps handle it differently.
2) The argument that RESTful services don't need a WADL because the return types should be well-known MIME types and you should already know how to handle them.
Why I don't get it: Is the expectation that for every "type" of data a service returns there will be a unique MIME type in existence? If this is the case, does that mean the consumer of the RESTful services is expected to read the RFC to determine the structure of the returned data, how to use each field, etc.?
I've done a lot of reading to try to figure this out for myself so I hope someone can provide concrete examples and real-world scenarios.
REST can be very subtle. I've also done lots of reading on it, and every once in a while I went back and read Chapter 5 of Fielding's dissertation, each time finding more insight. It was as clear as mud the first time (although some things made sense) and only got better once I tried to apply the principles and used the building blocks.
So, based on my current understanding let's give it a go:
Why do RESTafarians not like code generation?
The short answer: if you make use of hypermedia (+links), there is no need.
Context: Explicitly defining a contract (WADL) between client and server does not reduce coupling enough: If you change the server the client breaks and you need to regenerate the code. (IMHO even automating it is just a patch to the underlying coupling issue).
REST helps you to decouple on different levels. Hypermedia discoverability is one of the good ones to start with. See also the related concept, HATEOAS.
We let the client “discover” what can be done from the resource we are operating on instead of defining a contract before. We load the resource, check for “named links” and then follow those links or fill in forms (or links to forms) to update the resource. The server acts as a guide to the client via the options it proposes based on state. (Think business process / workflow / behavior). If we use a contract we need to know this "out of band" information and update the contract on change.
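As a small illustration of "check for named links and then follow them", the client below only knows link names, never URL layouts. The representation format, the link names and the URLs are all hypothetical; real hypermedia formats such as Atom define their own link structure.

```python
# Hypothetical representation of an order as the server might return it.
# The server decides which links to offer based on the order's state.
ORDER = {
    "id": 17,
    "status": "unpaid",
    "links": [
        {"rel": "self", "href": "/orders/17"},
        {"rel": "payment", "href": "/orders/17/payment"},
        {"rel": "cancel", "href": "/orders/17/cancellation"},
    ],
}


def find_link(representation, rel):
    """Return the href of the link named `rel`, or None if it is not offered."""
    for link in representation.get("links", []):
        if link.get("rel") == rel:
            return link["href"]
    return None


if __name__ == "__main__":
    # The client asks "can I pay?" by looking for the link, not by building a URL.
    print(find_link(ORDER, "payment"))
```

If the server later moves the payment resource, only the href changes; a client written this way keeps working as long as the link name stays the same.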
If we use hypermedia with links there is no need to have a separate contract. Everything is included within the hypermedia, so why design a separate document? Even URI templates are out-of-band information, but if kept simple they can work, as with Amazon S3.
Yes, we still need a common ground to stand on when transferring representations (hypermedia), so we define our own media types or use widely accepted ones such as Atom or microformats. Thus, with the constraints of basic building blocks (link + forms + data - hypermedia), we reduce coupling by keeping out-of-band information to a minimum.
At first it seems that going for hypermedia does not change the impact of change :) But there are subtle differences. For one, if I have a WADL I need to update another document and deploy/distribute it. With pure hypermedia there is no such impact, since the description is embedded. (Imagine changes rippling through a complex interweave of systems.) As per your example, having FirstName + LastName and adding FullName does not really impact the clients, but removing First+Last and replacing them with FullName does, even with hypermedia.
As a side note: The REST uniform interface (verb constraints - GET, PUT, POST, DELETE + other verbs) decouples implementation from services.
Maybe I'm totally wrong, but another possibility might be a psychological "kickback" against code generation: WADL makes one think of the WSDL (contract) part of "traditional web services (WSDL+SOAP)" / RPC, which goes against REST. In REST, state is transferred via hypermedia and not via RPC, that is, method calls that update state on the server.
Disclaimer: I've not read the referenced article in detail, but it does give some great points.
I have worked on API projects for quite a while.
To answer your first question.
Yes, if the service's return values change (e.g. first name and last name become full name), your code might break. You will no longer get the first name and last name.
You have to understand that a WADL is an agreement. If it has to change, then the client needs to be notified. To avoid breaking the client code, we release a new version of the API.
Version 1.0 will keep first name and last name, so your code does not break. We will release version 1.1, which will have the change to full name.
So, in short: the WADL is here to stay. As long as you use that version of the API, your code will not break. If you want to get the full name, you have to move to the new version. With the many code generation plugins on the market, regenerating the code should not be an issue.
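The versioning idea can be sketched as one stored record rendered differently per API version. The version strings follow the answer; the record, the field spellings and the function name are hypothetical.

```python
# Hypothetical stored record; both API versions are rendered from it.
PERSON = {"first_name": "Ada", "last_name": "Lovelace"}


def represent_person(person, version):
    """Render the person according to the contract of the requested version."""
    if version == "1.0":
        # The old contract is kept stable so existing clients do not break.
        return {"FirstName": person["first_name"], "LastName": person["last_name"]}
    if version == "1.1":
        # The new contract replaces the two fields with FullName.
        return {"FullName": f"{person['first_name']} {person['last_name']}"}
    raise ValueError(f"unsupported API version: {version}")


if __name__ == "__main__":
    print(represent_person(PERSON, "1.0"))
    print(represent_person(PERSON, "1.1"))
```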
To answer your next question of why not WADL and how you get to know the mime types.
WADL is for code generation and serves as a contract. With that you can use JAXB or any mapping framework to convert the JSON string to generated bean objects.
Without a WADL, you don't need to inspect every element to determine the type. You can simply parse the response:
var obj = jQuery.parseJSON('{"name":"John"}');
alert( obj.name === "John" );
Let me know if you have any questions.

What does using RESTful URLs buy me?

I've been reading up on REST, and I'm trying to figure out what the advantages to using it are. Specifically, what is the advantage to REST-style URLs that make them worth implementing over a more typical GET request with a query string?
Why is this URL:
http://www.parts-depot.com/parts/getPart?id=00345
Considered inferior to this?
http://www.parts-depot.com/parts/00345
In the above examples (taken from here) the second URL is indeed more elegant looking and concise. But it comes at a cost... the first URL is pretty easy to implement in any web language, out of the box. The second requires additional code and/or server configuration to parse out values, as well as additional documentation and time spent explaining the system to junior programmers and justifying it to peers.
So, my question is, aside from the pleasure of having URLs that look cool, what advantages do RESTful URLs gain for me that would make using them worth the cost of implementation?
The hope is that if you make your URL refer to a noun then there is a better chance that you will implement the HTTP verbs correctly. Beyond that there is absolutely no advantage of one URL versus another.
The reality is that the contents of a URL are completely irrelevant to a RESTful system. It is simply an identifier.
It's not what it looks like, it is what you do with it that is important.
One way of looking at REST:
http://tomayko.com/writings/rest-to-my-wife (which has now been taken down, sadly, but can still be seen on web.archive.org)
So anyway, HTTP—this protocol Fielding
and his friends created—is all about
applying verbs to nouns. For instance,
when you go to a web page, the browser
does an HTTP GET on the URL you type
in and back comes a web page.
...
Instead, the large majority are busy
writing layers of complex
specifications for doing this stuff in
a different way that isn’t nearly as
useful or eloquent. Nouns aren’t
universal and verbs aren’t
polymorphic. We’re throwing out
decades of real field usage and proven
technique and starting over with
something that looks a lot like other
systems that have failed in the past.
We’re using HTTP but only because it
helps us talk to our network and
security people less. We’re trading
simplicity for flashy tools and
wizards.
One thing that jumps out at me (nice question by the way) is what they describe. The first describes an operation (getPart), the second describes a resource (part 00345).
Also, maybe you couldn't use other HTTP verbs with the first - you'd need a new method for putPart, for example. The second can be reused with different verbs (like PUT, DELETE, POST) to 'manipulate' the resource? I suppose you're also kinda saying GET twice - once with the verb, again in the method, so the second is more consistent with the intent of the HTTP protocol?
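The reuse of one URL with several verbs can be sketched as a single handler that branches on the method. This is a toy in-memory version, not a real web framework; the store contents and status code choices are assumptions made for illustration.

```python
def handle(store, method, path, body=None):
    """Apply an HTTP verb to the part identified by `path`.

    Returns a (status_code, payload) tuple. `store` is a plain dict
    standing in for the parts database.
    """
    prefix = "/parts/"
    if not path.startswith(prefix):
        return 404, None
    part_id = path[len(prefix):]

    if method == "GET":
        return (200, store[part_id]) if part_id in store else (404, None)
    if method == "PUT":
        created = part_id not in store
        store[part_id] = body
        return (201 if created else 200), body
    if method == "DELETE":
        if part_id in store:
            del store[part_id]
            return 204, None
        return 404, None
    return 405, None  # verb not supported for this resource


if __name__ == "__main__":
    parts = {"00345": {"name": "widget"}}
    print(handle(parts, "GET", "/parts/00345"))
    print(handle(parts, "DELETE", "/parts/00345"))
```

The URL /parts/00345 never changes; only the verb does, which is the consistency the answer is pointing at.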
One advantage I always like as a savvy web user (though it certainly shouldn't be the guiding principle for when to use such a URL scheme) is that these types of URLs are "hackable". This is handy for things like blogs, where I can just edit a date or a page number within a URL instead of having to find where the "next page" button is.
The biggest advantage of REST, IMO, is that it allows a clean way to use the HTTP verbs (which are the most important thing in REST services). Actually, using REST means you are using the HTTP protocol and its verbs.
Using your URLs, imagine you want to post a "part" instead of getting it.
In the first case it would look like this, using a GET where you should have used a POST:
http://www.parts-depot.com/parts/postPart?param1=lalala&param2=lelele&param3=lilili
In a REST context, it would be a POST to
http://www.parts-depot.com/parts
with a body such as this XML:
<part>
<param1>lalala</param1>
<param2>lelele</param2>
<param3>lilili</param3>
</part>
URI semantics are defined by RFC 2396. The extracts particularly pertinent to this question are 3.3. "Path Component":
The path component contains data, specific to the authority (or the
scheme if there is no authority component), identifying the resource
within the scope of that scheme and authority.
And 3.4 "Query Component":
The query component is a string of information to be interpreted by
the resource.
Note that the query component is not part of the resource identifier, it is merely information to be interpreted by the resource.
As such, the resource being identified by your first example is actually just /parts/getPart. If your intention is that the URL should identify one specific part resource then the first example does not do that, whereas the second one (/parts/00345) does.
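The syntactic split this answer relies on can be seen by parsing both URLs with the standard library. The sketch only shows where the path ends and the query begins; how much of that counts as "the identifier" is the answer's reading of the RFC, not something the parser decides. The helper name is an assumption.

```python
from urllib.parse import parse_qs, urlsplit


def split_path_and_query(url):
    """Return (path, query parameters) for a URL."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    print(split_path_and_query("http://www.parts-depot.com/parts/getPart?id=00345"))
    print(split_path_and_query("http://www.parts-depot.com/parts/00345"))
```

In the first URL the part number lives only in the query; in the second it is part of the path.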
So the 'advantage' of the second style of URL is that it is semantically correct, whereas the first one is not.
"The second requires additional code
and/or server configuration to parse
out values,"
Really? You chose a poor framework, then. My experience is that the RESTful version is exactly the same amount of code. Maybe I just lucked into a cool framework.
"as well as additional documentation
and time spent explaining the system
to junior programmers"
Only once. After they get it, you shouldn't have to explain it again.
"and justifying it to peers."
Only once. After they get it, you shouldn't have to explain it again.
Don't use query/search parts in URLs that aren't queries or searches. If you do, then according to the URL spec you are likely implying something about that resource that you don't really want to.
Use query parts for resources that are a subset of some bigger resource - pagination is a good example of where this could be applied.