Does Apache Thrift validate parameters? - web-services

I'm considering using Apache Thrift for a PHP server that implements web services.
If my web service were to be defined to accept two parameters - a user name as a string, and a password as an integer - would thrift validate that parameters of those types were supplied by the client, or do I still need to perform validation on the server?
I'm not asking here for the purposes of sanitising input, but rather for returning meaningful error responses to clients, and whether if a service is invoked with incorrect parameters requests will even be made to the server.

To my understanding, the check - if any - is performed at "library level", where the local types are translated into Thrift types, where the code generation happens. In other words, it will depend on the bindings for the specific language you are using (PHP, Erlang, whatever), to raise meaningful - or not - errors if types are not respected. But I have to look into it a bit more.

It basically depends on what kind of validation we are talking.
If the parameters need to be set, you could make them "required". In that case, the fields are checked in most language implementations (not in all) during deserialization and an error is thrown whenever a reuirwed Parameter is missing. However, required comes at the cost of potential problems with versioning: you can't omit a required field once the API has been published, or it will break compatibility otherwise.
The presence of normal ("optional") values can typically be checked by means of the _isset flags, which exist for exactly this purpose. This is something that your own code is responsible for - Thrift does provide the messaging mechanisms, but of course not the interpretation of your data.
If validation in your case means, that the data Format (string and number) should be checked: Because the code for both Client and Server is generated from the Thrift IDL file, the implementation will use the correct field serialization and deserialization method according to the IDL file:
Service sample
{
bool Login( 1: string Name, 2: i32 pwd)
}
If you Need to Change for whatever reason the Password from i32 into string, technically this results in two changes:
remove the Password field with the ID 2, except if the field is "required"
add a new field with the new ID 3
so it looks this way:
Service sample
{
bool Login( 1: string Name, /*2: i32 _deprecated_pwd*/, 3: string pwd)
}
As field type and field purpose are bound to the numeric field ID, it is highly recommended to use another field ID in such a case, which is so far unused within this scope. It is also recommended to leabve the old fields in the IDL to indicate outdated IDs.
A good reference about this "soft versioning" stuff and the pros and cons of "required" fields can be found in Diwaker Gupta's "Missing Guide".

Related

Blank value in web service for Int64 type

I consume a web service that has a numeric element. The Delphi wsdl importer sets it up as Int64.
The web service allows this element to be blank. However, because it is defined as Int64, when I consume the web service in Delphi without setting a value for it, it defaults to 0 because it's an Int64. But I need it to be blank and the web service will not accept a value of 0 (0 is defined as invalid and returns an error by the web service).
How can I pass a blank value if the type is Int64?
Empty age (example)
<E06_14></E06_14>
could have a special meaning, for example be "unknown" age.
In this case, the real question is how to make the field nillable on the Delphi side.
From this post of J.M. Babet:
Support for 'nil' has been an ongoing issue. Several built-in types of
Delphi are not nullable. So we opted to use a class for these cases
(not elegant but it works). So with the latest update for Delphi 2007
I have added several TXSxxxx types to help with this. Basically:
TXSBoolean, TXSInteger, TXSLong, etc. TXSString was already there but
it was not registered. Now it is. When importing a WSDL you must
enable the Use 'TXSString for simple nillable types' option to make
the importer switch to TXSxxxx types. On the command line it is the
"-0z+" option.
The DocWiki for the Import WSDL Wizard also shows two options related to nillable elements:
Process nillable and optional elements - Check this option to make the WSDL importer generate relevant information about optional
and nillable properties. This information is used by the SOAP runtime
to allow certain properties be nil.
Use TXSString for simple nillable types - The WSDL standard allows simple types to be nil, in Delphi or NULL, in C++, while Delphi
and C++ do not allow that. Check this option to make the WSDL importer
overcome this limitation by using instances of wrapper classes.

Should the paramaters provided in a web service call be included in the response

Iv not got much experience with creating web services, however, I do spend a lot of time interfacing with them.
I wondered if there was a best practice that stated weather or not parameter that are provided in the request should be included in the response.
E.g.
Request:
a.com/getStuff?key=123
(JSON) Response:
{"key":"123",
"value":"abc"}
or..
(JSON) Response:
{"value":"abc"}
I much prefer the more verbose first option because it dos not enforce coupling between the request and the response. i.e. the response dosn't care what the request was, so you do not need to pass state around.
Is there a best practice?
If you are referencing a record in a database, or some other entity that is uniquely identified by an integer, GUID, or specifically-formatted string value, you should ALWAYS return that unique ID with the response, particularly if you are planning to allow the user to update that entity or reference it in a subsequent operation for creating related data or searching for related data.
If you are returning a derived value that may be a composite of many records' values, or of environmentally specific data (such as "How much free disk space is on my server?"), then the supplied parameters wouldn't mean anything in the response, and therefore shouldn't be returned.
Your point on coupling request-response is right on the money. If you are doing multiple simultaneous asynchronous calls, then the key value is very useful when handling the responses.
Referring to your example: I think id should always be part of the resource representation in your case JSON). The representation should be as self-explaining and self-referrable as possible. On top of an id-attribute/field I like also to use a link field:
{
"id":123,
"link":{
"href":"http://api.com/item/123",
"rel":"self"
},
otherData...
}
If your example GET /getStuff?key=123 is more a search (the parameter looks a bit like that) then it good to present the user a "summary" of your search:
{
"items":[{
item1...
},
{
item2...
}
],
"submitted-params":{
"key":"123",
"other-param":"paramValue"
}
}

(Java) How can I pass Schema-validated XML documents as parameters between distributed components (e.g. web services or sockets)?

Here is a description of the scenario and I would appreciate also any comments on the approach used
The core of my application is a set of web services backed by a P2P database. One service accepts a simple XML-based record (I have designed a generic schema for it). The service processes this data (mainly creating keys based on certain criteria) and pass the original data along with the created keys to a listening SocketServer in one of the listening P2P nodes. This key,data pair is routed to the proper node, which stores the data (associated with the key as an ID) in an XML database.
A second service accepts a query document that is structured based on the same schema, but with optional values that would be used for searching and matching from the previously stored ones. So the second service would pass this query (with the proper keys) to the P2P part, get back the results and pass them back to the service client.
E.g. if the original record submitted to the first service was < attr1 >value1 < /attr1 > < attr2 > value2 < / attr2 > (attribute list along with some other metadata mandated by the schema) then the second service should retrieve that record if the query received was < attr2 >value2 < / attr2 >
(I could later think about using more complex XPath or XQuery queries as the underlying XML database allows instead of exact matches for values here but that is not important at this stage. there is also a third service I am working on but it depends on getting the first two in proper shape first)
So my questions are:
1) What data type should I use as the parameters of the web services? How to utilize my schema for this usage? I was considering various XML binding frameworks (especially JAXB and SDO) for this but didn't know how to proceed.
2) How can I enhance the two services (call them store and search) to use dynamically created templates based on the original generic schema? The service would still accept documents of the main schema type but has the inner attribute list based on a template say template1 only requires whose values are ints while template2 require (float) and (string). The current JSP-based prototype manually creates this template but as an XML document that is assembled by hand (<>tags dispersed in text) and there is no type checking what so ever so I thought I could do better!
3) Is it possible to generate a quick web app prototype for simple access to this system (again by using the schema (&templates) to edit the appropriate XML message structures? What I am looking for is for the (human) user to choose a template and then just "fill in the blanks" and submit, no need for any fancy look and feel.
4) Can I or how can I also use this XML message type for communicating across sockets?
5) Does it matter if I deploy the services as stateless EJBs or not? Do I need them to be EJBs or servlets would be more than enough?
I currently have a rudimentary implementation (from previous developers) that were meant for a subset of my current requirements (I am improving on the services and adding new derived ones) but there was no schema nor validation and the data is passed all along as basic strings, thus providing weak typing and difficult to update manual parsing. The reason I want to update this to a stronger bound typing is to introduce changes in the data schema that would be passed along the whole system easily. Basically I want the system to be as less coupled to the data format/schema used as possible; the current prototype is too coupled to the data that I am finding it extremely difficult to change the data without breaking the system.
My initial investigation led me to consider JAXB but it supports only static typing (cannot create a schema/types dynamically at runtime that I want to persist for later usage). So I came across SDO which has both dynamic and static typing. The problem is just that there is not enough community and/or examples of using this approach so it seems risky (the examples of Apache Tuscany and Eclipselink implementations are very scarce and I could not find complete examples that are not 5+ years old (like this http://www.ibm.com/developerworks/java/library/j-sdo/) and also tackles the XML use case of SDO (most seem to focus on the relational usage of SDO).
This is my first time asking for programming help (here and elsewhere) so please bear with me. I searched a lot on the net but I could not find anything useful but pieces here and there that did not add up.
Any comment or hint is really appreciated.
trfndr
EDIT
I forgot one thing: how would the search service get back the results? Since it is opening a client socket connection, there is no way to get back any results synchronously. The current implementation tackles this by having the service client opening a listening socket on a random port and putting this contact info in the query document. After the search web service sends the query to the p2p part it finishes. The p2p sends the results as a WS call to another service which sends them back to the service client socket. I don't like this approach much, is there any more elegant solution?
I lead the EclipseLink JAXB & SDO implementations and represent Oracle on those specifications so hopefully I can help you out. This question is very similar to talk I'm giving at JavaOne in September.
1) What data type should I use as the
parameters of the web services? How to
utilize my schema for this usage? I
was considering various XML binding
frameworks (especially JAXB and SDO)
for this but didn't know how to
proceed.
This depend's on what web service framework you are using. JAXB is much easier to use with JAX-WS, and while JAXB is still easier to use with JAX-RS SDO, is a possible alternative.
2) How can I enhance the two services
(call them store and search) to use
dynamically created templates based on
the original generic schema? The
service would still accept documents
of the main schema type but has the
inner attribute list based on a
template say template1 only requires
whose values are ints while template2
require (float) and (string). The
current JSP-based prototype manually
creates this template but as an XML
document that is assembled by hand
(<>tags dispersed in text) and there
is no type checking what so ever so I
thought I could do better!
I'm not 100% what you mean here, but the following may be helpful:
Using #XmlAnyElement to Build a Generic Message
3) Is it possible to generate a quick
web app prototype for simple access to
this system (again by using the schema
(&templates) to edit the appropriate
XML message structures? What I am
looking for is for the (human) user to
choose a template and then just "fill
in the blanks" and submit, no need for
any fancy look and feel.
JAX-RS is a nice framework for creating quick prototypes. Below is an example I created:
Part 1 - The Database
Part 2 - Mapping the Database to Objects
Part 3 - Mapping the Objects to XML
Part 4 - The RESTful Service
Part 5 - The Client
4) Can I or how can I also use this
XML message type for communicating
across sockets?
I prefer frameworks like JAX-RS that communicate over the HTTP protocol.
5) Does it matter if I deploy the
services as stateless EJBs or not? Do
I need them to be EJBs or servlets
would be more than enough?
My preference is to use an EJB session bean for the service. If you are interacting with a database then you can leverage the Java Transaction API (JTA) to manage your database transactions.
Part 4 - The RESTful Service
SDO
EclipseLink is the SDO 2.1.1 (JSR-235) reference implementation. We have some examples posted below. If you are looking how to do something specific, I will try to post a relevant example.
http://wiki.eclipse.org/EclipseLink/Examples/SDO
JAXB
JAXB is static. It is also more popular than SDO. Recognizing this in EclipseLink we have implemented a dynamic JAXB feature. It gives you the dynamic aspect of SDO with a JAXB slant.
http://wiki.eclipse.org/EclipseLink/Examples/MOXy/Dynamic
EDIT #1
Since you are dealing with JAX-WS and your model is almost entirely dynamic, I think you should skip the JAXB binding altogether. In the following link see the section "Switching Off Data Binding"
http://java.sun.com/developer/technicalArticles/xml/jaxrpcpatterns3/
This will give us the body of the message as a javax.xml.transform.Source object. We will need to process the XML based on the dynamic templates. SDO would be a good choice here. You can constantly add new types to the HelperContext using XML schemas.
helperContext.getXSDHelper().define(schema1, null);
helperContext.getXSDHelper().define(schema2, null);
You wil be able to unmarshal the Source from the web service as follows:
XMLDocument doc = helperContext.getXMLHelper().load(source, null, null);
DataObject rootDataObject = doc.getRootObject();
String someValue = rootDataObject.getString("attr3/childAttr/anotherChildAttr");
You will also be able to use the XMLHelper to marshal your objects to XML when calling another service.

How to solve two REST problems: the interface document; loss of privacy in descriptive URLs

Coming from a lot of frustrating times with WSDL/Soap, I very much like the REST paradigm, but am trying to solve two basic problems in our application, before moving over to REST. The first problem relates to the lack of an interface document. I think I finally see how to handle this situation: One can query his way down from a top-level "/resources" resource using various requests of GET, HEAD, and OPTIONS to find the one needed resource in the correct hypermedia format. Is this the idea? If so, the client need only be provided with a top-level resource URI: http://www.mywebservicesite.com/mywebservice/resources. He will then have to do some searching and possible keep track of what he is discovering, so that he can use the URIs again efficiently in future to do GETs, POSTs, PUTs, and DELETEs. Are there any thoughts on what should happen here?
The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber. We do have an implementation of opaque URLs we use in the context of a session, and I'm wondering how opaque URLs might be applied to REST. The general problem is how to keep domain-specific details out of URLs, and still benefit from what REST has to offer.
The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber.
I think you've misunderstood the point of opaque URIs. The notion of opaque URIs is with respect to clients: A client shall not decipher a URI to guess anything of semantic meaning from it. So a service may well have URIs like /resources/.../customer/Madonna/phonenumber, and that's quite a good idea. The URIs should be treated as opaque by clients: not infer from the URI that it represents Madonna's phone number, and that Madonna is a customer of some sort. That knowledge can only be obtained by looking inside the URI itself, or perhaps by remembering where the URI was discovered.
Edit:
A consequence of this is that navigation should happen by links, not by deconstructing the URI. So if you see /resouces/customer/Madonna/phonenumber (and it actually represents Customer Madonna's phone number) you should have links in that resource to point to the Madonna resource: e.g.
{
"phone_number" : "01-234-56",
"customer_URI": "/resources/customer/Madonna"
}
That's the only way to navigate from a phone number resource to a customer resource. An important aspect is that the server implementation might or might not have domain specific information in the URI, The Madonna record might just as well live somewhere else: /resources/customers/byid/81496237. This is why clients should treat URIs as opaque.
Edit 2:
Another question you have (in the comments) is then how a client, with the required no knowledge of the server's URIs is supposed to be able to find anything. Clients have the following possibilities to find resources:
Provide a search interface. This could be done by providing an OpenSearch description document, which tells clients how to search for items. An OpenSearch template can include several variables, and several endpoints, depending on what you're looking for. So if you have a "customer ID" that's unique, you could have the following template: /customers/byid/{proprietary:customerid}", the customerid element needs to be documented somewhere, inside the proprietary namespace. A client can then know how to use such a template.
Provide a custom form. This implies making a custom media type in which you explicitly define how (based on an instance of the document) a URI to a customer can be forged. <customers template="/customers/byid/{id}"/>. The documentation (for the media type) would have to state that the template attribute must be interpreted as a relative URI after the string substitution "{id}" to an actual customer ID.
Provide links to all resources. Some resources aren't innumerable, so you can simply make a link to each and every one of them, optionally including identifying information along with the links. This could also be done in a custom media type: <customer id="12345" href="/customer/byid/12345"/>.
It should be noted that #1 and #2 are two ways of saying the same thing: Clients are allowed to create URIs if they
haven't got the URI structure a priori
a media type exists for which the documentation states that URIs should be created
This is much the same way as a web browser has no idea of any URI structure on the web, except for the rules laid out in the definition of HTML forms, to add a ? and then all the query parameters separated by &.
In theory, if you have a customer with id 12345, then you could actually dispense with the href, since you could plug the customer id 12345 into #1 or #2. It's more common to actually provide real links between resources, rather than always relying on lookup or search techniques.
I haven't really used web RPC systems (WSDL/Soap), but i think the 'interface document' is there mostly to allow client libraries to create the service API, right? if so, REST shouldn't need it, because the verbs are already defined and don't really need to be documented again.
AFAIUI, the REST way is to document the structure of each resource (usually encoded in XML or JSON). In that document, you'll also have to document the relationship between those resources. In my case, a resource is often a container of other resources (sometimes more than one type), therefore the structure doc specifies what field holds a list of URLs pointing to the contained resources. Ideally, only one unique resource will need a single, fixed (documented) URL. everithing else follows from there.
The URL 'style' is meaningless to the client, since it shouldn't 'construct' an URL. Every URL it needs should be already constructed on a resource field. That let's you change the URL structure without changing the client (that has saved tons of time to me). Your URLs can be as opaque or as descriptive as you like. (personally, i don't like text keys or slugs; my keys are all BIGINTs or UUIDs)
I am currently building a REST "agent" that addresses the first part of your question. The agent offers a temporary bookmarking service. The client code that is interacting with the agent can request that an URL be bookmarked using some identifier. If the client code needs to retrieve that representation again, it simply asks the agent for the url that corresponds to the saved bookmark and then navigates to that bookmark. Currently those bookmarks are not persisted so they only last for the lifetime of the client application, but I have found it a useful mechanism for accessing commonly used resources. E.g. The root representation provides a login link. I bookmark that link and if the client ever receives a 401 then I can redirect to the "login" bookmark.
To address an issue you mentioned in a comment, the agent also has the ability to store retrieved representations in a dictionary. If it becomes necessary to aggregate and manipulate multiple representations at the same time then I can simply request that the agent store the current representation in a dictionary associated to a key and then continue navigating to the next resource. Once the client has accumulated all the necessary representation it can do what it needs to do.

Detail question on REST URLs

This is one of those little detail (and possibly religious) questions. Let's assume we're constructing a REST architecture, and for definiteness lets assume the service needs three parameters, x, y, and z. Reading the various works about REST, it would seem that this should be expressed as a URI like
http://myservice.example.com/service/ x / y / z
Having written a lot of CGIs in the past, it seems about as natural to express this
http://myservice.example.com/service?x=val,y=val,z=val
Is there any particular reason to prefer the all-slashes form?
The reason is small but here it is.
Cool URI's Don't Change.
The http://myservice.example.com/resource/x/y/z/ form makes a claim in front of God and everybody that this is the path to a specific resource.
Note that I changed the name. There may be a service involved, but the REST principle is that you're describing a specific web resource, named /x/y/z/.
The http://myservice.example.com/service?x=val,y=val,z=val form doesn't make as strong a claim. It says there's a piece of code named service that will try to do some sort of query. No guarantees.
Query parameters are rarely "cool". Take a look at the Google Chart API. Should that use a /full/path/notation for all of the fields? Would each URL be cool if it did?
Query parameters are useful. Optional fields can be omitted. New keys can be added to support new functionality. Over time, old fields can be deprecated and removed. Doing this is clumsier with a /path/notation .
Quoting from http://www.xml.com/pub/a/2004/08/11/rest.html
URI Opacity [BP]
The creator of a URI decides the encoding
of the URI, and users should not derive
metadata from the URI itself. URI opacity
only applies to the path of a URI. The
query string and fragment have special
meaning that can be understood by users.
There must be a shared vocabulary between
a service and its consumers.
This sounds like query strings are what you want.
One downside to query strings is that the are unordered. The GET ending with "?x=1&y=2" is different than that ending with "?y=2&x=1". This means the browser and any other intermediate systems won't be able to cache it, because caching is done based on the full URL. If this is a concern, then generate the query string in a well-defined order.
While constructing URIs this is the priniciple I follow. I don't know whether it is perfectly acceptable in all cases
Say for instance, that I have to get the details of an employee, then the URI will be of the form:
GET /employees/1/ and not GET /employees?id=1 since I treat every employee as a resource and the whole URI "employees/{id}" is used in identification of the resource.
On the other hand, if I have algorithmic operations that do not identify a specific resource as such,but merely require inputs to the algorithm which in turn identify the resource, then I use query strings.
For instance GET /employees?empname='%Bob%'&maxResults=100 might give me all employees whose names have the word Bob in them, with the maximum results returned by the query limited to 100.
Hope this answers your question
URIs are strictly split into a hierarchical part (the path) and a non-hierarchical path (the query), and both serve to identify the resource
Tthe URI spec itself (RFC 3986) clearly sets the path and the query portion of a URI as equal.
Section 3.3:
The path component contains data [...] that along with [the] query component
serves to identify a resource.
Section 3.4:
The query component contains [...] data that, along with
[...] the path component serves to identify a resource
So your choice in using x/y/z versus x=val&y=val&z=val has mainly to do if x, y or z are hierarchical in nature or if they're non-hierarchical, and if you can perceive them as always being hierarchical or non-hierarchical for the foreseeable future, along with any technical limitations you might be having on selecting one over the other.
But to answer your question, as others have noted: Neither is more RESTful than the other, since they both end up identifying a resource.
If the resource is the service, independent of parameters, it should be
http://myservice.example.com/service?x=val&y=val&z=val
This is a GET query. One of the principles behind REST is that you GET to read (but not modify!) the resource; you can POST to modify a resource & get a response; you can PUT to write to a resource; and you can DELETE to remove a resource.
If the resource specific with those parameters is a persistent resource, it needs a name. You could (if you organized your webservice this way) POST to http://myservice.example.com/service?x=val&y=val&z=val to create a particular instance of the service and have it return an ID to name this instance, e.g.
http://myservice.example.com/service/12312549
then use GET/POST/PUT/DELETE to interact with that instance.
First of all, defining URIs as part of your API violates a constraint of the REST architecture. You cannot do that and call your API RESTful.
Secondly, the reason query parameters are bad for non-query resource access is that they are generally not cached. It is also a violation of HTTP standards.
A URL with slashes like /x/y/z/ would impose a hierarchy and is not suited for the exact case of just passing three parameters.
If, like you said, x y z are indeed just parameters and the order is not important, it would be more RESTful to use semicolons:
http://myservice.example.com/service/x;y;z/
If your "service" however is just an algorithm that works the same with different parameters, there would also be nothing unRESTful with using ?x=val format.