We have a SOAP web service using Spring WS. We would like all requests to use a single encoding/character set. The reasons we want only one encoding/character set are:
the encoding/character set fits our market perfectly
accepting full Unicode would introduce a lot of scripts which don't make sense in our market and we can't handle in a meaningful way
supporting full Unicode correctly would require a lot of work in terms of normalization, ordering, sorting, searching and so on
Ideally we would like to check the request in one central place instead of having to check each string individually.
Is there a better way to implement this than a servlet filter? We're not that happy with a servlet filter because Spring WS seems to operate on the input stream of the request, which may ignore the encoding stated in the header. That means we would have to peek into the body.
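For what it's worth, one central place to do this kind of check in Spring WS is an EndpointInterceptor, which sees the parsed message rather than the raw input stream. Below is a minimal sketch, assuming ISO-8859-1 as the allowed charset; the charset choice and the (omitted) SOAP fault handling are assumptions you'd adapt:

import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;

import org.springframework.ws.context.MessageContext;
import org.springframework.ws.server.EndpointInterceptor;

public class CharsetEnforcingInterceptor implements EndpointInterceptor {

    public boolean handleRequest(MessageContext messageContext, Object endpoint)
            throws Exception {
        // Serialize the already-parsed payload back to a string...
        StringWriter writer = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(messageContext.getRequest().getPayloadSource(),
                           new StreamResult(writer));

        // ...and verify every character is representable in the target
        // charset. CharsetEncoder is not thread-safe, so create one per call.
        CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder();
        if (!encoder.canEncode(writer.toString())) {
            // Returning false stops further processing; a real implementation
            // would also populate a SOAP fault on the response here.
            return false;
        }
        return true;
    }

    public boolean handleResponse(MessageContext messageContext, Object endpoint) {
        return true;
    }

    public boolean handleFault(MessageContext messageContext, Object endpoint) {
        return true;
    }

    public void afterCompletion(MessageContext messageContext, Object endpoint,
                                Exception ex) {
    }
}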
Background:
A certain government-backed wholesaler of broadband services in Australia took feedback from discussion groups about how best to deliver B2B services to retail ISPs. They settled on EbXML.
Problem:
We're a very small shop (comparatively) that doesn't want to spend a lot of time going forward on integration. We're already familiar with integration of paired (inbound and outbound) SOAP services. In the past we've made use of WSDL-based code generation tooling (mostly with RPC/Literal services) where the WSDL has been descriptive and simple enough for the code generation tools to digest.
If at all possible we'd like to avoid having to hand-integrate the services with our business 'stack'. We know that the 'Interface Schemas' have been updated several times; we'd like to (as much as possible) do code and schema generation such that we can model our relationship with the supplier and the outbound/inbound messages as simple "queues" (tables) in an SQL database -- this will be our point of integration.
Starting with the outbound ("sender") SOAP web service... it publishes a Document/Literal WSDL description of the service that seems to work correctly with various tools (e.g. wsdl2java, SoapUI) to generate the ebXML 'wrapper' messages. This says nothing about the 'payload' messages themselves, which (at least for the MSH we've looked at) need to be multipart/related attachments with a type of text/xml.
The 'payload' messages are defined in the provided CPA (something like bindings) and Schema (standard-looking XSD) files. The MSH itself doesn't seem to provide any external validation for the payload messages.
Question:
Is the same kind of code-generation tooling (as seen with WSDL-described SOAP web services) available for ebXML CPAs/Schemas? (i.e. tools that can consume the CPA and 'payload' interface schemas and spit out Java/C++/whatever, and/or something WSDL-like specific to the 'payload' interface messages, and/or example messages.)
If so, where do I look?
If not, are there any EbXML-specific problems that would prevent it? (I'd rather not get several weeks into a project to develop tools that are impossible to implement 'correctly' given the information at hand).
The MSH is payload agnostic. The payloads are not defined in the CPA, only the service and action names that are used to send the ebXML payloads are. The service and action are transmitted in the ebXML header, which is the first part of the multipart message. The payloads themselves can be xml, binary or a combination. Each payload is another part.
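To illustrate that structure with a hedged sketch (not tied to any particular MSH; the header XML and payload bytes are placeholders), the wire format can be mimicked with JavaMail's multipart/related support:

import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMultipart;

public class EbXmlMessageSketch {

    public static MimeMultipart build(String soapEnvelopeXml, byte[] payload)
            throws Exception {
        MimeMultipart multipart = new MimeMultipart("related");

        // First part: the SOAP envelope carrying the ebXML header
        // (Service, Action, MessageId, ...).
        MimeBodyPart headerPart = new MimeBodyPart();
        headerPart.setContent(soapEnvelopeXml, "text/xml");
        multipart.addBodyPart(headerPart);

        // Each payload is a further part; here, a single XML payload.
        MimeBodyPart payloadPart = new MimeBodyPart();
        payloadPart.setContent(new String(payload, "UTF-8"), "text/xml");
        payloadPart.setContentID("<payload-1>");
        multipart.addBodyPart(payloadPart);

        return multipart;
    }
}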
An MSH is responsible for tasks like:
sending (usually asynchronous) acknowledgements for received messages
resending messages if an acknowledgement has not been received within a certain amount of time
ignoring duplicate messages
assuring the order in which messages are delivered is correct
The actual behaviour is all configurable using the CPA, but a compliant MSH would support all of the above.
This implies that an MSH has to keep an administration of the messages it has sent and received, which is usually done in a database.
I would be surprised if you could find tooling to generate an MSH from a specific CPA. What you can find is software/components that implement a generic MSH and that can be configured with CPAs.
Assuming you don't want to build your own, look for an existing ebMS adapter. Configure it with your CPA(s). Then generate the payloads however you like and pass them to the ebMS adapter.
Google for "ebMS adapter" or "ebMS support".
Alas, it seems there's no specific tooling around the 'payload' messages for ebXML, specifically because ebXML doesn't regulate those messages.
However, the CPA (through its canSend and canReceive elements) acts somewhat like a SOAP WSDL, and the XSDs serve the same purpose as with SOAP, so it's not too far off.
There does exist software for turning types defined in XSDs into messages (merging in user-supplied data) at runtime, but per my question there's no obvious tooling for code generation around CPAs and related XSDs.
Furthermore, actually writing software to do this yourself is made more problematic by the difficulty of searching for the meta-grammar of XML Schema (i.e. the grammar which remains of XML Schema once XML tokenization is factored out). Basically, this was difficult because in the XML world the word "grammar" has a different meaning, which pollutes search results.
I was able to write a parser for the XML syntax snippets present at the top of each of the MSDN articles on XML Schema (elements listed down the left), which in turn allowed me to generate an LL(1) grammar for XML Schema which works on the pre-parsed AST of a given XSD.
From there I built a top-down parser from this meta-grammar which:
Follows <xsd:import>s and <xsd:include>s to resolve namespaces into further XSDs (a sketch of this step follows the list).
Recursively resolves message types in order to produce a 'flattened' type for each CPA message.
Generates packer/unpacker data structures for the message types, which allow generation of code in various languages, as well as serialisation to and parsing from validated 'payload' XML.
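This isn't the generator itself, but a minimal sketch of the import-following step mentioned above using plain DOM; recursion into the collected files and namespace bookkeeping are left out:

import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XsdImportWalker {

    private static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Collects the schemaLocation of every xsd:import / xsd:include in one XSD.
    public static List<String> collectSchemaLocations(String xsdPath)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xsdPath);

        List<String> locations = new ArrayList<>();
        for (String tag : new String[] {"import", "include"}) {
            NodeList nodes = doc.getDocumentElement()
                    .getElementsByTagNameNS(XSD_NS, tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                String loc = ((Element) nodes.item(i))
                        .getAttribute("schemaLocation");
                if (!loc.isEmpty()) {
                    locations.add(loc);
                }
            }
        }
        return locations;
    }
}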
There are still various XML Schema restrictions, keys, and other constraints that my code generators don't know about, but support for these can be added in time.
I'll update this answer with links to grammars (and possibly code -- depends on legals) as time permits. I'll leave the question as non-accepted for a while so that if someone miraculously finds a tool which makes much less work of the code generation, I'll accept an answer based on that.
I have written my own Restful API and am wondering about the best way to deal with large amounts of records returned from the API.
For example, if I use the GET method on myapi.co.uk/messages/ it will bring back the XML for all message records, which in some cases could be thousands. This makes using the API very sluggish.
Can anyone suggest the best way of dealing with this? Is it standard to return results in batches and to specify batch size in the request?
You can change your API to include additional parameters to limit the scope of data returned by your application.
For instance, you could add limit and offset parameters to fetch just a small part of the collection. This is how pagination can be done in accordance with REST:

myapi.co.uk/messages?limit=10&offset=20

A request like this fetches 10 resources from the messages collection, the 21st through the 30th. This way you can ask for a specific portion of a huge data set.
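On the server side, a minimal sketch of such an endpoint, assuming JAX-RS (the MessageStore interface is a hypothetical stand-in for your persistence layer):

import java.util.List;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

@Path("/messages")
public class MessagesResource {

    // Hypothetical data-access hook; replace with your own persistence code.
    public interface MessageStore {
        List<String> find(int offset, int limit);
    }

    private final MessageStore store;

    public MessagesResource(MessageStore store) {
        this.store = store;
    }

    // GET /messages?limit=10&offset=20 returns only the requested window,
    // so the full table is never serialized in one response.
    @GET
    @Produces(MediaType.APPLICATION_XML)
    public List<String> list(
            @QueryParam("limit") @DefaultValue("10") int limit,
            @QueryParam("offset") @DefaultValue("0") int offset) {
        return store.find(offset, limit);
    }
}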
Another way to decrease the payload is to ask for only certain parts of your resources' representation. Here's how Facebook does it:
/joe.smith/friends?fields=id,name,picture
Remember that while using either of these methods, you have to provide a way for the client to discover each of the resources. You can't assume they'll just look at the parameters and start changing them in search of data. That would be a violation of the REST paradigm. Provide them with the necessary hyperlinks to avoid it.
I strongly recommend viewing this presentation on RESTful API design by apigee (the screencast is called "Teach a Dog to REST"). Good practices and neat ideas to approach everyday problems are discussed there.
EDIT: The video has been updated a number of times since I posted this answer; you can check out the 3rd edition from January 2013.
There are several ways in general by which one can improve API performance, including for large response sizes. Each of these topics can be explored in depth:

Reduce Size With Pagination
Organizing Using Hypermedia
Exactly What a User Needs With Schema Filtering
Defining Specific Responses Using The Prefer Header
Using Caching To Make Responses More Efficient
More Efficiency Through Compression
Breaking Things Down With Chunked Responses
Switch To Providing More Streaming Responses
Moving Forward With HTTP/2
Source: https://apievangelist.com/2018/04/20/delivering-large-api-responses-as-efficiently-as-possible/
If you are using .NET Core, you should try the Microsoft.AspNetCore.ResponseCompression package.

Add this line to ConfigureServices in your Startup class:

services.AddResponseCompression();

Then, in the Configure method:

app.UseResponseCompression();
I am writing a C++ API which is to be used as a web service. The functions in the API take in images/path_to_images as input parameters, process them, and give a different set of images/paths_to_images as outputs. I was thinking of implementing a REST interface to enable developers to use this API for their projects (independent of whatever language they'd like to work in). But, I understand REST is good only when you have a collection of data that you want to query or manipulate, which is not exactly the case here.
[The collection I have is of different functions that manipulate the supplied data.]
So, is it better for me to implement an RPC interface for this, or can this be done using REST itself?
Like lcfseth, I would also go for REST. REST is indeed resource-based, and in your case you might consider that there's no resource to deal with. However, that's not exactly true: the image converter in your system is the resource. You POST images to it and it returns new images. So I'd simply create a URL such as:
POST http://example.com/image-converter
You POST images to it and it returns some array with the path to the new images.
Potentially, you could also have:
GET http://example.com/image-converter
which could tell you about the status of the image conversion (assuming it is a time consuming process).
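From the client's perspective this stays straightforward. A hedged sketch using Java 11's HttpClient, where the URL, content type and file name are assumptions:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class ImageConverterClient {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // POST the image bytes; the server replies with paths to the new images.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://example.com/image-converter"))
                .header("Content-Type", "image/png")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("in.png")))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}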
The advantage of doing it like that is that you are re-using HTTP verbs that developers are familiar with, and the interface is almost self-documenting (though of course you still need to document the format accepted and returned by the POST call). With RPC, you would have to define new verbs and document them.
REST uses the common HTTP operations GET, POST, DELETE, HEAD and PUT. As you can imagine, this is very data oriented. However, there is no restriction on the data type and no restriction on the size of the data (none I'm aware of, anyway).
So it's possible to use it in almost every context (including sending binary data). One of the advantages of REST is that web browsers understand it, so your users won't need a dedicated application to send requests.
RPC presents more possibilities and can also be used. You can define custom operations for example.
Not sure you need that much power given what you intend to do.
Personally I would go with REST.
Here's a link you might wanna read:
http://www.sitepen.com/blog/2008/03/25/rest-and-rpc-relationship/
Compared to RPC, a REST-style (JSON) interface is lightweight and easy for API users to consume. RPC (SOAP/XML) seems complex and heavy.
I guess what you want is an HTTP+JSON based API, not a REST API as the term is defined by the REST author:
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
I'm using Django to implement a private rest-like API and I'm unsure of how to handle different versions of the API on the backend.
Meaning, if I have 2 versions of the API what does my code look like? Should I have different apps that handle different version? Should different functions handle different versions? Or should I just use if statements for when one version differs from another?
I plan on stating the version in the Header.
Thanks
You do not need to version REST APIs. With REST, versioning happens at runtime either through what one might call 'must-ignore payload extension rules' or through content negotiation.
'must-ignore payload extension rules' refer to an aspect you build into the design of your messages. 'Must-ignore' means that a piece of software that processes a message of the given format must ignore any unknown syntactical constructs. This is what we all know from HTML and what makes it possible to insert all sorts of fancy tags into an HTML page without the parser choking.
'Must-ignore' allows you to evolve the capabilities of your service by adding stuff to what you send already without considering clients that only understand the older versions.
Content-negotiation refers to the HTTP built-in mechanism of negotiating the actual representation the server sends to a given client at runtime. The typical scenario is this: clients send the Accept header in the request to advertise what they are capable of, and servers pick the representation to send back based on these capabilities. But there are also variations of this theme (see here for details: http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html).
Content negotiation allows for incompatible changes, meaning that I can evolve my service to being able to send incompatible old and new versions and based on the Accept header my service will send the appropriate one.
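As a purely illustrative exchange (the vendor media type is made up), content negotiation might look like this on the wire:

GET /orders/42 HTTP/1.1
Accept: application/vnd.example.order-v2+json

HTTP/1.1 200 OK
Content-Type: application/vnd.example.order-v2+json

A client that only understands the older representation simply sends a different Accept header and receives the older format.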
Bottom line: with both approaches, your API remains as it is. No need to do any versioning at the API level - especially not the often suggested (but totally wrong) inclusion of version identifiers in the URIs (remember, you are doing REST here, not SOAP!)
I'm looking to expose a number of services to the web. There will be static web pages with jQuery based JavaScript code that accesses these services, and there will also be all kinds of applications that may access these services. (Or nobody will care; that's also quite possible :-)
Each service would be well defined as a collection of methods that act on some number of input parameters, and return some number of output parameters. Most of it is REST, except for the concept of "identity" -- these services requires some log-in, and logging in does set you up with permissions on which methods on which services you're allowed to use, and what particular entities you're allowed to address using those methods.
Ideally, I want to expose the services using JSONP to make the services easy to consume in a cross-site way -- those static web applications shouldn't all have to be be served from the domain of my application servers.
The set of data types is fairly basic -- varchar (255 chars), text (8191 chars), id (32 chars, C-style identifier constraint), double, long (hard to do in JavaScript), bool, datetime, email (varchar matching a regex) and url (varchar matching a regex) would probably suffice for a very long time.
I want, if possible, to implement these services using an application-server technology that can scale on multi-threaded cores -- 24 threads is standard on plain mid-range servers these days. Sticking with Python or Node.js would make me uneasy because of their lack of threading support. Also, I would like typing to be static, because I believe static typing prevents a certain class of bugs; that also argues against Python, Node.js, PHP and Ruby.
I also want to serve on Linux. That's a pretty hard requirement :-)
In the static language world, there are really only a few server frameworks or framework approaches. There's HttpListener with Mono. There's Jetty with Java. There's a few others. There are also a few much deeper frameworks, that have more overhead than I'd like -- J2EE, ASP.NET, etc. (In the dynamic world, you have Cake.PHP, Rails, Django, etc etc)
So, in the best of worlds, I'd like the GET URL /foo/bar?arg1=2&arg2=xyzzy to map to an object that I write of type foo, method bar, taking arguments arg1 and arg2 of type int and string (say). I want the server glue that sits between the HTTP GET and the object method to do two things:
1) Permission control (each method could have one or more required permissions)
2) Type checking (each argument should be verified against the expected type and coerced from the URL string format to the native type)
Once the method returns one or more values (key/value pairs), the glue should make sure the return values also conform to the spec, convert them to proper JSON, and take care of returning an appropriate error result if an exception was thrown within the code.
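To make the desired glue concrete, here is a rough, hypothetical Java sketch; the annotation, the service class, and the positional arg1/arg2 convention are all invented for illustration:

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Map;

@Retention(RetentionPolicy.RUNTIME)
@interface RequiresPermission { String value(); }

class FooService {
    @RequiresPermission("foo.bar")
    public Map<String, String> bar(int arg1, String arg2) {
        return Map.of("result", arg1 + ":" + arg2);
    }
}

class Dispatcher {
    static Object dispatch(Object service, String methodName,
                           Map<String, String> query, String userPermission)
            throws Exception {
        for (Method m : service.getClass().getMethods()) {
            if (!m.getName().equals(methodName)) continue;

            // 1) Permission control.
            RequiresPermission perm = m.getAnnotation(RequiresPermission.class);
            if (perm != null && !perm.value().equals(userPermission)) {
                throw new SecurityException("missing permission " + perm.value());
            }

            // 2) Type checking: coerce each URL string to the declared type.
            // Parameter names aren't available without -parameters, so this
            // sketch assumes positional parameters named arg1, arg2, ...
            Class<?>[] types = m.getParameterTypes();
            Object[] args = new Object[types.length];
            for (int i = 0; i < types.length; i++) {
                String raw = query.get("arg" + (i + 1));
                args[i] = types[i] == int.class ? Integer.parseInt(raw) : raw;
            }
            return m.invoke(service, args);
        }
        throw new NoSuchMethodException(methodName);
    }
}

A real implementation would resolve parameter names properly (e.g. by compiling with -parameters), serialize the returned map to JSON, and wrap exceptions into error results.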
I've been looking high and low for this kind of solution, but all the solutions I find are pretty far off. JSP and ASP all start out with an HTML assumption -- I'm generally generating JSON, which doesn't integrate as well with the syntax (to put it mildly). JSPX, as well as a number of other technologies (HttpListener, CGI, Python Twisted, ...), all stop at the "do_GET" level -- no dispatch into objects, no permission control, no type checking. All the higher-level frameworks, however, add a lot more on top of that, often including complex routing that I don't need -- and as often as not they still don't do permission checking, but instead leave it as something you have to write manually in each function implementation.
I think the closest I could find to what I want is Thrift. However, it still doesn't do permission checking, and the "PHP server" support it has seems to be a dumb PHP cli process listening on port 80 instead of integrating with Apache, and it's not set up to support JSONP.
Have I missed something? Is there some (preferably statically typed and multi-thread-supporting) server technology that will do type checking, permission checking and simple object-method dispatch, without tons of other cruft to get in the way, and that can be called from (and respond to) JSONP?
Should I extend Thrift? Adding permission constraints on each method would be a fairly substantial extension, but at least I'd get the other support that Thrift has. (And then I'd have to add JSONP support, and ...)
Is there some (preferably statically typed and multi-thread-supporting) server technology that will do type checking and permission checking and simple object-method dispatch, without tons of other cruft to get in the way
Scala (static, actors for threading/concurrency & dispatch, no cruft) + Thrift.
I believe the latest Thrift has JavaScript support (I'm not sure about JSONP).
As for authentication, see my answer to this question.
There really doesn't exist any good framework at the right level for this. Existing frameworks are either too low-level (think boost::asio) or too high-level (think Cake, Rails, etc.). The closest option is probably Erlang/WebMachine. I guess the reason for this is that most web apps end up having a common set of higher-level requirements that then migrate into whatever the web framework is, lifting it above the level I'm aiming at.