Why perform transformations in middleware? - xslt

A remote system sends a message via middleware (MQ) to my application.
In the middleware a transformation (using XSLT) is applied to this message. It is just reformatted; there is no enrichment or validation. My system is the only consumer of this transformed message, and the XSLT is maintained by my team.
The original author of all of this is long gone, and I am wondering why he thought it was a good idea to do the transformation in middleware rather than in my app. I can't see the value in doing it in middleware; it makes the transformation less visible and harder to maintain.
Also, I would have thought that the XSLT would be maintained by the message producer, not the consumer.
Are there any guidelines for this sort of architecture? Has he done the right thing here?

It is a bad idea to modify a message body in the middleware. It hurts both maintainability and performance.
The only reason to do this is to connect two incompatible endpoints without modifying them; that requires transforming the source content into a form the destination endpoint understands.
The motivation for delegating the transformation to middleware can also be political (the endpoints are maintained by different teams, management is reluctant to touch the endpoint code, etc.).
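Since your team already owns the stylesheet, the same reformat-only transformation could in principle run inside the consuming application instead. Below is a minimal sketch in Python using lxml; the stylesheet filename and the handle_message hook are hypothetical, and your app may of course be in a different language.

from lxml import etree

# Compile the stylesheet once at startup; lxml XSLT objects are reusable.
transform = etree.XSLT(etree.parse("reformat.xslt"))

def on_mq_message(raw_bytes):
    # Called by the MQ client with the original, untransformed message.
    original = etree.fromstring(raw_bytes)
    reformatted = transform(original)              # same reformat-only XSLT
    handle_message(etree.tostring(reformatted))    # existing processing code

def handle_message(xml_bytes):
    ...  # the application's existing logic, unchanged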

If you are trying to create an application architecture where there is a need to serve data to different users in different formats, and perhaps receive data in different formats (think weather reports, or sports news), then creating a hub capable of doing the transformations between many different formats makes excellent sense. (Whether you call that "middleware" is up to you.) Perhaps your predecessor had this kind of architecture in mind, but it never grew big or complex enough to justify the design.

From an architectural point of view, it's a good idea to provide consumers with messages or content in a human-readable format, e.g. XML, unless there is a significant performance gain in using a binary format.
In the human-readable case, one simply has to look at the message to verify that it is correct. In the binary case, one would have to develop a utility to transform the binary message into a human-readable form. Different implementers of such a utility may not always interpret the binary form as intended, and it can turn into a finger-pointing exercise as to who or what is correct.
Also, if one is looking at what's in the queue, it is easier to make sense of it if the messages are in a human-readable format.
It doesn't hurt to start with a human-readable format and get the app working first. Then profile the app and see whether, in the big picture, the transformation routines are a significant source of delay. If yes, then go to a binary format.
It would have been preferable to have the original message producer provide the messages in the required format, but they must have had good reasons for doing what they did when they did it, e.g. potentially other consumers, XSLT not existing at the time, resource constraints, etc.

Read about the adapter design pattern and you will understand the intent of the current system architecture.
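For reference, here is a very small Python sketch of the adapter idea (all names are invented): the consumer codes against the shape it wants, and a thin adapter reformats the producer's message into that shape. Whether that adapter runs in middleware or inside the consuming app is then a deployment choice rather than a design one.

class ProducerMessage:
    def __init__(self, payload):
        self.payload = payload          # the producer's original structure

class ConsumerMessage:
    def __init__(self, order_id, lines):
        self.order_id = order_id        # the shape the consumer wants
        self.lines = lines

def adapt(msg):
    # Reformat only: no enrichment, no validation.
    return ConsumerMessage(
        order_id=msg.payload["OrderRef"],
        lines=msg.payload.get("Items", []),
    )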

Related

Functionalities structuring in API design

By 'functionalities structuring', I mean how we organize and coordinate different API endpoints to offer desired functionalities to clients. The context here is web APIs for consumption by mobile phones with GPS tracking, and I assume either cellular or WiFi connectivity is required for most functionalities.
I personally prefer a more 'modular' approach where each endpoint does mostly one thing and a collection of them fulfill all the requirements. Of course, you may need to combine some subset or sequence of these endpoints to achieve certain functionalities. Overall, I try to minimize the overlapping between endpoints in terms of both computation and functionalities.
On the other hand, I know some other people prefer client-side convenience (or simplicity) over modularity in the following ways:
If the client needs to achieve a functionality, then there should exist a single API endpoint which does exactly that, such that the client needs only a single request to fulfill the functionality with minimal caching/logic in between requests.
For GET endpoints, if there are multiple levels/kinds of data involved for some functionalities, they prefer as much data as possible (often all necessary data) returned by a single endpoint. Ironically, they may also want a dedicated endpoint for retrieving only the "lowest level" data using a corresponding "highest level" ID. For example, if A corresponds to a collection of Bs, and each B corresponds to a collection of Cs, then they will prefer a direct endpoint that retrieves all the relevant Cs given an A.
In some extreme cases, they will ask for a single endpoint with ambiguous naming (e.g. /api/data) that returns related data from different underlying DB tables (in other words, different resources) based on different combinations of query string parameters.
I understand that people preferring such conveniences above aim to: 1. reduce the number of API requests necessary to fulfill functionalities; 2. minimize data caching and data logic on the client side to reduce client complexity, which arguably leads to a 'simple' client with simplified interaction with the server.
However, I also wonder if the cost of doing so is unjustifiable in other aspects in the long run, especially in terms of the performance and the maintenance of the server-side API. Hence my questions:
What are the tried-and-true guidelines for structuring API functionalities?
How do we determine the optimal number of requests necessary for fulfilling a functionality in a mobile app? Of course, if all other things are equal, a single request is best, but achieving such a single-request implementation usually carries penalties in other aspects.
Given the contention between the number of client requests and the performance and maintainability of server-side API, what are the approaches for striking a balance in order to deliver a sensible design?
What you are asking about breaks into at least three main areas of API design:
Ontology Design (organization)
Request/Response Design (complexity/performance)
Maintenance Considerations
Based on my experience (which is largely from working with very large organizations both on the API producing and consuming side and talking with hundreds of developers on the topic), let's look at each area, addressing the specific points you bring up...
Ontology Design
There are a couple of things to take into consideration in your design that are perhaps implied when you say:
Overall, I try to minimize the overlapping between endpoints in terms of both computation and functionalities.
This approach makes the APIs easily discoverable. When you are in a situation where you are publishing APIs for consumption by other developers who you may or may not know (and may or may not have enough resources to truly support), this kind of modularity - making them easy to find and learn about - creates a different kind of "convenience" leading to easier adoption and reuse of your APIs.
I know some other people much prefer convenience over modularity: 1. if the client needs a functionality, then there should exist a single endpoint in the API which does exactly that...
The best public example that comes to mind for this approach is perhaps the Google Analytics Core Reporting API. They implement a series of querystring parameters to build a call that returns the data requested, ex:
https://www.googleapis.com/analytics/v3/data/ga
?ids=ga:12134
&dimensions=ga:browser
&metrics=ga:pageviews
&filters=ga:browser%3D~%5EFirefox
&start-date=2007-01-01
&end-date=2007-12-31
In that example we are querying Google Analytics account 12134 for pageviews by browser, where the browser is Firefox, for the given date range.
Given the number of metrics, dimensions, filters, and segments their API exposes, they have a tool called the Dimensions & Metrics Explorer to help developers understand how to use the APIs.
One approach makes the APIs discoverable and more understandable from the outset. The other requires more supporting work to explain the intricacies of consuming the API. One thing that isn't immediately obvious with the Google API above is that certain segments and metrics are incompatible, so if you are making calls passing one key/value pair, you may no longer be able to pass certain other pairs.
Request/Response Design
The context here is APIs for mobile applications.
That is still very broad, and better defining (if possible) how you intend for your "mobile applications" to be used can help you design your APIs.
Do you intend for them to be used totally offline? If so, heavy/complete data caching may be desirable.
Do you intend for them to be used in low bandwidth and/or high latency/error-rate connectivity scenarios? If so, heavy/complete data caching may be desirable, but so might small/discrete data requests.
for GET endpoints, they often prefer as much data as possible returned by a single endpoint, especially when there are multiple levels/layers of data involved
This is safe if you know you'll only ever be in good mobile connectivity scenarios, or you can cache the data heavily when you are (and thus access it offline or when things are spotty).
I understand that people preferring convenience aim to reduce the number of API calls necessary to achieve functionalities...
One way to find a happy middle ground is to implement paging in your data-intensive calls. For example, a querystring can be passed in a GET specifying 'pagesize'. Thus 10,000 records could be returned 100 at a time over 100 successive calls, or 1,000 at a time over 10 calls.
With this approach, you can design and publish your API without necessarily knowing exactly what your consuming developer will need. Even though the paging example above uses the Google API referenced earlier, the idea works just as well in a more semantically designed API. For example, if you have GET /customer/phonecalls, you could still design it to accept a pagesize value and make successive calls to get all the phonecalls associated with a customer.
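As an illustration of the consuming side, here is a rough Python sketch of walking such a paged endpoint. The URL shape, the page/pagesize parameter names, and the JSON response format are assumptions for illustration, not a real API.

import requests

def fetch_all_phonecalls(base_url, customer_id, pagesize=100):
    calls, page = [], 1
    while True:
        resp = requests.get(
            f"{base_url}/customer/{customer_id}/phonecalls",
            params={"page": page, "pagesize": pagesize},
            timeout=10,
        )
        resp.raise_for_status()
        batch = resp.json()
        calls.extend(batch)
        if len(batch) < pagesize:   # a short page means there is no more data
            break
        page += 1
    return calls

The consumer (or a platform team wrapping the API) can then tune pagesize to the connectivity scenarios discussed above.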
Maintenance
I also wonder if the cost of doing so [reduce the number of API calls necessary to achieve functionalities and to minimize data caching] is not justifiable in the long run, especially for the performance and the maintenance of an API.
The key guiding principle here is separation of concerns if your collection of APIs is going to grow to any significant level of complexity and scale.
What happens when you have everything bundled together into one big service and a small part of it changes? You are now creating not only a maintenance headache on your side, but also for your API consumer.
Did that "breaking change" really affect the part of the API they were using? It will take time and energy for them to figure that out. Designing API functionality into discrete, semantic services will let you create a roadmap and version them in a more understandable way.
For further reading, I'd suggest checking out Martin Fowler's writings on Microservices Architecture:
In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms.
Although there is a lot of debate about how to design and build for "microservices" in practice, reading up on that should help further shape your thinking on the API design decisions you're facing and prepare you to engage in "current" discussions around the topic.

Twisted Django Comet (Orbited): the interaction of the upper and middle levels

I'm developing a monitoring system (something like a real-time web app). The question is about the system architecture.
A device connects to the server and sends information about the state of the monitored parameters. The server should save the information to the database and notify the Comet server. The Comet server sends a message to the user saying that new data is available. The user then gets the new information.
What's the best way to analyze and save the information about device state (creating alarm messages if needed):
The Twisted app itself analyzes the data and interacts with the DB (adbapi) and the Comet server (Orbited).
Twisted pushes the received data to Django (how?), and Django analyzes it, saves it to the DB, and sends a "NEW" flag to Orbited.
Any suggestions are welcome if there is a better way.
This question is fairly open ended. Someone could probably write a dozen pages on each of the options you described, and that much again on a handful of other approaches as a bonus.
Instead of doing that, I'll take an alternate route.
Make sure you have a good understanding of your requirements. Think about which approach is going to be easiest for you (or for the developers on your team) to satisfy those requirements. Take that approach, documenting the overall idea and unit testing everything you write (preferably using TDD).
When you're done, you might not have the optimal solution, but you'll have a solution, and 99 times out of 100 that's indistinguishable from being optimal.
If I do think about your proposed approaches a little bit, then what mostly occurs to me is that they don't differ from each other very much. Your analysis is just some Python code somewhere that you're going to invoke. Whether you invoke it closer to some Twisted-using code or closer to some Django-using code doesn't seem to make a huge difference to the outcome. Perhaps some part of your requirements would make one approach better than the other. However, if you have unit tests and understand your requirements, then I expect you'll actually find it quite easy to switch between those two approaches.
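To make that concrete, here is a hedged sketch (names and thresholds are invented) of keeping the analysis as plain Python that either entry point can call:

def analyze(reading):
    # Pure logic: inspect one device reading and return any alarm messages.
    alarms = []
    if reading.get("temperature", 0) > reading.get("max_temperature", 100):
        alarms.append({"device": reading["device_id"], "type": "overheat"})
    return alarms

# Option 1: call analyze() from the Twisted protocol's dataReceived handler.
# Option 2: have Twisted POST the reading to a Django view, which calls it.
# Either way, analyze() is unit-testable on its own, which is what makes
# switching between the two approaches cheap.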
After you've implemented something, you'll have a much better understanding of the trade-offs involved and you'll be in a better position to decide if one implementation is going to work better or worse than another.
Note that unit tests are a pretty essential part of this idea. Without them, you won't really know if you've implemented your requirements, you won't know if your functionality still works after any particular refactoring, and refactoring itself will be harder because your units will not be as well-defined and isolated as they would be if you were doing test-driven development.

Good approaches for processing xml in C++

I work on a multithreaded message processing application written in C++. The application receives xml messages, performs some action, and may publish out an xml message to another service if required.
Currently, the app works by extracting data while parsing the message and performing some action on that message in the middle of parsing. This seems like poor practice to me. I have the opportunity to create an alternative, and I'm considering approaches I can use.
One method I've thought of is to serialize the xml data into a data object, and once that is finished, extract and process data as needed. The disadvantage would be that I have to build a new class for each different xml message I process (probably around 30), but that approach seems cleaner than what I have now.
Is there a better way than this? I should also mention the caveat that any code libraries developed outside the U.S. are unlikely to be approved.
Currently, the app works
Then what exactly are you fixing?
Don't fix what isn't broken.
There are typically two approaches to XML parsing: DOM and SAX. DOM builds up a document object model (like what you are proposing), whereas SAX invokes callbacks as parts of the document are visited during parsing. The free, well-known libxml2 library supports both parsing methods.
Typically, the SAX approach (i.e., using callbacks that get executed as the document is visited), uses less memory and can result in lower end-user latency, because you can start processing immediately, instead of having to wait for the entire document to have been parsed and built up.
The fact that your program is multithreaded is a red herring. As long as you always pass an object to each of your callbacks, and that object is not shared between threads, you can safely do this with multiple different such objects in multiple different threads. Using a standard library such as libxml2 to do your parsing is also sensible from a reuse perspective.
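The question is about C++ and libxml2, but the SAX callback style looks much the same in any language. Here is a minimal illustration using Python's standard xml.sax module; the element names are invented.

import xml.sax

class OrderHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_amount = False
        self.total = 0.0

    def startElement(self, name, attrs):
        self.in_amount = (name == "amount")

    def characters(self, content):
        if self.in_amount and content.strip():
            self.total += float(content)    # act on data as it streams past

    def endElement(self, name):
        if name == "amount":
            self.in_amount = False

handler = OrderHandler()
xml.sax.parseString(b"<orders><order><amount>3.5</amount></order></orders>", handler)
print(handler.total)  # 3.5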
There were probably design decisions that led to this approach (for example, it can be faster to process with a SAX-like model than with a DOM-like model: with the latter you need to parse the entire message, while with the former you can make decisions as you are called back with data).
I'd try to understand those decisions first before making any changes. Secondly, aside from keeping you busy, is there a real business need for the rewrite? If not, move on and do something else...

Common information model for SOA systems

We are looking at the possibility of implementing a Common Information Model for data across several systems in an SOA architecture.
Many of these services will be consumed by a composite UI, so we see a benefit in having common data types.
What we are wondering is whether this is a feasible approach, or whether we should just map to common types in the client?
This question is framed pretty broadly, so my answer is going to remain pretty broad as well.
The key consideration here would seem to be location independence - though you're working with several applications, they're all going to share certain sorts of data (though not, as far as I can see from your question, actual data). An obvious use case for this is authentication and authorization data.
If you have determined that the common data is truly cooked enough to isolate in the fashion you're describing then I think it makes perfect sense to layer it off into a service. I think the perfect example of this is Windows Identity Framework. It takes something that we as architects have always treated as data and turns it into a service.
What you lose with the location independence is a little bit of efficiency that you would otherwise gain from making batched calls to the same server, though SOA applications lose this efficiency early in their design, in my experience. But the efficiency you gain from "patternizing" a section of your apps generally outweighs that enormously.
Having a common information model doesn't imply common data types or common classes. Simply defining the relationships between, for instance, Customer, Order, OrderItem and Product goes a great distance toward common business logic and the ability to have different services and applications be able to interoperate in an SOA environment.
You might consider having an actual common model in some modeling language. From this, concrete data types and classes could be generated for particular circumstances. One might use UML for this, but I personally prefer to use NORMA, an Object-Role Modeling tool. It works at the conceptual level, so creates models that are independent of the data store technology.
NORMA runs as an add-in to Visual Studio Standard edition or above, but out of the box generates artifacts for several databases, as well as LINQ to SQL classes and even PHP web services, all from the same model. It is extensible so that you can generate your own artifacts from the model. And of course, the model is represented as XML, so you can do whatever you like with it.
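To make the distinction concrete, here is one lightweight and entirely illustrative way of writing such relationships down in plain Python, without sharing any classes between services; a modeling tool such as NORMA or UML serves the same purpose far more rigorously.

COMMON_MODEL = {
    "entities": ["Customer", "Order", "OrderItem", "Product"],
    "relationships": [
        ("Customer", "places", "Order", "1..*"),
        ("Order", "contains", "OrderItem", "1..*"),
        ("OrderItem", "refers to", "Product", "1"),
    ],
}
# Each service maps these entity and relationship names onto its own types.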

Relational databases application [closed]

When developing an application which mostly interacts with a database, what is a good way to start? The application requires a lot of filtering based on user input, sorting and structuring.
The best way to start is by figuring out "user stories" (or "use cases" -- but the "story" approach tends to work really well and starts drawing stakeholders into the shared storytelling...!-); on top of that, design the database schema as the best-normalized design you can find that satisfies all the data-layer needs of the user stories.
Thirdly, you may sketch layers such as views on top of the schema; fourthly, and optionally, triggers and stored procedures that might live in the DB to ensure consistency and ease of use for the higher layers (but no matter how strongly DBAs push you towards those, don't accept their assurances that they're a MUST: they aren't -- if your storage layer is well designed in terms of normalization, plus maybe useful views on top, non-storage-layer functionality CAN always reside elsewhere; it's an issue of convenience and performance, NOT logical consistency, completeness, or correctness).
I think the business layer and user-experience layers should come after. I realize that's a controversial position, but my point is that the user stories (and the implied business rules that come with them) have ALREADY told you a LOT about the business and user layers -- so "nailing down" (relatively speaking -- agility and "embrace change!" should always rule;-) the data storage layer is the next order of business, and refining ("drilling down") the higher layers can and should come after.
When you get to the database layer, you'll want to handle database access via stored procedures. This will give you additional protection against SQL injection attacks and make it much easier to push logic changes to the database layer.
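As a hedged illustration, this is roughly what routing data access through a stored procedure looks like with MySQL Connector/Python; the procedure name, schema, and credentials are invented, and the point is simply that the application passes parameters rather than assembling SQL strings.

import mysql.connector

conn = mysql.connector.connect(host="localhost", database="shop",
                               user="app", password="...")
try:
    cur = conn.cursor()
    # CALL get_orders_for_customer(42) -- the filtering happens in the DB.
    cur.callproc("get_orders_for_customer", (42,))
    for result in cur.stored_results():
        for row in result.fetchall():
            print(row)
finally:
    conn.close()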
If it's mostly users interacting with data, you can design using a form perspective.
What forms are needed for user input?
What forms are needed for output reports?
Once you've determined that, the use of the forms will dictate the business logic needed to be coded behind the scenes. You'll take the inputs, create the set of procedures or methods to deal with them, and output what is necessary. Once you know the inputs and outputs, you will be able to easily design the necessary functions.
The scope of the question is very broad. You are expecting me to tell you what to do; I can only do a good job of telling you how to do things. Do look into using Hibernate/Spring. Since most of your operations look like querying the DB, Hibernate should help. Make sure the tables are sufficiently indexed so that queries filtering on the indexed fields run faster. The challenging task is designing your DB layer, which will be the glue between your application and the DB. Design your DB layer to be generic enough that it can build queries based on the parameters you pass to it (a rough sketch follows below). Then move on to developing the presentation layer above it. Developing your application layer by layer helps, since it forces you to decouple the DB logic from the presentation logic. When you develop the DB layer, assume that not just your presentation layer but any client can call it. This will help you design applications that are scalable and adaptable to new requirements.
So, bottom line: start with the DB, then the DB integration layer, then the controller, and last the presentation layer.
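Here is the rough sketch mentioned above: a small, generic DB-layer helper that builds a filtered, sorted query from the parameters handed to it. Table and column names are hypothetical, and values are always bound as parameters, never concatenated into the SQL.

import sqlite3

ALLOWED = {"products": {"name", "category", "price"}}   # whitelisted columns

def fetch(conn, table, filters, order_by=None):
    cols = ALLOWED[table]                # unknown tables raise KeyError
    clauses, values = [], []
    for col, val in filters.items():
        if col not in cols:
            raise ValueError(f"unknown column: {col}")
        clauses.append(f"{col} = ?")
        values.append(val)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if order_by in cols:
        sql += f" ORDER BY {order_by}"
    return conn.execute(sql, values).fetchall()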
For the purpose of discussion, I'm going to assume that you are working with a starting application that doesn't have a pre-existing database. If this is false, I'd probably move the order of steps around quite a bit.
1 - Understand the Universe
First, you've got to get a sense of what's around you so you can really understand the problem that you are trying to solve.
User stories or use cases are often a good place to begin. Starting with what tasks the user will try to do, and evaluating how frequently they are likely to do them, is a great starting point. I like to start with screen mockups as well; with or without lots of hands-on time with users, I find that having a screen gives our team something really finite to argue about.
What other tools exist in this sphere? These days, it seems to me that users never use just one tool; they swap around a lot. You need to know two main things about the other tools your users use:
(1) What will they be using as part of the process, alongside your tool? Consider direct input/output needs: what might they want to cut/copy/paste from or to? What tools might you want to offer file upload/download for, with specific formats? What tools are they using alongside yours that you might want to share terminology, layout, color coding, icons or other GUI elements with? Focus especially on the edges of the tools - a real gotcha I hit in a recent project was emulating the databases of previous tools. It turned out that we had a massive database shift, and we would likely have been better off starting fresh.
(2) What (if anything) are you replacing or competing with? Steal the good stuff; dump and improve the bad stuff. Asking users is always best. If you can't, at least understanding the management initiative is important: is this tool replacing a horrible legacy tool? It may be legacy, but there may be the One True Feature that has kept the tool in business all these years...
At this stage, I find that things are really mushy - there are some screen shots, some writing, some schemas or ICDs, but not a really gelled picture.
2 - Logical Entities
Or at least that's what the OO books call it.
I don't care much for all the writing I see on this task, but I find that in any given system, I have one true diagram that I draw over and over. It's usually about 3-10 boxes, and hopefully less than an exponentially large number of lines connecting them.
The earlier you can get that diagram the better.
It doesn't matter to me if it's in UML, a database logical model, something older, or on the back of a napkin (as long as the napkin is shrouded in plastic and hung where everyone can see it).
The earlier you can make this diagram correctly, the better.
After the diagram is made, you can start working on the follow on work that may be more official.
I think it's a chicken-and-egg question whether you start with your data or you start with your screens and business logic. I know that you certainly want to optimize for database sizing and searchability... but how do you know exactly what your database needs are without screens and interfaces giving you a sense of the data?
In practice, I think this is an ever-churning cycle. You do a little bit everywhere, and then you change it all.
Even if you don't get to do a formal agile lifecycle, I think your best bet is to view design as agile -- it will take many repetitions and arguments before you really feel it's "right".
The most important thing to keep in mind is that your first, and most likely your 2nd and 3rd, attempts at designing the database will be wrong in some way. That might sound negative, maybe even a little rash (it's certainly closer to the 'agile' software design philosophy), but it's an important thing to keep in mind.
You still need to do your analysis thoroughly, of course. Try to implement one feature at a time, but try to get all the layers working first. That way you won't have to do too much rework when the specs change and you understand the issues better. Once you have a lot of data loaded into a system, changing things becomes increasingly difficult.
The main benefit of this approach is that you find out quickly where your design is broken and where you haven't separated your design layers correctly. One trick I find extremely useful is to build both an SQLite and a MySQL version, so that seamless switching between the two is possible. Because the two use different dialects of SQL, this highlights where the coupling between the layers is too tight.
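A hedged sketch of that dual-backend trick: the rest of the application depends only on a small interface, with an SQLite implementation for tests and a MySQL implementation for production behind it. Connection details, the table, and the query are placeholders.

import sqlite3

class SqliteBackend:
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def customers_in(self, city):
        return self.conn.execute(
            "SELECT id, name FROM customers WHERE city = ?", (city,)
        ).fetchall()

class MySqlBackend:
    def __init__(self, **dsn):
        import mysql.connector              # only needed in production
        self.conn = mysql.connector.connect(**dsn)

    def customers_in(self, city):
        cur = self.conn.cursor()
        cur.execute("SELECT id, name FROM customers WHERE city = %s", (city,))
        return cur.fetchall()

Anything that only works against one of the two backends is a hint that SQL has leaked out of the data layer into the code above it.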
A good start would be to get familiar with Multitier architecture.
Then design your presentation layer.
Implement all the logic in your business logic layer.
And finally, implement your data access layer.
Try to set up a prototype with something that is more productive than C++, for example Ruby, Python, or maybe even PHP.
When the prototype works and you see that your data model is okay but your queries are too slow, then you can start using C++.
But as your question suggests, you are mostly working with data, and in that case the speed of a scripting language should be enough.