Porting a monolithic server application to web services: gotchas

Has anyone done a port from a monolithic server application to a service and what are the hidden ‘gotchas’ I need to be aware of to accurately estimate the cost of this?
We have an existing single threaded monolithic server application written in Java. It performs fine as it sits, but we want to start extending it. Extending it would mean many more people would use it and the server would not be able to handle the extra load. There was a significant development investment in this code base, and the code base is large. The cost of multithreading the server would be crazy.
I had a harebrained idea about breaking it up into logical service components, removing them from the application and hosting them on Axis2 or Tomcat, and pushing them into a SOA cloud.
I have written many services for Axis2, worked plenty with SOA clouds, and written multiple monolithic servers, and it seems straightforward: eliminate as much shared state as possible and push it into a DB of some kind, pull logical services out of the monolithic app, repeat until done.
But I have a bad feeling that the devil is in the details and I am certain I am not the first person to have an idea like this.
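The "eliminate shared state" step can be sketched as follows. This is a minimal illustration, not the actual app's code: all names here (SessionStore, etc.) are invented for the example. The idea is to first pull shared in-memory state behind an interface, so a DB-backed implementation can later be substituted without touching the callers.

```java
// Hypothetical sketch: shared in-memory state pulled behind an interface
// so it can later be backed by a DB and shared across service instances.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface SessionStore {                 // the extracted boundary
    void put(String key, String value);
    String get(String key);
}

// First step: same behavior as before, but callers no longer touch a static map.
class InMemorySessionStore implements SessionStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
}

public class SharedStateSketch {
    public static String roundTrip(String key, String value) {
        SessionStore store = new InMemorySessionStore(); // later: a JDBC-backed store
        store.put(key, value);
        return store.get(key);
    }
}
```

Once the callers only see the interface, swapping in a DB-backed implementation is a local change, which is what makes the repeated "pull out a service" step tractable.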

In my experience with these types of system/architecture migrations, the time-sinks and killers tend to be:
Identifying all use-cases and functional requirements. Roughly 50% will not be documented. Of the 50% that are, 50% will be incorrect. Of the 50% that are both documented and correct, 50% will no longer be valid.
Ensuring backwards compatibility. Just because you are moving to a new architecture generally doesn't mean that all clients will move with you at the same time.
Migrating existing data into new structure/model/architecture. This always takes a lot longer than you think. Take the worst case scenario you can imagine in terms of time/cost, double it and you'll still be short by about half.
Access control model.
Documenting your services in a clear and useful way. Your shiny new SOA architecture won't be worth squat if no one can use it.
Performance testing. It's mind boggling how often this is skipped or done at the very end of the project. Establish performance scenarios, testing infrastructure and baseline values first thing and then continually test and measure against them. This should be to an architect what unit testing is to a developer.
These are the things that make projects fail, go over budget and over time if you don't address them early in the project.
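The performance-testing point above can be sketched as a baseline check that runs with the rest of the suite from day one. This is illustrative only (a real project would use JMH or a dedicated load-test harness for stable numbers), and the names are invented:

```java
// Minimal sketch of a performance baseline check (illustrative only; a real
// project would use JMH or a dedicated load-test harness for stable numbers).
public class BaselineCheck {
    // The operation under test; stands in for a real service call.
    static long workUnderTest() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;
    }

    // Returns elapsed milliseconds so a test can compare against a baseline value.
    public static long measureMillis() {
        long start = System.nanoTime();
        workUnderTest();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

The baseline value itself is recorded early and the test fails when later changes regress it, which is exactly the "continually test and measure" discipline described above.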

Related

Clojure Web (HttpKit, Manifold) vs Elixir/Phoenix

I am currently weighing up Elixir vs Clojure for running a web server that handles many concurrent WebSocket connections. Elixir/Phoenix seems a natural fit for this, and you see benchmarks demonstrating how far it scales (though I doubt those have much bearing on real-life load). But most of our infrastructure is written in Clojure.
In our case, the websocket handler is almost completely independent from the rest of the existing code base.
So the question is: would you consider taking on another language/ecosystem because it is a better fit for a particular job, versus using a tool that is already a large part of your existing ecosystem?
And does Elixir/Phoenix exceed Clojure/JVM by a significant enough margin under real-world loads for this kind of task to justify using it?
You may find this conversation helpful.
In summary, the "speed of your webserver" is not going to be the bottleneck in your development. Coding, testing, debugging, refactoring, and other human-related tasks are more important by a ratio of 100x or 1000x for just about any modern project.
Since you are already comfortable with Clojure and using it for most of your project, I would strongly suggest sticking with it rather than splitting your codebase into two incompatible parts.

Approaches to unit testing big data

Imagine you are designing a system and you want to start writing tests that determine functionality - but also performance and scalability. Are there any techniques you can share for handling large sets of data in different environments?
I would strongly recommend prioritizing functionality tests (using TDD as your development workflow) before working on performance and scalability tests. TDD will ensure your code is well designed and loosely coupled, which will make it much, much easier down the road to create automated performance and scalability tests. When your code is loosely coupled, you get control over your dependencies. When you have control over your dependencies, you can create any configuration you want for any high-level test you want to write.
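The loose-coupling point can be illustrated with constructor injection. The types here are made up for the example: because the data source is an injected dependency, the same aggregation code can run against a tiny in-memory set in a unit test and a real cluster in a performance test.

```java
import java.util.List;

// Illustrative only: the interface lets tests swap in any data source they want.
interface RecordSource {
    List<Integer> fetch();
}

class Aggregator {
    private final RecordSource source;          // injected dependency
    Aggregator(RecordSource source) { this.source = source; }

    int total() {
        return source.fetch().stream().mapToInt(Integer::intValue).sum();
    }
}

public class DiSketch {
    // In a unit test: hand the aggregator a tiny in-memory source
    // instead of a connection to a real data store.
    public static int totalOf(List<Integer> fake) {
        return new Aggregator(() -> fake).total();
    }
}
```

The performance configuration would construct the same Aggregator with a source backed by the real store; nothing in the business code changes between the two.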
Do some functionality tests first. Also consider a few risk-management techniques; refer to the post "How to handle risk Management in Big data". This will help you.
Separate the different types of test.
Functional testing should come first, starting with unit tests against small amounts of mock data.
Next, integration tests, with small amounts of data in a data store, though obviously not the same instance as the store with the large data sets.
You may be able to reduce your development effort by doing performance and scalability tests together.
One important tip: your test data set should be as realistic as possible. Use production data, anonymizing as necessary. Because big data performance depends on the statistical distributions in the data, you don't want to use synthetic data. For example, if you use fake user data that basically repeats the same user info a million times, you will get very different scalability results as opposed to real-life messy user data with a wide distribution of values.
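The distribution point can be made concrete with a tiny, illustrative sketch (the names are invented): under hash partitioning, a million copies of the same user id all land on a single partition, while realistic varied ids spread the load across all of them, so uniform synthetic data hides exactly the skew that kills real-world performance.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: why uniform synthetic data hides skew problems.
// Returns the record count on the most heavily loaded partition.
public class SkewSketch {
    public static int busiestPartition(String[] keys, int partitions) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String k : keys) {
            int p = Math.floorMod(k.hashCode(), partitions); // hash partitioning
            counts.merge(p, 1, Integer::sum);
        }
        return counts.values().stream().max(Integer::compare).orElse(0);
    }
}
```

With identical keys, the busiest partition holds every record; with varied keys it holds only a fraction, and the scalability numbers you measure differ accordingly.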
For more specific advice, I'd need to know the technology you're using. In Hadoop, look at MRUnit. For RDBs, DBUnit. Apache Bigtop can provide inspiration, though it is aimed at core projects on Hadoop rather than specific application-level projects.
There are two situations that I encountered:
large HDFS datasets that serve as Data Warehouse or data sink for other applications
Apps with HBase or other distributed databases
Tips for unit testing in both cases:
a. Test the different functional components of the app first; there is no special rule for Big Data apps. Just like any other app, unit testing should ascertain whether the different components of the app are working as expected. Then you can integrate functions/services/components, etc. to do the SIT, if applicable.
b. Specifically, if HBase or any other distributed database is there, test what is required of the DB. For example, distributed databases often do not support ACID properties like traditional databases and are instead constrained by the CAP theorem (Consistency, Availability, Partition tolerance); usually only two of the three can be guaranteed. For most RDBMSs it is CA; for HBase it is usually CP, and for Cassandra AP. As a designer or test planner you should know, depending on your application's features, which CAP constraint applies to your distributed database, and create your test plan accordingly to check the real situation.
Regarding performance: again, a lot depends on the infrastructure and app design. Also, certain software implementations are sometimes more taxing than others. You might check the amount of partitioning, for example; it's all case-based.
Regarding scalability: the very advantage of a Big Data implementation is that it is easily scalable compared to a traditional architecture. I have never thought of this as testable. For most Big Data apps you can easily scale up; horizontal scaling in particular is very easy, so I'm not sure anyone thinks about testing scalability for most apps.
For testing and measuring the performance, you could use static data sources and input (it could be a huge dump file or an SQLite DB).
You can create a test and include it in your integration build so that if a particular function call takes more than X seconds, it throws an error.
As you build up more of your system, you will see that number increase and break your test.
You could spend 20% of your time to get 80% of the functionality; the remaining 80% goes to performance and scalability :)
Scalability: think about a service-oriented architecture so that you can have a load balancer in between, and you can increase your state/processing capacity by simply adding new hardware/services to your system.
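The "throw an error if a call takes more than X seconds" idea above can be sketched like this. The names and the time budget are illustrative, not from the original post; a real build would wire this into the integration test suite:

```java
// Sketch of the idea above: fail the build when a call exceeds a time budget.
// Names and budgets are illustrative, not from the original post.
public class PerfGuard {
    public static void assertWithin(Runnable task, long budgetMillis) {
        long start = System.nanoTime();
        task.run();
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (elapsed > budgetMillis) {
            throw new AssertionError(
                "took " + elapsed + " ms, budget was " + budgetMillis + " ms");
        }
    }
}
```

As the system grows and the call slows down past the budget, the build breaks, which is the early warning the answer describes.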

Advantage Data Access Layer

At work, I'm trying to get an n-tier model implemented in a large, already existing PHP application.
I have to convince my seniors, since they don't see the point of an extra DA layer because of performance. The code now queries the DB in the business logic and calculates in the loop while retrieving the data from the result set. Low performance cost.
I have tried convincing them with the obvious reasons: transparency ('we can read SQL'), changing of database ('won't happen').
Their argument is that if it is done by a separate layer, a dataset has to be created and then looped over again in the business layer, costing performance.
Also, creating this n-tier model will mean a lot of work which has no 'real' pay.
Is this a performance issue, and therefore a logical reason to say no to a separate DA layer?
I think you touch an important point: Hand-optimized SQL without an additional abstraction layer can be faster. However, that comes at a price.
The question will probably be: Does the benefit of the additional speed outweigh the benefit of the database access layer e.g. encapsulating the SQL specific knowledge so that the engineers can focus on the business logic (domain layer).
In most cases you probably will find that the performance of a database abstraction layer will be good enough provided the implementation was done by an expert in this matter. If done properly the double buffer/looping can be avoided to a large degree.
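One way the double buffer/loop can be avoided, sketched here with made-up names (and in Java rather than the questioner's PHP, purely for illustration): instead of the DAL materializing a full result set that the business layer loops over a second time, the business layer hands the DAL a per-row callback, so the data is traversed once.

```java
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch: a streaming DAL that pushes rows to the caller,
// so the data is traversed once instead of buffered and looped over twice.
interface OrderDao {
    void forEachOrderTotal(Consumer<Double> rowHandler);
}

class InMemoryOrderDao implements OrderDao {    // stands in for real SQL access
    private final List<Double> totals;
    InMemoryOrderDao(List<Double> totals) { this.totals = totals; }
    public void forEachOrderTotal(Consumer<Double> h) { totals.forEach(h); }
}

public class StreamingDalSketch {
    // Business logic computes while the DAL iterates: a single pass.
    public static double revenue(OrderDao dao) {
        double[] sum = {0};
        dao.forEachOrderTotal(t -> sum[0] += t);
        return sum[0];
    }
}
```

The business layer still never sees SQL, but the "loop twice" cost the seniors object to largely disappears.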
I suspect that there is only a small proportion of applications (my guess is no more than 20%) where performance is so critical that the abstraction layer is not an option.
But there is also a hybrid solution possible: Use the abstraction layer for the 80% of the modules where flexibility and convenience trumps speed and talk straight to the database in the 20% of the modules where speed is critical.
I probably would vote in favor of the abstraction layer and then optimize performance where needed (which may be achieved with means other than talking directly to the database).
The data access layer is outdated technology compared to present technology, because it is too complicated and scientifically unproven: it checks each SQL datatype in a while loop and validates the datatypes. .NET also faces serious app-domain issues, such as executing code from one class file in another taking more time, because .NET assemblies are not tightly coupled. As proof of my argument: we can run SUSE Linux very smoothly in 256 MB of RAM, but not Windows 7 or Windows XP. Moreover, .NET claims automatic memory management, which is not in fact true; .NET leaves lots of unused memory in the heap, and this results in lots of performance loss in a DAL architecture. Furthermore, the effort spent on a DAL is 95% more compared to a direct connection to the database using stored procedures. Don't use a DAL; instead use WCF or XML web services.

Does three-tier architecture ever work?

We have been building three-tier architectures for over a decade now. Dividing the presentation, logic and data tiers is supposed to allow us to exchange each layer individually, should the need ever arise, be it through changed requirements or new technologies.
I have never seen it working in practice...
Mostly because (at least) one of the following reasons:
The three-tier concept was only visible in the source code (e.g. package naming in Java), which was then deployed as one tied-together package.
The code representing each layer was nicely bundled in its own deployable format but then thrown into the same process (e.g. an "enterprise container").
Each layer was run in its own process, sometimes even on different machines, but because of the static nature of their connections, replacing one of them meant breaking all of them.
Thus what you usually end up with is a monolithic, tightly coupled system that does not deliver what its architecture promised.
I therefore think "three-tier architecture" is a total misnomer. The true benefit it brings is that the code is logically sound. But that's at "write time", not at "run time". A better name would be something like "layered by responsibility". In any case, the "architecture" word is misleading.
What are your thoughts on this? How could working three-tier architecture be achieved? By that I mean one which holds its promises: Allowing to plug out a layer without affecting the other ones. The system should survive that and be in a well defined state afterwards.
Thanks!
The true purpose of layered architectures (both logical and physical tiers) isn't to make it easy to replace a layer (which is quite rare), but to make it easy to make changes within a layer without affecting the others (and as Ben notes, to facilitate scalability, consistency, and security) - which works all the time all around us.
One example of a 3-tier architecture is a typical database-driven web application:
End-user's web browser
Server-side web application logic
Database engine
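The "changes within a layer" benefit can be sketched in a few lines. Everything here is invented for the example: the logic tier depends only on an interface, so swapping the storage implementation does not touch the logic or presentation code.

```java
// Illustrative sketch: the logic tier depends only on an interface, so the
// storage tier can change without touching the layers above it.
interface UserStorage {                       // data-tier boundary
    String findName(int id);
}

class LogicTier {                             // logic tier
    private final UserStorage storage;
    LogicTier(UserStorage storage) { this.storage = storage; }
    String greetingFor(int id) { return "Hello, " + storage.findName(id); }
}

public class TierSketch {
    // The presentation tier would render whatever this returns.
    public static String greetWith(UserStorage storage, int id) {
        return new LogicTier(storage).greetingFor(id);
    }
}
```

Replacing a SQL-backed UserStorage with, say, a cached or NoSQL-backed one is a change within the data tier; LogicTier never knows.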
In every system, there is the nice, elegant architecture dreamed up at the beginning, then the hairy mess when it's finally in production, full of hundreds of bug fixes, special-case handlers, and other typically nasty changes made to address specific issues not foreseen during the design.
I don't think the problems you've described are specific to three-tier architecture at all.
If you haven't seen it working, you may just have bad luck. I've worked on projects that serve several UIs (presentation) from one web service (logic). In addition, we swapped data providers via configuration (data) so we could use a low-cost database while developing and Oracle in higher environments.
Sure, there's always some duplication - maybe you add validation in the UI for responsiveness and then validate again in the logic layer - but overall, a clean separation is possible and nice to work with.
Once you accept that n-tier's major benefits (namely scalability, logical consistency, and security) could not easily be achieved through other means, the question of whether any of the tiers can be replaced outright without breaking the others becomes more like asking whether there's any icing on the cake.
Any operating system will have a similar kind of architecture, or else it won't work. The presentation layer is independent of the hardware layer, which is abstracted into drivers that implement a certain interface. The data is handled using logic that changes depending on the type of data being read (think NTFS vs. FAT32 vs. EXT3 vs. CD-ROM). Linux can run on just about any hardware you can throw at it and it will still look and behave the same because the abstractions between the layers insulate each other from changes within a single layer.
One of the biggest practical benefits of the 3-tier approach is that it makes it easy to split up work. You can easily have a DBA and a business analyst or two building the data layer, a traditional programmer building the server-side app code, and a graphic/web designer building the UI. The three teams still need to communicate, of course, but this allows for much smoother development in most cases. In this regard, I see the 3-tier approach working reliably every day, and this is enough for me, even if I cannot count on "interchangeable parts", so to speak.

Is Unit Testing Suitable for BPM Development?

I'm currently working on a large BPM project at work which uses the Global 360 BPM tool-set called Process 360. Just to give some background; this product works like a lot of other BPM solutions in that you design multiple "process maps" which define the flow of a particular business process you're trying to model, and each process map consists of multiple task nodes connected together which perform particular functions (calling web-services etc).
Currently we're experiencing some pretty serious issues during the QA phases of our releases because there isn't any way provided by the tool-set to automate testing of the process map routes. So when a large and complex process is developed and handed over to our test team, there are often a large number of issues that crop up. While you'd obviously expect some issues to come out of QA, I can't help feeling that a lot of the bugs could have been spotted during development if we had some sort of automated testing framework we could use to build up a set of unit tests proving the various routes in the process map(s).
At the moment the only real development testing that occurs is more akin to functional testing performed by the developers, documented as a set of manual steps per test case. The problem with this approach is that it's very time-consuming for the developers to run manually and, because of this, relatively prone to error. Also, because we're usually on a pretty tight schedule, the tests are often not executed often enough to spot issues early.
As I mentioned earlier, there isn't a way provided by the current tool-set to perform this sort of automated testing. Which actually got me thinking: why? Being very new to the whole BPM scene, my assumption was that this was just a feature lacking in the product, but I also wonder whether "unit testing" just isn't traditionally done in the BPM world. Perhaps it just isn't well suited to this sort of work?
I'd be interested to know if anyone else has ever encountered these sorts of issues, and also what - if anything - can be done to improve things.
I have seen something about that, though it is not Global 360 related: using BPELUnit for testing processes.
I develop a workflow tool and there is an increased demand for opening the test tools used for testing the engine to end-users.
I've done "unit" testing with K2.net 2003, another commercial BPM. I'd really call this integration testing, because it requires a test server and it's relatively slow. However, it is automated.
There's a good discussion of this in the book Professional K2 blackpearl (it applies to K2.net 2003 as well).
In order to apply it to your platform, the tool set has to have an API that permits starting process instances, obtaining work items, completing work items, etc. You write tests using any supported language (I used C#) and a testing framework (I used NUnit). If the API supports synchronous calls, this is easier to do. For each test:
Start the process under test
Progress the work item to a decision point
Set process instance data appropriately
Complete the work item
Assert that the work item is now at the expected activity
Delete or complete the process instance
Base test classes or helper methods can make this easier. You could even write a DSL for testing maps.
Essentially you want full "test coverage" of the process/map: test every decision point and ensure that the correct branch is taken.
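The steps above can be sketched against a hypothetical workflow API. Process 360 and K2 expose different calls, so every name here is an assumption; a tiny fake engine stands in for the real one so the test pattern can be shown end to end.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal fake engine standing in for the real BPM API; every name here
// is an assumption, invented so the test pattern can be shown end to end.
class FakeProcessInstance {
    final Map<String, Object> data = new HashMap<>();
    String currentActivity = "Start";

    void complete(String decision) {
        // Toy routing rule standing in for the real map's decision point.
        currentActivity = "approve".equals(decision) ? "Approved" : "Rejected";
    }
}

public class ProcessMapTest {
    // One test per route: start, set data, complete, return the landing activity.
    public static String runRoute(String decision) {
        FakeProcessInstance p = new FakeProcessInstance(); // 1. start the process
        p.data.put("decision", decision);                  // 2-3. set instance data at the decision point
        p.complete(decision);                              // 4. complete the work item
        return p.currentActivity;                          // 5. assert the expected activity
    }
}
```

With a real engine, each step becomes an API call (start instance, open work item, set fields, complete, query current activity), but the shape of the test stays the same.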
There are two aspects of BPM that are related yet not identical.
There's BPM that tool and technology vendors advocate for which is all about features.
There's also BPM that Enterprise Architects advocate for which is all about establishing Centers of Excellence.
The former is where a company buys some software.
The latter is where a company makes systemic and inherent changes to the behavior of their IT workers.
The former is supposed to be in the service of the latter but that isn't necessarily so. Acquiring the former is necessary but not sufficient for achieving the latter.
I don't know how well Global 360 supports what is known as Test-Driven Development, but JBoss jBPM does provide some tool support for easily writing JUnit tests.
However, the tool can't and won't force developers to write them or to embrace TDD principles.