At work, I'm trying to get an n-tier model implemented in a large, already existing PHP application.
I have to convince my seniors, who don't see the point of an extra DA layer because of performance. The code currently queries the DB inside the business logic and does its calculations in the loop while retrieving data from the result set, at a low performance cost.
I have tried convincing them with the obvious reasons: transparency ('we can read SQL'), the ability to change databases ('won't happen').
Their argument is that if the querying is done by a separate layer, a dataset has to be created and then looped over again in the business layer, costing performance.
Also, creating this n-tier model will mean a lot of work with no 'real' payoff.
Is this a performance issue, and therefore a logical reason to say no to a separate DA layer?
I think you touch an important point: Hand-optimized SQL without an additional abstraction layer can be faster. However, that comes at a price.
The question will probably be: Does the benefit of the additional speed outweigh the benefit of the database access layer e.g. encapsulating the SQL specific knowledge so that the engineers can focus on the business logic (domain layer).
In most cases you probably will find that the performance of a database abstraction layer will be good enough provided the implementation was done by an expert in this matter. If done properly the double buffer/looping can be avoided to a large degree.
I suspect that there is only a small proportion of applications (my guess is no more than 20%) where performance is so critical that the abstraction layer is not an option.
But there is also a hybrid solution possible: Use the abstraction layer for the 80% of the modules where flexibility and convenience trumps speed and talk straight to the database in the 20% of the modules where speed is critical.
I probably would vote in favor of the abstraction layer and then optimize performance where needed (which may be achieved with means other than talking directly to the database).
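To make that hybrid idea concrete, here is a minimal sketch (class, table, and column names are invented, not taken from the application in question): a thin data access class owns the connection and the SQL, the business logic only sees plain arrays, and a raw-query escape hatch is kept for the few hot paths where hand-tuned SQL matters.

<?php
// Minimal DAL sketch (hypothetical class and table names): the business layer
// never builds SQL itself; it calls methods that return plain arrays.
class OrderRepository
{
    public function __construct(private PDO $pdo) {}

    // Regular path: encapsulated, parameterized SQL.
    public function findOpenOrdersForCustomer(int $customerId): array
    {
        $stmt = $this->pdo->prepare(
            'SELECT id, total, created_at FROM orders
             WHERE customer_id = :cid AND status = :status'
        );
        $stmt->execute([':cid' => $customerId, ':status' => 'open']);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    // Escape hatch for the few performance-critical modules:
    // hand-tuned SQL, but still behind the same class boundary.
    public function rawQuery(string $sql, array $params = []): PDOStatement
    {
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt; // the caller can iterate row by row, avoiding a second loop
    }
}

Because the statement returned by the escape hatch can still be iterated row by row, the "build a dataset, then loop over it again" overhead the seniors worry about does not have to apply to the hot paths.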
The data access layer is outdated technology compared to what is available today; it is overly complicated and, in my view, scientifically unproven. It checks each SQL data type in a while loop and validates it. .NET also has serious app-domain issues: executing code from one class file in another takes more time, because .NET assemblies are not tightly coupled. As evidence for my argument, SUSE Linux runs very smoothly in 256 MB of RAM, but Windows 7 or Windows XP does not. Moreover, .NET claims automatic memory management, which is not true in practice; it leaves a lot of unused memory on the heap, which results in significant performance loss in a DAL architecture. On top of that, the effort for a DAL is about 95% more than connecting directly to the database using stored procedures. Don't use a DAL; use WCF or XML web services instead.
By 'functionalities structuring', I mean how we organize and coordinate different API endpoints to offer desired functionalities to clients. The context here is web APIs for consumption by mobile phones with GPS tracking, and I assume either cellular or WiFi connectivity is required for most functionalities.
I personally prefer a more 'modular' approach where each endpoint does mostly one thing and a collection of them fulfill all the requirements. Of course, you may need to combine some subset or sequence of these endpoints to achieve certain functionalities. Overall, I try to minimize the overlapping between endpoints in terms of both computation and functionalities.
On the other hand, I know some other people prefer client-side convenience (or simplicity) over modularity in the following ways:
If the client needs to achieve a functionality, then there should exist a single API endpoint which does exactly that, such that the client needs only a single request to fulfill the functionality with minimal caching/logic in between requests.
For GET endpoints, if there are multiple levels/kinds of data involved in some functionalities, they prefer as much data as possible (often all the necessary data) returned by a single endpoint. Ironically, they may also want a dedicated endpoint for retrieving only the "lowest level" data using a corresponding "highest level" ID. For example, if A corresponds to a collection of Bs, and each B corresponds to a collection of Cs, then they would prefer a direct endpoint that retrieves all the relevant Cs given an A.
In some extreme cases, they will ask for a single endpoint with ambiguous naming (e.g. /api/data) that returns related data from different underlying DB tables (in other words, different resources) based on different combinations of query string parameters.
I understand that people preferring such conveniences above aim to: 1. reduce the number of API requests necessary to fulfill functionalities; 2. minimize data caching and data logic on the client side to reduce client complexity, which arguably leads to a 'simple' client with simplified interaction with the server.
However, I also wonder if the cost of doing so is unjustifiable in other aspects in the long run, especially in terms of the performance and the maintenance of the server-side API. Hence my questions:
What are the tried-and-true guidelines for structuring API functionalities?
How do we determine an optimal number of requests necessary for fulfilling a functionality in a mobile app? Of course, all other things being equal, a single request is best, but achieving such a single-request implementation usually carries a penalty in other aspects.
Given the contention between the number of client requests and the performance and maintainability of server-side API, what are the approaches for striking a balance in order to deliver a sensible design?
What you are asking about breaks into at least three main areas of API design:
Ontology Design (organization)
Request/Response Design (complexity/performance)
Maintenance Considerations
Based on my experience (which is largely from working with very large organizations both on the API producing and consuming side and talking with hundreds of developers on the topic), let's look at each area, addressing the specific points you bring up...
Ontology Design
There are a couple of things to take into consideration in your design that are perhaps implied when you say:
Overall, I try to minimize the overlapping between endpoints in terms of both computation and functionalities.
This approach makes the APIs easily discoverable. When you are in a situation where you are publishing APIs for consumption by other developers who you may or may not know (and may or may not have enough resources to truly support), this kind of modularity - making them easy to find and learn about - creates a different kind of "convenience" leading to easier adoption and reuse of your APIs.
I know some other people much prefer convenience over modularity: 1. if the client needs a functionality, then there should exist a single endpoint in the API which does exactly that...
The best public example that comes to mind for this approach is perhaps the Google Analytics Core Reporting API. They implement a series of querystring parameters to build a call that returns the data requested, ex:
https://www.googleapis.com/analytics/v3/data/ga
?ids=ga:12134
&dimensions=ga:browser
&metrics=ga:pageviews
&filters=ga:browser%3D~%5EFirefox
&start-date=2007-01-01
&end-date=2007-12-31
In that example we are querying Google Analytics account 12134 for pageviews by browser, where the browser is Firefox, for the given date range.
Given the number of metrics, dimensions, filters, and segments their API exposes, they have a tool called the Dimensions & Metrics Explorer to help developers understand how to use the APIs.
One approach makes the APIs discoverable and more understandable from the outset. The other requires more supporting work to explain the intricacies of consuming the API. One thing that isn't immediately obvious with the Google API above is that certain segments and metrics are incompatible, so if you are making calls passing one key/value pair, you may no longer be able to pass certain other pairs.
Request/Response Design
The context here is APIs for mobile applications.
That is still very broad, and better defining (if possible) how you intend for your "mobile applications" to be used can help you design your APIs.
Do you intend for them to be used totally offline? If so, heavy/complete data caching may be desirable.
Do you intend for them to be used in low bandwidth and/or high latency/error-rate connectivity scenarios? If so, heavy/complete data caching may be desirable, but so might small/discrete data requests.
for GET endpoints, they often prefer as much data as possible returned by a single endpoint, especially when there are multiple levels/layers of data involved
This is safe if you know you'll only ever be in good mobile connectivity scenarios, or you can cache the data heavily when you are (and thus access it offline or when things are spotty).
I understand that people preferring convenience aim to reduce the number of API calls necessary to achieve functionalities...
One way to find a happy middle ground is to implement paging in your data-intensive calls. For example, a querystring can be passed in a GET specifying 'pagesize'. Thus 10,000 records could be returned 100 at a time over 100 successive calls, or 1,000 at a time over 10 calls.
With this approach, you can design and publish your API without necessarily knowing what your consuming developer will need. Even though the paging example above uses the Google API referenced earlier, the technique works just as well in a more semantically designed API. For example, say you have GET /customer/phonecalls; you could still design it to accept a pagesize value and make successive calls to get all the phone calls associated with a customer.
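As a rough sketch of that paging idea (the endpoint, table, and parameter names here are hypothetical, not from any concrete API): the handler accepts page and pagesize from the query string and translates them into LIMIT/OFFSET.

<?php
// Hypothetical handler for GET /customer/{id}/phonecalls?page=2&pagesize=100
function getPhoneCalls(PDO $pdo, int $customerId, array $query): array
{
    $pageSize = min(max((int) ($query['pagesize'] ?? 100), 1), 1000); // clamp to a sane range
    $page     = max((int) ($query['page'] ?? 1), 1);
    $offset   = ($page - 1) * $pageSize;

    $stmt = $pdo->prepare(
        'SELECT id, called_at, duration_seconds FROM phone_calls
         WHERE customer_id = :cid
         ORDER BY called_at DESC
         LIMIT :limit OFFSET :offset'
    );
    $stmt->bindValue(':cid', $customerId, PDO::PARAM_INT);
    $stmt->bindValue(':limit', $pageSize, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();

    return ['page' => $page, 'pagesize' => $pageSize, 'items' => $stmt->fetchAll(PDO::FETCH_ASSOC)];
}

A consumer on a flaky connection can ask for 100 records at a time, another can pull 1,000, and the server-side contract stays the same.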
Maintenance
I also wonder if the cost of doing so [reduce the number of API calls necessary to achieve functionalities and to minimize data caching] is not justifiable in the long run, especially for the performance and the maintenance of an API.
The key guiding principle here is separation of concerns if your collection of APIs is going to grow to any significant level of complexity and scale.
What happens when you have everything bundled together into one big service and a small part of it changes? You are now creating not only a maintenance headache on your side, but also for your API consumer.
Did that "breaking change" really affect the part of the API they were using? It will take time and energy for them to figure that out. Designing API functionality into discrete, semantic services will let you create a roadmap and version them in a more understandable way.
For further reading, I'd suggest checking out Martin Fowler's writings on Microservices Architecture:
In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms.
Although there is a lot of debate about how to design and build for "microservices" in practice, reading up on that should help further shape your thinking on the API design decisions you're facing and prepare you to engage in "current" discussions around the topic.
We have a C++ application that utilizes some basic APIs to send raw queries to a MS SQL Server. Scattered through the various translation units in our program, we have simple 1-2 line queries as C++ strings, and every now and then you'll see more complex queries that can be over 20 lines.
I can't help but think that the larger queries, specifically the 20+ line ones, should not be embedded in C++ code as constant strings. I want to propose pulling these out into separate text files that are loaded on demand by the C++ application; however, I'm not sure if this is the best approach.
What design choices are typical for situations like this? I definitely feel there needs to be improvement, I just don't know if moving the SQL queries out into data files (text files) is the best idea.
You could make a DAL (Data Access Layer).
It would be the API that the rest of the program talks to. Then you can mess around and try anything and everything (Stored procedures, caching, etc.) without disturbing the main program.
Move them into their own files, or even into their own stored procedures. Queries embedded in the application cannot be changed without a recompile, and depending on your release procedures, that could severely impair your ability to respond to emergencies or deploy hot fixes. You could alter your app to cache the file contents, if you go down that road, and even periodically check the files for updates.
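The question is about C++, but the shape of that file-plus-cache idea is language-independent; here is a hedged sketch (in PHP, to keep one language across this page, with made-up file and query names): each query lives in its own .sql file, the contents are cached on first use, and the cache entry is refreshed when the file's modification time changes.

<?php
// Hypothetical query loader: each query lives in sql/<name>.sql on disk.
class QueryStore
{
    private array $cache = []; // name => ['mtime' => int, 'sql' => string]

    public function __construct(private string $dir) {}

    public function get(string $name): string
    {
        $path = $this->dir . '/' . $name . '.sql';
        if (!is_file($path)) {
            throw new RuntimeException("Unknown query: $name");
        }
        $mtime = filemtime($path);
        // Reload only when the file changed, so a query fix needs no rebuild.
        if (!isset($this->cache[$name]) || $this->cache[$name]['mtime'] !== $mtime) {
            $this->cache[$name] = ['mtime' => $mtime, 'sql' => file_get_contents($path)];
        }
        return $this->cache[$name]['sql'];
    }
}

// Usage: $sql = (new QueryStore(__DIR__ . '/sql'))->get('open_orders_by_customer');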
The best "design choice" - for many different reasons - is to use MSSQL stored procedures whenever/wherever possible.
I've seen code that segregates SQL queries into a common module, but I don't think there's much benefit to a common "queries module" (or a standalone text file) over having the SQL queries spelled out as string literals in the module that's calling them.
Stored procedures, on the other hand, increase modularity, enhance security, and can vastly improve performance.
IMHO...
I would leave the SQL embedded in the C++ functions that use it: it will be easier to read and understand what the code does.
If you have SQL queries scattered around your code I'd say that there is some problem with the overall structure of the classes you are using: you should have some (or even just one) 'low level' classes that handle the interaction with the database, and the rest of the code uses these classes.
I personally don't like using stored procedures: if you have to support a different database server, the porting will be a pain; I never saw that much of a performance improvement; and to understand what the code does you have to jump back and forth between the stored procedures and the C++.
It really depends, here are some notes:
1) If all your SQL code resides in the application, then your application is pretty much self-contained in terms of logic. This is good, and it is what you have done in the current application. In terms of speed, this can be a little slower, as the SQL will need to be parsed when you run these queries (it also depends on whether you used prepared statements, etc., which can speed things up).
2) The second approach is to put all SQL logic into stored procedures on the server. This is very much the preferred approach, even for small SQL queries, whether one line or not. You just build a DAL. In terms of performance this is very good; however, the logic then lives in two different systems, your C++ app and the SQL server. You will quite likely need to build a small utility application that can translate the stored procedures' inputs and outputs into template code (be it C++ or any other language) to make your life easier (see the sketch after this list).
3) A mixed approach with the above two. I would not recommend this route.
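Regarding option 2, here is a minimal sketch of what such a thin wrapper around a stored procedure can look like (the procedure and parameter names are made up, and it is shown in PHP/PDO only to keep one language on this page; the exact call syntax varies by driver, e.g. CALL for MySQL, EXEC or the ODBC {CALL ...} escape for SQL Server):

<?php
// Hypothetical thin wrapper: all SQL logic lives in a stored procedure on the
// server; the application only knows its name and parameters.
function fetchCustomerOrders(PDO $pdo, int $customerId, string $status): array
{
    $stmt = $pdo->prepare('CALL get_customer_orders(:customer_id, :status)');
    $stmt->execute([':customer_id' => $customerId, ':status' => $status]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}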
You need to think about how these queries are likely to change over time, and compare it to how the related C++ code is likely to change. If the queries are relatively independent of the code, and have a higher likelihood of change, then I would either load them at runtime from separate files, or use stored procedures instead. That approach allows for changing the queries without recompiling the C++ code. On the other hand, if the queries are highly coupled to the C++ code, making a change in one likely to accompany a change in the other, I would keep the queries in the code. This approach makes a change more localized and less error prone.
Has anyone done a port from a monolithic server application to a service and what are the hidden ‘gotchas’ I need to be aware of to accurately estimate the cost of this?
We have an existing single threaded monolithic server application written in Java. It performs fine as it sits, but we want to start extending it. Extending it would mean many more people would use it and the server would not be able to handle the extra load. There was a significant development investment in this code base, and the code base is large. The cost of multithreading the server would be crazy.
I had a harebrained idea about breaking it up into logical service components, removing them from the application and hosting them on Axis2 or Tomcat, and pushing them into a SOA cloud.
I have written many services for Axis2, worked plenty with SOA clouds, and written multiple monolithic servers, and it seems straightforward: eliminate as much shared state as possible, push that to a DB of some kind, pull out logical services from the monolithic app, and repeat until done.
But I have a bad feeling that the devil is in the details and I am certain I am not the first person to have an idea like this.
In my experience with these types of system/architecture migrations, the time sinks and killers tend to be:
Identifying all use cases and functional requirements. Roughly 50% will not be documented. Of the 50% that are, 50% will be incorrect. Of the 50% that are both documented and correct, 50% will no longer be valid.
Ensuring backwards compatibility. Just because you are moving to a new architecture generally doesn't mean that all clients will move with you at the same time.
Migrating existing data into new structure/model/architecture. This always takes a lot longer than you think. Take the worst case scenario you can imagine in terms of time/cost, double it and you'll still be short by about half.
Access control model.
Documenting your services in a clear and useful way. Your shiny new SOA architecture won't be worth squat if no one can use it.
Performance testing. It's mind boggling how often this is skipped or done at the very end of the project. Establish performance scenarios, testing infrastructure and baseline values first thing and then continually test and measure against them. This should be to an architect what unit testing is to a developer.
These are the things that make projects fail, go over budget and over time if you don't address them early in the project.
Imagine you are designing a system and you want to start writing tests that determine functionality - but also performance and scalability. Are there any techniques you can share for handling large sets of data in different environments?
I would strongly recommend prioritizing functionality tests (using TDD as your development workflow) before working on performance and scalability tests. TDD will ensure your code is well designed and loosely coupled, which will make it much, much easier down the road to create automated performance and scalability tests. When your code is loosely coupled, you get control over your dependencies. When you have control over your dependencies, you can create any configuration you want for any high-level test you want to write.
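A hedged sketch of what "control over your dependencies" can look like in practice (interface and class names are invented): the business code depends on an interface, so functional tests can use an in-memory fake while performance tests wire in the real store with large, realistic data sets.

<?php
// Hypothetical illustration: the business code depends on an interface,
// not on a concrete data store.
interface UserRepository
{
    public function findActiveUsers(): array;
}

// Fast in-memory fake for functional tests; a performance test would
// inject the real implementation backed by a large data set instead.
class InMemoryUserRepository implements UserRepository
{
    public function __construct(private array $users) {}

    public function findActiveUsers(): array
    {
        return array_values(array_filter($this->users, fn ($u) => $u['active']));
    }
}

class ReportService
{
    public function __construct(private UserRepository $repo) {}

    public function activeUserCount(): int
    {
        return count($this->repo->findActiveUsers());
    }
}

$service = new ReportService(new InMemoryUserRepository([
    ['name' => 'a', 'active' => true],
    ['name' => 'b', 'active' => false],
]));
assert($service->activeUserCount() === 1);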
Do some functionality tests first. Also consider a few risk management techniques; refer to the post "How to handle risk management in Big Data". This will help you.
Separate the different types of tests.
Functional testing should come first, starting with unit tests on small amounts of mock data.
Next come integration tests, with small amounts of data in a data store, though obviously not the same instance as the store holding the large data sets.
You may be able to reduce your development effort by doing performance and scalability tests together.
One important tip: your test data set should be as realistic as possible. Use production data, anonymizing as necessary. Because big data performance depends on the statistical distributions in the data, you don't want to use synthetic data. For example, if you use fake user data that basically has the same user info a million times, you will get very different scalability results compared to real-life, messy user data with a wide distribution of values.
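A minimal sketch of that anonymization step (the field names are hypothetical): identifying fields are replaced by salted hashes, so the records are no longer personal, but equal inputs still map to equal outputs and the skew that drives performance is preserved.

<?php
// Hypothetical anonymizer: equal inputs map to equal pseudonyms, so joins and
// frequency distributions in the test data stay as skewed as in production.
function anonymizeUser(array $row, string $salt): array
{
    $row['email']   = hash('sha256', $salt . $row['email']) . '@example.test';
    $row['user_id'] = hash('sha256', $salt . (string) $row['user_id']);
    // Non-identifying fields (timestamps, counts, categories) are left alone,
    // because their distribution is exactly what the performance test needs.
    return $row;
}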
For more specific advice, I'd need to know the technology you're using. In Hadoop, look at MRUnit. For RDBs, DBUnit. Apache Bigtop can provide inspiration, though it is aimed at core projects on Hadoop rather than specific application-level projects.
There are two situations that I encountered:
large HDFS datasets that serve as Data Warehouse or data sink for other applications
Apps with HBASE or other distributed databases
Tips for unit testing in both cases:
a. Test the different functional components of the app first; there is no special rule for big data apps. Just like for any other app, unit testing should ascertain whether the different components of the app are working as expected. Then you can integrate functions/services/components, etc. to do the SIT, if applicable.
b. If HBASE or any other distributed database is involved, test what is required of the DB. For example, distributed databases often do not support ACID properties like traditional databases; instead they are constrained by the CAP theorem (consistency, availability, partition tolerance), and usually two of the three are guaranteed. For most RDBMSs it is CA, for HBASE it is usually CP, and for Cassandra AP. As a designer or test planner you should know, depending on your application's features, which CAP constraint applies to your distributed database, and create a test plan accordingly to check the real situation.
Regarding performance - again, a lot depends on the infrastructure and the app design. Also, some software implementations are more taxing than others. You might check the amount of partitioning, for example; it's all case by case.
Regarding scalability - the very advantage of a big data implementation is that it is easily scalable compared to a traditional architecture. I have never thought of this as something to test. For most big data apps you can easily scale up, and horizontal scaling in particular is very easy, so I'm not sure anyone thinks about testing scalability for most apps.
For testing and measuring performance, you could use static data sources and input (it could be a huge dump file or an SQLite DB).
You can create a test and include it in your integration build so that if a particular function call takes more than X seconds, it throws an error.
As you build up more of your system, you will see that number increase and break your test.
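A hedged sketch of such a check (the helper, the threshold, and the fixture name are made up): the integration build runs the call, measures wall-clock time, and fails loudly when the agreed budget is exceeded.

<?php
// Hypothetical performance guard for the integration build:
// fail loudly when a measured call exceeds its agreed time budget.
function assertRunsWithin(callable $fn, float $maxSeconds, string $label): void
{
    $start = microtime(true);
    $fn();
    $elapsed = microtime(true) - $start;
    if ($elapsed > $maxSeconds) {
        throw new RuntimeException(
            sprintf('%s took %.2fs, budget is %.2fs', $label, $elapsed, $maxSeconds)
        );
    }
}

// Example (made-up function and fixture): the static dump import must stay under 30 seconds.
// assertRunsWithin(fn () => importDump('fixtures/big_dump.sqlite'), 30.0, 'dump import');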
You could spend 20% of your time getting 80% of the functionality; the remaining 80% goes to performance and scalability :)
Scalability - think about a service-oriented architecture so that you can put a load balancer in between and increase your state/processing capacity by simply adding new hardware/services to your system.
We have been building three-tier architectures for over a decade now. Dividing the presentation, logic, and data tiers is supposed to allow us to exchange each layer individually, should the need ever arise, be it through changed requirements or new technologies.
I have never seen it working in practice...
Mostly because (at least) one of the following reasons:
The three tiers concept was only visible in the source code (e.g. package naming in Java) which was then deployed as one, tied together package.
The code representing each layer was nicely bundled in its own deployable format but then thrown into the same process (e.g. an "enterprise container").
Each layer was run in its own process, sometimes even on different machines, but because of the static way they were wired together, replacing one of them meant breaking all of them.
Thus what you usually end up with is a monolithic, tightly coupled system that does not deliver what its architecture promised.
I therefore think "three-tier architecture" is a total misnomer. The true benefit it brings is that the code is logically sound. But that's at "write time", not at "run time". A better name would be something like "layered by responsibility". In any case, the "architecture" word is misleading.
What are your thoughts on this? How could working three-tier architecture be achieved? By that I mean one which holds its promises: Allowing to plug out a layer without affecting the other ones. The system should survive that and be in a well defined state afterwards.
Thanks!
The true purpose of layered architectures (both logical and physical tiers) isn't to make it easy to replace a layer (which is quite rare), but to make it easy to make changes within a layer without affecting the others (and as Ben notes, to facilitate scalability, consistency, and security) - which works all the time all around us.
One example of a 3-tier architecture is a typical database-driven web application:
End-user's web browser
Server-side web application logic
Database engine
In every system, there is the nice, elegant architecture dreamed up at the beginning, and then the hairy mess once it's finally in production, full of hundreds of bug fixes, special case handlers, and other typically nasty changes made to address specific issues not realized during the design.
I don't think the problems you've described are specific to three-tier architecture at all.
If you haven't seen it working, you may just have bad luck. I've worked on projects that serve several UIs (presentation) from one web service (logic). In addition, we swapped data providers via configuration (data) so we could use a low-cost database while developing and Oracle in higher environments.
Sure, there's always some duplication - maybe you add validation in the UI for responsiveness and then validate again in the logic layer - but overall, a clean separation is possible and nice to work with.
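A hedged sketch of that configuration-driven swap (the config keys and DSNs are invented; the project described above used Oracle in its higher environments): the data layer builds its connection from configuration, so development can point at SQLite while production points at the real database, without touching the logic layer.

<?php
// Hypothetical config-driven provider selection for the data tier.
function createConnection(array $config): PDO
{
    return match ($config['driver']) {
        'sqlite' => new PDO('sqlite:' . $config['path']),
        'mysql'  => new PDO(
            "mysql:host={$config['host']};dbname={$config['dbname']}",
            $config['user'],
            $config['password']
        ),
        default  => throw new InvalidArgumentException('Unsupported driver: ' . $config['driver']),
    };
}

// Development: ['driver' => 'sqlite', 'path' => '/tmp/dev.db']
// Production:  a 'mysql' (or Oracle, via the oci driver) entry supplied by deployment configuration.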
Once you accept that n-tier's major benefits, namely scalability, logical consistency, and security, could not easily be achieved through other means, the question of whether or not any of the tiers can be replaced outright without breaking the others becomes more like asking whether there's any icing on the cake.
Any operating system will have a similar kind of architecture, or else it won't work. The presentation layer is independent of the hardware layer, which is abstracted into drivers that implement a certain interface. The data is handled using logic that changes depending on the type of data being read (think NTFS vs. FAT32 vs. EXT3 vs. CD-ROM). Linux can run on just about any hardware you can throw at it and it will still look and behave the same because the abstractions between the layers insulate each other from changes within a single layer.
One of the biggest practical benefits of the 3-tier approach is that it makes it easy to split up work. You can easily have a DBA and a business analyst or two building the data layer, a traditional programmer building the server-side app code, and a graphic designer/web designer building the UI. The three teams still need to communicate, of course, but this allows for much smoother development in most cases. In this regard, I see the 3-tier approach working reliably every day, and that is enough for me, even if I cannot count on "interchangeable parts", so to speak.