Sitecore performance enhancements

We need our Sitecore web application to process 60-80 web requests per second. We are using Sitecore 7.0. We have tried a deployment with one web server and one database server, but it only processes 20-25 requests per second; the web server queues all the other requests in memory, and as we increase the load, memory fills up. (We have applied all of the recommended Sitecore performance enhancements.) We need roughly 4x the current throughput to reach the goal :).
Will it be possible to achieve this goal by upgrading the existing server, or do we have to add more web servers to the production environment?
Note: We are using Lucene indexing as well.

Here are some things you can consider without changing the overall architecture of your deployment:
CDN to offload media and static asset requests
This leaves your content delivery server available to handle important content queries and display logic.
Example: www.cloudflare.com
Configure and use Sitecore's built-in caching
This is from the guide:
Investigation and configuration of the Sitecore caches is broken down into multiple tasks so that each task is more focused and simplified. The focus is on configuration and tuning of the Sitecore database caches (prefetch, data, and item caches). For configuration of the output rendering caching properties, the customer should be made aware of both the Sitecore Cache Configuration Reference and the Sitecore Presentation Component Reference, which cover how to properly enable these caches and the properties that expire them.
Check out the Sitecore Tuning Guide
Find Slow Queries or Controls
It sounds like your application follows Sitecore best practices, but I leave this note in for anyone that might find this answer. Use Sitecore's built-in Debug mode to identify the slowest running controls and sublayouts. Additionally, if you have Analytics set up there is a "Slow Pages" report that might give you some information on where your application is slowing down.
Those things being said, if you're prepared to provision additional servers and set up a load-balanced environment then read on.
Separate Content Delivery and Content Management
To me the first logical step before load-balancing content delivery servers is to separate the content management from the equation. This is pretty easy and the Scaling Guide walks you through getting the HistoryEngine set up to keep those Lucene indexes up to date.
Set up Load Balancer with 2 or more Content Delivery servers
Once you've done the first step, this can be as easy as cloning your content delivery server and adding it to your load balancer pool. There are a couple of things to consider here. Does your web application allow users to log in? If so, you'll need to worry about sticky sessions or machine keys. Does your web application use file media instead of blob media? I haven't had to deal with this, but I understand that's another consideration.
Scale your SQL solution
I've seen applications with up to four load balanced content delivery servers and the SQL Server did not have a problem - I think this will be unique to each case depending on a lot of factors: horsepower and tuning of SQL Server, content model of your application, complexity of your queries, caching configuration on content delivery servers, etc. Again, the Scaling Guide covers SQL Mirroring and Failover, so that is going to be your first stop on getting that going.
Finally, I would say contact Sitecore. These guys have probably seen more of what's gone right and what's gone wrong with installations and could get you on the right path. Good luck!

This answer is written from a Sitecore developer's perspective:
Bottom line: You need to figure out exactly where your performance bottleneck is. That is going to take some digging, but will be very worthwhile. You should definitely be able to serve 60-80 requests/s without any trouble... but of course that makes a lot of assumptions about the nature of your site and the requests.
For my site, I found Sitecore's caching implementation to be sub-par, so I created some very simple and aggressive application-specific caches in my app, and this made all the difference in the world. For instance, we have 900+ "Partner" items where our sites' advertisements live, and simply putting all of these objects in an array in the Application object sped up page requests significantly. Finding an object in a Hashtable indexed by its Item.Name or ID is going to be a lot faster than Sitecore.Context.Database.GetItem("/itempath") or a SelectItems() call (at least, that's my experience). If your architecture and data set allow this strategy, we've had good experience with it.
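The site above is C#/.NET code against the Sitecore API, but the "preload once, index by key" pattern itself is simple. Here is a minimal, language-agnostic sketch in Python, where load_all_partners and the item attributes are hypothetical stand-ins for the single expensive Sitecore query made at startup.

```python
# A sketch of the "preload once, index by key" pattern described above.
# The real site is C#/.NET against the Sitecore API; load_all_partners()
# is a hypothetical stand-in for the one expensive query run at startup.

class PartnerCache:
    """Loads all Partner items once and indexes them by name and by ID."""

    def __init__(self, load_all_partners):
        self._by_name = {}
        self._by_id = {}
        for item in load_all_partners():      # one expensive query at startup
            self._by_name[item.name] = item
            self._by_id[item.id] = item

    def by_name(self, name):
        # O(1) dictionary lookup instead of a per-request path query
        return self._by_name.get(name)

    def by_id(self, item_id):
        return self._by_id.get(item_id)
```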
Another thing to watch out for is XSLT renderings. Personally, I avoid them completely in favor of ASP.NET UserControls. XSLT renderings are just slow, as much as 10x slower than a native UserControl rendering the same HTML. So if you have a few of these, replace them with some custom code and you'll see a world of difference.

Apache Superset is very slow

Any recommendations on how to make Superset faster?
The cache seems to load all data from the cache; I thought it would load only old data from the cache and real-time data from the database. Isn't that how it works?
What about some parallel processing?
This answer is valid as of Superset 0.37.0.
At the moment, dashboard performance is affected by a few different factors. I'll enumerate them below along with methods to improve performance:
Database concurrency limits can have an impact on dashboard performance. Dashboards load their information in parallel via concurrent web requests. Make sure that the database user provided allows enough concurrency that queries aren't being queued at the database layer.
Cache performance: your caching layer should be able to return multiple results, if not in parallel, then extremely quickly. We've had success leveraging S3 for our cache. (A configuration sketch follows this list.)
Cache hit percentage: Superset will hit the cache only for queries that exactly match one that has been run recently. Otherwise the full query will fall through to the underlying analytical DB (Druid in this case). You can reduce the query load on Druid by using a less granular resolution on your dashboard; if it's possible to have it update less frequently, say a couple of times a day rather than in real time, every request other than the first one in each new period can be served from the cache.
Python web process concurrency limits: make sure that your web application server can handle enough parallel requests. The browser will request multiple charts' data at the same time, and the system needs to be able to handle those requests in parallel.
Chart query performance: as data is frequently requested, especially real-time data from a database like Druid, optimizing the queries run by the charts can be very useful. I'd take a look at any virtual datasources that are being leveraged to see whether they can be materialized or made more efficient.
Web browser concurrent request limits: by default, most web browsers limit the number of concurrent requests that can be made to the same FQDN. If you have more than six charts on the same dashboard, it can be helpful to balance requests across multiple FQDNs running Superset to get around this browser limitation. There's more information on that approach in the issue history on GitHub, but Superset does support this type of configuration.
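As a rough illustration of the cache and web-concurrency points above, here is what the relevant settings might look like in superset_config.py, assuming a Redis cache backend. Superset's cache sits on top of Flask-Caching, so the CACHE_* keys follow that library, but exact key names, defaults, and the gunicorn invocation vary between Superset versions; treat this as a sketch, not a recipe.

```python
# superset_config.py -- sketch of cache-related settings, assuming Redis.
# Key names follow Flask-Caching; verify them against your Superset version.
CACHE_CONFIG = {
    "CACHE_TYPE": "redis",                 # Flask-Caching backend
    "CACHE_DEFAULT_TIMEOUT": 60 * 60 * 6,  # keep chart results for 6 hours
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
}

# Web-process concurrency is set where the app server is launched, e.g. with
# gunicorn (worker count and flags here are illustrative, not recommendations):
#   gunicorn -w 8 -k gevent --timeout 120 "superset.app:create_app()"
```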
The community is very interested in improving performance over time, and as such there have been recommendations to move all analytical queries to Celery, as well as other architectural changes. I hope this description helps and that something in here will help you track down the issue!

What configurations need to be set for a LAMP server for heavy traffic?

I was contracted to make a Groupon-clone website for my client. It was done in PHP with MySQL, and I plan to host it on an Amazon EC2 server. My client warned me that he will be email blasting to about 10k customers, so my site needs to be able to handle the surge of clicks from those emails. I have two questions:
1) Which Amazon server instance should I choose? Right now I am on a Small instance; I wonder if I should upgrade to a Large instance for the week of the email blast.
2) What are the configurations that need to be set for a LAMP server? For example, do the Amazon server, Apache, PHP, or MySQL have maximum-connection limits that I should adjust?
Thanks
Technically, putting the static pages, the PHP, and the DB on the same instance isn't the best route to take if you want a highly scalable system. That said, if the budget is low and high availability isn't a problem, then you may get away with it in practice.
One option, as you say, is to re-launch the server on a larger instance size for the period you expect heavy traffic. Often this works well enough. Your problem is that you don't know the exact shape of the traffic that will come. A certain percentage of users will be at their computers when the email arrives and will go straight to the site; the rest will trickle in over time. Having your client send the email while the majority of users are in bed would help you somewhat, if that's possible, by avoiding the surge.
If we take the case of, say, 2,000 users hitting your site in 10 minutes, I doubt a site that hasn't been optimised would cope; there's very likely to be a silly bottleneck in there. The DB is often the problem, and a good-sized in-memory cache often helps.
All this said, there are a number of architectural designs and features provided by the likes of Amazon and GAE that, with a correctly designed back end, let you worry very little about scalability; it is handled for you for the most part.
If you split the database away from the web server, you would be able to put the web server instances behind an elastic load balancer and have it scale the number of instances with demand. There also exist standard patterns for scaling databases, though there isn't any particular feature to help you with that, apart from database instances.
You might want to try Amazon Mechanical Turk, which is basically lots of people who'll perform often-trivial tasks (like navigating to a web page and clicking on something) for a usually very small fee. It's not a bad way to simulate real traffic.
That said, you'd probably have to repeat this several times, so you're better off with a load testing tool. And remember, you can't load test a time-slicing instance from another time-slicing instance...
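If you go the load-testing-tool route, a scripted scenario is easy to repeat against different instance sizes. Below is a minimal sketch using Locust, a Python load-testing tool; the paths, task weights, user counts, and host are placeholders, and the CLI flags assume a reasonably recent Locust release.

```python
# locustfile.py -- a minimal load-test scenario. The paths and weights are
# placeholders for whatever the real deal pages look like on your site.
from locust import HttpUser, task, between

class EmailBlastVisitor(HttpUser):
    wait_time = between(1, 5)   # seconds a simulated user pauses between actions

    @task(3)
    def landing_page(self):
        self.client.get("/")           # most clicks land on the homepage/deal list

    @task(1)
    def deal_detail(self):
        self.client.get("/deal/123")   # hypothetical deal URL

# Run from a separate machine (not the instance under test), e.g.:
#   locust -f locustfile.py --headless --host https://example.com \
#          --users 2000 --spawn-rate 50 --run-time 10m
```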

How to evaluate the performance of web servers?

I'm planning to deploy a Django-powered site, but I feel confused about the choice of web servers, which include Apache, lighttpd, nginx, and others.
I've read some articles about the performance of each of these choices, but it seems no one agrees. So I'm wondering: why not test the performance myself?
I can't find information about the best approach to performance testing web servers. So my questions are:
Is there any easy approach to testing the performance without the production site?
Or is there a method to simulate heavy traffic so I can have a fair test?
How can I keep my test fair and close to the production situation?
After the test, I want to figure out:
Why some people say nginx has better performance when serving static files.
The CPU and memory needs of each web server.
My best choice.
Tools like ab are commonly used to test how much load you can take from a battering of requests at once; alongside cacti/munin/your system monitoring tool of choice, you can generate data on system load and requests/sec. The problem with this is that many people benchmarking don't realise that they need to make a lot of different requests, since different parts of your code take varying amounts of time to execute. Profiling and benchmarking the code, and not just the requests, is also important; plenty of folk have already done this for Django, and benchrun is not a bad tool either.
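To make the point about requesting many different URLs concrete, here is a rough Python harness that hits a mix of pages concurrently and reports throughput and latency; the URL list, concurrency, and request counts are placeholders to adapt to your own site, and for serious runs a dedicated tool (ab, JMeter, The Grinder) is still the better choice.

```python
# Rough benchmark harness: hit a *mix* of URLs so slow code paths show up,
# rather than hammering a single page. All URLs below are placeholders.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URLS = [
    "http://localhost:8000/",
    "http://localhost:8000/blog/",
    "http://localhost:8000/search/?q=test",
]

def fetch(url):
    start = time.perf_counter()
    requests.get(url, timeout=30)
    return time.perf_counter() - start

def benchmark(concurrency=20, requests_per_url=50):
    work = [url for url in URLS for _ in range(requests_per_url)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(fetch, work))
    elapsed = time.perf_counter() - start
    p95 = latencies[int(len(latencies) * 0.95)]
    print(f"{len(work) / elapsed:.1f} req/s, "
          f"median {statistics.median(latencies) * 1000:.0f} ms, "
          f"p95 {p95 * 1000:.0f} ms")

if __name__ == "__main__":
    benchmark()
```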
The other issue is how many HTTP requests each page view takes. The fewer the requests and the quicker they can be processed, the more traffic a website can sustain: the quicker you can finish and close connections, the quicker you free up resources for new ones.
In terms of the general speed of web servers, it goes without saying that a reverse proxy at your end will always serve static content faster than the web server itself. As for Apache vs nginx with regard to your Django app, it seems that mod_python is indeed faster than nginx/lighty + FastCGI, but that's no surprise because CGI, regardless of any speed-ups, is still slow. Executing and caching code at the web server and letting it manage it is always faster (mod_perl vs use CGI, mod_php vs CGI, etc.) if you do it right.
Apache JMeter is an excellent tool for stress-testing web applications. It can be used with any web server, not just Apache.
You need to set up the web server + website of your choice on a machine somewhere, preferably a physical machine with similar hardware specs to the one you will eventually be deploying to.
You then need to use a load testing framework, for example The Grinder (free), to simulate many users using your site at the same time.
The load testing framework should be on separate machine(s) and you should monitor the network and CPU usage of those machines as well to make sure that the limiting factor of your testing is in fact the web server and not your load injectors.
Other than that, it's just about altering the content and monitoring response times, throughput, memory, and CPU use, etc., to see how they change depending on which web server you use and what sort of content you are hosting.

Offline web application

I’m thinking about building an offline-enabled web application.
The architecture I’m considering is as follows:
Web server (remote) <--> Web server/cache (local) <--> Browser/Prism
The advantages I envision for this model are:
Deployment is web-based, with all the advantages of this approach
Offline-enabled
UI (html/js) synchronization is a non-issue
Data synchronization can be mostly automated as long as I stay within a RESTful paradigm; I can break this as required, but manual synchronization would remain largely surgical (a rough sketch of the automated part follows this list)
The local web server is started as a service; I can run arbitrary code, including behind-the-scenes data synchronization
I have complete control of the data (location, no size limit, no possibility of the user deleting it unknowingly)
Prism with an extension could allow me to keep the JavaScript closed source
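As a very rough sketch of what the automated, behind-the-scenes synchronization mentioned above might look like for RESTful resources: the remote URL, resource shape, and in-memory store are all hypothetical, and a real implementation would use a local database and a proper conflict strategy (see the answers below).

```python
# Sketch of the local service's background sync for RESTful resources: serve
# and accept writes locally, replay queued writes when the remote is reachable.
# The remote URL and resource layout are hypothetical.
import requests

REMOTE = "https://remote.example.com/api"

class LocalResourceStore:
    def __init__(self):
        self.resources = {}   # id -> resource dict (in practice, a local DB)
        self.pending = []     # writes made while offline, queued for replay

    def save(self, resource_id, data):
        self.resources[resource_id] = data
        self.pending.append((resource_id, data))

    def sync(self):
        """Called periodically by the local service, behind the scenes."""
        still_pending = []
        for resource_id, data in self.pending:
            try:
                requests.put(f"{REMOTE}/items/{resource_id}", json=data, timeout=5)
            except requests.RequestException:
                still_pending.append((resource_id, data))   # retry next cycle
        self.pending = still_pending

        # Pull fresh copies of tracked resources (naive last-write-wins; the
        # conflict question is exactly what the CRM answer below warns about).
        for resource_id in list(self.resources):
            try:
                resp = requests.get(f"{REMOTE}/items/{resource_id}", timeout=5)
                if resp.ok:
                    self.resources[resource_id] = resp.json()
            except requests.RequestException:
                break   # offline again; try on the next cycle
```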
Any thoughts on this architecture? Why should I / shouldn’t I use it? I'm particularly looking for success/horror stories.
The long version
Notes:
Users are not very computer-literate. For instance, even superficially explaining how Gears works is totally out of the question.
I WILL be held liable if data is lost, even if it's really the user's fault (short of them deleting random directories on their machine)
I can require users to install something on their machine. It doesn’t have to be 100% web-based and/or run in a sandbox
The common solutions to this problem don’t feel adequate somehow. Here is a short analysis of each.
Gears/HTML5:
no control over data, can be deleted by users without any warning
no control over location of data (not uniform across browsers and platforms)
users need to open the application in a browser for synchronization to happen; no automatic, behind-the-scenes synchronization
different browsers are treated differently, no uniform view of data on a single machine
limited disk space available
synchronization is completely manual; SQL-based storage makes this a pain (it would be less complicated if the SQL tables were completely replicated, but that's not so in my case). This is a very complex problem.
my code would be almost completely open source (HTML/JS)
Adobe AIR:
some of the above
no server-side includes (!)
can run in the background, but not windowless
manual synchronization
web caching seems complicated
feels like a kludge somehow; I've had trouble installing it on some machines
My requirements are:
Web-based (must), for a number of reasons; sharing data between users, for instance.
Offline (must). The application must be fully usable offline (w/ some rare exceptions).
Quick development (must). I’m a single developer going against players with far more business resources.
Closed source (nice to have). Yes, I understand the open source model. However, at this point I don't want competitors to copy me too easily. Again, they have more resources, so they could take my hard work and make it better in less time than I could myself. Obviously, they can still copy me by developing their own code; that is fine.
Horror stories from a CRM product:
If your application is heavily used, storing a complete copy of its data on a user's machine is unfeasible.
If your application features data that can be updated by many users, replication is not simple. If three users with local changes synch, who wins?
In reality, this isn't really what users want. They want real-time access to the most current data from anywhere. We had better luck offering a mobile interface to a single source of truth.
The part about running the local Web server as a service appears unwise. Besides the fact that you are tied to certain operating environments available on the client, you are also imposing the additional burden of managing the server on the end user. Additionally, the local Web server itself cannot be deployed in a Web-based model.
All in all, I am not too thrilled by the prospect of a real "local Web server". There is admittedly a certain bias here, since I have proposed embedded Web servers that run inside a Web browser as part of my own proposal for seamless off-line Web storage. See BITSY 0.5.0 (http://www.oracle.com/technology/tech/feeds/spec/bitsy.html)
I wonder how essential your requirement to prevent data loss at any cost is. What happens when you are offline and the disk crashes? Or the device is lost? In general, you want the local cache to be as little ahead of the server as possible, and be prepared to tolerate losing whatever data the server is behind the client by. This may involve some amount of contractual negotiation or training. In practice this may not be a deal-breaker.
The only way to do this reliably is to offer some sort of "check out and lock" at the record level. When a user is going remote, they must check out the records they want to work with. This check-out copies the data to a local DB and prevents the record in the central DB from being modified while it is checked out.
When the roaming user reconnects and checks their locked records back in, the data is updated in the central DB and unlocked.
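A minimal sketch of that check-out/check-in flow, using SQLite as a stand-in for both the central and the local database; the table and column names are invented, and a real system would add error handling and an audit trail.

```python
# Sketch of record-level check-out/check-in. SQLite stands in for the central
# and local databases; table and column names are invented for illustration.
import sqlite3

def checkout(central: sqlite3.Connection, local: sqlite3.Connection,
             record_id: int, user: str) -> bool:
    """Copy the record to the local DB and lock it centrally.
    Returns False if another user already holds the lock."""
    row = central.execute(
        "SELECT id, payload, locked_by FROM records WHERE id = ?",
        (record_id,)).fetchone()
    if row is None or (row[2] is not None and row[2] != user):
        return False
    central.execute("UPDATE records SET locked_by = ? WHERE id = ?",
                    (user, record_id))
    local.execute("INSERT OR REPLACE INTO records (id, payload) VALUES (?, ?)",
                  (row[0], row[1]))
    central.commit()
    local.commit()
    return True

def checkin(central: sqlite3.Connection, local: sqlite3.Connection,
            record_id: int, user: str) -> None:
    """Write the locally edited record back to the central DB, release the lock."""
    row = local.execute("SELECT payload FROM records WHERE id = ?",
                        (record_id,)).fetchone()
    if row is None:
        return
    central.execute(
        "UPDATE records SET payload = ?, locked_by = NULL "
        "WHERE id = ? AND locked_by = ?",
        (row[0], record_id, user))
    central.commit()
    local.execute("DELETE FROM records WHERE id = ?", (record_id,))
    local.commit()
```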

Difference between frontend, backend, and middleware in web development

I was wondering if anyone can compare/contrast the differences between frontend, backend, and middleware ("middle-end"?) succinctly.
Are there cases where they overlap?
Are there cases where they MUST overlap, and frontend/backend cannot be separated?
In terms of bottlenecks, which end is associated with which type of bottlenecks?
Here is one breakdown:
Front-end tier -> User Interface layer usually consisting of a mix of HTML, Javascript, CSS, Flash, and various server-side code like ASP.Net, classic ASP, PHP, etc. Think of this as being closest to the user in terms of code.
Middleware, middle-tier -> One tier back, generally referred to as the "plumbing" part of a system. Java and C# are common languages for writing this part, which could be viewed as the glue between the UI and the data; it may consist of web services, WCF components, or other SOA components.
Back-end tier -> Databases and other data stores are generally at this level. Oracle, MS-SQL, MySQL, SAP, and various off-the-shelf pieces of software come to mind for this tier, which handles the final processing of the data.
Overlap can exist between any of these. You could have everything poured into one layer, like an ASP.Net website that uses the built-in AJAX functionality to generate JavaScript while the code-behind contains database commands, so that the code-behind spans both the middle and back-end tiers. Alternatively, one could use VBScript to act as all the layers, using ADO objects and merging all three tiers into one.
Similarly, the middleware and either the front or back end can be combined in some cases.
Bottlenecks generally have a few different levels to them:
1) Database or back-end processing -> This can come from payroll, sales, or other tasks where throughput to the database bogs things down.
2) Middleware bottlenecks -> This would be where some web service has hit capacity but the front and back ends have bandwidth to handle more traffic. Alternatively, some server in the system that is neither quite the UI part nor the raw data, such as something running BizTalk or MSMQ, can be the bottleneck.
3) Front-end bottlenecks -> These could be client- or server-side issues. For example, if you took a low-end PC and had it load a web page that involved downloading a lot of data, the client could be where the bottleneck is. Similarly, the server could be queuing up requests if it is getting hammered, like what Amazon.com or other high-traffic websites may get at times.
Some of this is subject to interpretation, so it isn't perfect by any means and YMMV.
EDIT: Something to consider is that some systems can have multiple front ends or back ends. For example, a content management system will likely have a way for site visitors to view the content, which is a front end, but what about how content editors are able to change the data on the site? The ability to pull up this data could be seen as front end since it is a UI component, or as back end since it is used by internal users rather than the general public viewing the site. Thus, there is something to be said for context here.
Generally speaking, people refer to an application's presentation layer as its front end, its persistence layer (database, usually) as the back end, and anything between as middle tier. This set of ideas is often referred to as 3-tier architecture. They let you separate your application into more easily comprehensible (and testable!) chunks; you can also reuse lower-tier code more easily in higher tiers.
Which code is part of which tier is somewhat subjective; graphic designers tend to think of everything that isn't presentation as the back end, database people think of everything in front of the database as the front end, and so on.
Not all applications need to be separated out this way, though. It's certainly more work to have three separate sub-projects than it is to just open index.php and get cracking; depending on (1) how long you expect to have to maintain the app and (2) how complex you expect it to get, you may want to forgo the complexity.
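As a toy illustration of that separation, here the "tiers" are just functions in one process rather than separate servers, and all names are invented; the point is only that each layer talks to the one directly below it, which is what makes the pieces testable and reusable.

```python
# Toy illustration of a three-tier split; all names are invented and the tiers
# are plain functions rather than separate servers.
import sqlite3

# --- back end: persistence ---------------------------------------------------
def fetch_orders_for_customer(db: sqlite3.Connection, customer_id: int):
    return db.execute(
        "SELECT id, total FROM orders WHERE customer_id = ?",
        (customer_id,)).fetchall()

# --- middle tier: business logic ("plumbing") --------------------------------
def customer_order_summary(db: sqlite3.Connection, customer_id: int) -> dict:
    orders = fetch_orders_for_customer(db, customer_id)
    return {"count": len(orders), "total": sum(total for _, total in orders)}

# --- front end: presentation --------------------------------------------------
def render_summary(db: sqlite3.Connection, customer_id: int) -> str:
    summary = customer_order_summary(db, customer_id)
    return f"<p>{summary['count']} orders, {summary['total']:.2f} total</p>"
```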
There are in fact three questions in your question:
Define frontend, middle end, and back end
How and when do they overlap?
Their usual associated bottlenecks.
What JB King has described is correct, but it is a particular, simple version, where in fact he mapped the front, middle, and back ends onto MVC layers.
He mapped M to the back, V to the front, and C to the middle.
For many people, it is just fine, since they come from the ugly world where even MVC was not applied, and you could have direct DB calls in a view.
However, in real, complex web applications, you indeed have two or three different layers, called front, middle, and back end. Each of them may have an associated database and a controller.
The front end will be visible to the end user. It should not be confused with the front office, which is the UI for parameters and administration of the front end. The front end will usually be some kind of CMS or e-commerce platform (Magento, etc.).
The middle end is not compulsory and is where the business logic lives. It will be based on a PIM, an MDM tool, or some kind of custom database where you enrich your products or your articles (for a CMS). It'll also be the place where you code business functions that need to be shared between different frontends (for instance, between the PC frontend and the API-based mobile application). Sometimes an ESB or a tool like ActiveMQ will be your middle end.
The back end will be a third layer, surrounding your source database or your ERP. It may be just the API writing to and reading from your ERP. It may be your supplier DB, if you are doing e-commerce. In fact, it really depends on the web project, but it is always a central repository. It'll be accessed either through a DB call, through an API, through a Hibernate layer, or through a full-featured back-end application.
This description means that answering the other two questions is not possible in this thread, as bottlenecks really depend on what your three ends contain; what JB King wrote remains true for simple MVC architectures.
At the time the question was asked (five years ago), maybe the MVC pattern was not yet so widely adopted. Now, there is absolutely no reason why the MVC pattern would not be followed, or why a view would be tied to DB calls.
If you read the question "Are there cases where they MUST overlap, and frontend/backend cannot be separated?" in a broader sense, with three different components, then there are times when the three-layer architecture is unnecessary, of course. Think of a simple personal blog: you won't need to pull external data or poll RabbitMQ queues.
Here is a real-world example which shows the front, middle, and back ends.
General description:
Frontend is responsible for presenting data to the user. Note the interesting quirk that you may have two different frontends associated with a single backend.
Backend provides business logic/data persistence.
Middleware (ActiveMQ in the picture) is responsible for system-to-system integration between backends. It is usually installed as a separate application.
Overlapping:
It is possible to have overlap between frontend and backend. This usually leads to long-term issues with application maintenance and scalability. It is fairly common in legacy applications.
Most modern technology stacks encourage developers to maintain strict separation. For example, in the picture you can see that the backend of the first system exposes a REST web service, which is a clear separation line.
Bottlenecks
Most bottlenecks in large systems are caused by the database or the network. Databases are located in the backend. As for network issues, every connection goes through the network, so every connection has the potential to be slow. With good application design, these issues are avoidable to a large extent.
In terms of networking and security, the backend is (or should be) by far the most secure node.
The middle-end portion, usually a web server, will be somewhat in the wild and cut off in many respects from a company's network. The middle-end node is usually placed in the DMZ and segmented from the network with firewall settings. Most of the server-side processing of web pages is handled on the middle-end web server.
Getting to the backend means going through the middle-end, which has a carefully crafted set of rules allowing/disallowing access to the vital nummies which are stored on the database (backend) server.
Frontend refers to the client side, whereas backend refers to the server side of the application. Both are crucial to web development, but their roles, responsibilities, and the environments they work in are totally different. The frontend is basically what users see, whereas the backend is how everything works.
Frontend -> the client side of a website, where a user can interact with the server through the user interface. Generally built using HTML and CSS.
Middleware -> the software or services responsible for letting the parts of the system communicate and manage data. It handles the communication between components and input/output.
Backend -> the server side of any application, consisting of all the functions and operations performed on data. This part is considered the most essential part of any application, and only the server admin has access to it. It mainly consists of databases and servers.