How to evaluate the performance of web servers? - django

I'm planing to deploy a django powered site. But I feel confused about the choice of web servers, which includes apache, lighttpd, nginx and others.
I've read some articles about the performance of each of these choice. But it seems no one agrees. So I'm wondering why not test the performance by myself?
I can't find information about the best approach to performance testing web servers. So my questions are:
Is there any easy approach to test the performance without the production site?
Or can I have a method to simulate the heavy traffic to have a fair test?
How can I keep my test fair and close to production situation?
After the test, I want to figure out:
Why some ones say nginx has a better performance when serving static files.
The cpu and memory needs of each web server.
My best choice.

Tools like ab are commonly used towards testing how much load you can take from a battering of requests at once, alongside cacti/munin/your system monitoring tool or choice you can generate data on system load & requests/sec. The problem with this is many people benchmarking don't realise that they need to request a lot of different requests, as different parts of your code executes it will take varying amounts of time. Profiling and benchmarking code and not requests is also important, to which plenty of folk have already done so for django, benchrun is also not a bad tool either.
The other issue, is how many HTTP requests each page view takes. The less amount of requests, and the quicker they can be processed is the key to having websites that can sustain a high amount of traffic, as the quicker you can finish and close connections, the quicker you allocate resources for new ones.
In terms of general speed of web servers, it goes without saying that a proxy server (running reverse at your end) will always perform faster than a webserver with static content. As for Apache vs nginx in regards to your django app, it seems that mod_python is indeed faster than nginx/lighty + FastCGI but that's no surprise because CGI, regardless of any speed ups is still slow. Executing and caching code at the webserver and letting it manage it is always faster (mod_perl vs use CGI, mod_php vs CGI, etc) if you do it right.

Apache JMeter is an excellent tool for stress-testing web applications. It can be used with any web server, not just Apache.

You need to set up the web server + website of your choice on a machine somewhere, preferably a physical machine with similar hardware specs to the one you will eventually be deploying to.
You then need to use a load testing framework, for example The Grinder (free), to simulate many users using your site at the same time.
The load testing framework should be on separate machine(s) and you should monitor the network and CPU usage of those machines as well to make sure that the limiting factor of your testing is in fact the web server and not your load injectors.
Other than that its just about altering the content and monitoring response times, throughput, memory and CPU use etc... to see how they change depending on what web server you use and what sort of content you are hosting.

Related

Sitecore performance enhancements

We need our Sitecore web application to process 60-80 web requests per second. We are using Sitecore 7.0. We have tried a 1 Webserver + 1 Database server deployment, but it only processes 20-25 requests per second. Web server queues up all the other requests in the Memory. As we increase the load, memory fills up.(We did all Sitecore performance enhancements recommended). We need 4X performance to reach the goal :).
Will it be possible to achieve this goal by upgrading the existing server, or do we have to add more web servers in production environment.
Note: We are using Lucene indexing as well.
Here are some things you can consider without changing overall architecture of your deployment
CDN to offload media and static asset requests
This leaves your content delivery server available to handle important content queries and display logic.
Example www.cloudflare.com
Configure and use Sitecore's built-in caching
This is from the guide:
Investigation and configuration of the Sitecore Caches is broken down
into multiple tasks. This way each task is more focused and
simplified. The focus is on configuration and tuning of the Sitecore
Database Caches (prefetch, data, and item caches.)
For configuration
of the output rendering caching properties, the customer should be
made aware of both the Sitecore Cache Configuration Reference and the
Sitecore Presentation Component Reference as to how properly enable
and the properties to expire these caches.
Check out the Sitecore Tuning Guide
Find Slow Queries or Controls
It sounds like your application follows Sitecore best practices, but I leave this note in for anyone that might find this answer. Use Sitecore's built-in Debug mode to identify the slowest running controls and sublayouts. Additionally, if you have Analytics set up there is a "Slow Pages" report that might give you some information on where your application is slowing down.
Those things being said, if you're prepared to provision additional servers and set up a load-balanced environment then read on.
Separate Content Delivery and Content Management
To me the first logical step before load-balancing content delivery servers is to separate the content management from the equation. This is pretty easy and the Scaling Guide walks you through getting the HistoryEngine set up to keep those Lucene indexes up to date.
Set up Load Balancer with 2 or more Content Delivery servers
Once you've done the first step this can be as easy as cloning your content delivery server and adding it to your load balancer "pool". There are a couple of things to consider here like: Does your web application allow users to log in? So you'll need to worry about sticky sessions or machine keys. Does your web application use file media instead of blob media? I haven't had to deal with this, but I understand that's another consideration.
Scale your SQL solution
I've seen applications with up to four load balanced content delivery servers and the SQL Server did not have a problem - I think this will be unique to each case depending on a lot of factors: horsepower and tuning of SQL Server, content model of your application, complexity of your queries, caching configuration on content delivery servers, etc. Again, the Scaling Guide covers SQL Mirroring and Failover, so that is going to be your first stop on getting that going.
Finally, I would say contact Sitecore. These guys have probably seen more of what's gone right and what's gone wrong with installations and could get you on the right path. Good luck!
This answer written from a Sitecore developer perspective:
Bottom line: You need to figure out exactly where your performance bottleneck is. That is going to take some digging, but will be very worthwhile. You should definitely be able to serve 60-80 requests/s without any trouble... but of course that makes a lot of assumptions about the nature of your site and the requests.
For my site, I found Sitecore's caching implementation to be sub-par... I created some very simple and aggressive application-specific caches in my app and this made all the difference in the world. For instance, we have 900+ "Partner" items where our sites' advertisements live... and simply putting all these objects in an array in the Application object sped up page requests significantly. Finding an object in a Hashtable indexed by its Item.Name or ID is going to be a lot faster than Sitecore.Context.Database.GetItem("/itempath") or a SelectItems() call (at least, that's my experience). If your architecture and data set will allow this strategy, we've had good experience with it.
Another thing to watch out for is XSLT renderings. Personally, I avoid them completely in favor of ASP.NET UserControls. The XSLT rendering is just slow. As much as 10x slower than a native UserControl rendering the same HTML. So if you have a few of these... replace with some custom code and you'll see a world of difference.

How to prepare Django for a possible slashdotting?

I would like to prepare my website for a possible influx in traffic. This is my first time using Django as a framework, so I'm unsure of the modifications that should be made to assure that I'm ready and won't go down. What are some of the common things one can do to prepare a Django website for production-level traffic?
I'm also wondering what to expect in terms of traffic numbers. I'm currently hosted at Webfaction with 600GB/month of traffic. Will this quickly run out? Are there statistics on how big 'slashdotted' events are?
Use memcache and caching middleware.
Be sure to offload serving statics.
Use CDN for statics. This doesn't directly affect Django, but will reduce your network traffic.
Anything beyond that — read up what others are using:
Scaling Django Web Apps By Mike Malone
Instagram Architecture
DISQUS Architecture
Since you are at Webfaction you have an easy answer for handling your statics:
Create a Static-only application. (Not the Static CGI/PHP app)
Add it under you current website.
Put all of your statics under it (or symlink to them, which is what I do).
This will serve all statics through their nginx frontend -- blindingly fast.
Regarding your bandwidth allocation:
You don't say what type of content you are offering. If it is anything even slightly vanilla you are unlikely to approach 600GB/mo. I have one customer who offers adult-oriented videos teaching tantric sex techniques and their video bandwidth (for both free & member-only videos) is about 400-450GB/mo. The HTML portion of the site (with tons of images) runs about 50-60GB/mo.

finding out why a webapp is slow when hosted

I have a django web app that uses postgres db.It allows users to login and make some posts which get saved to db and later on the user can list how many posts he made on a particular day etc and list the posts belonging to a particular category etc.While this worked without any delay in my machine,it is taking a lot of time to load each page when hosted on a free host.
How do you find out why this is happening?which part of the app should I look first?Is there any meaning in using a profiler since this app used to run with no delays in my local machine?
I would like to find out how to approach this problem in general.I was able to access other apps hosted on the same free host without much delays ..so this may be a problem specific to my app
I would like some advice regarding this..If anyone can help..
thank you
p.s:(I intentionally left out the host's name because ,since that was a free service ,there was no point in complaining and also other apps on the same host works well)
The here is the free host bit, when on a free host you could be sharing a box with hundreds of other sites (that can equate to a very small amount of ram or CPU). Pay a little money, ($30 dollars / £22 a year) and get your self a better host.
You will find the performance and reliability so much better.
Failing that I would firstly find out what the latency between you and the server is, on a local machine there is no / little network traffic so your pages will appear to load a lot faster.
Next i would look at the actual download speeds you are getting. It could be that your site is limited to 20-30k, which means even a small site will take over a second to load.
Are you hosting many images? If this is the case are you serving them through django or is the webserver doing this. If it is django then make the webserver take this load.
Finally check the processing speed of the pages. Analise the queries which are being run and find out what is taking the time. Make sure that postgres is correctly configured and has enough resources. You can analyses the query speed using the django debug toolbar.

Django Performance Tuning Tips?

How do you tune Django for better performance? Is there some guide? I have the following questions:
Is mod_wsgi the best solution?
Is there some opcode cache like in PHP?
How should I tune Apache?
How can I set up my models, so I have fewer/faster queries?
Can I use Memcache?
Comments on a few of your questions:
Is mod_wsgi the best solution?
Apache/mod_wsgi is adequate for most people because they will never have enough traffic to cause problems even if Apache hasn't been set up properly. The web server is generally never the bottleneck.
Is there some opcode cache like in PHP?
Python caches compiled code in memory and the processes persist across requests. You thus don't need a separate opcode caching product like PHP as that is what Python does by default. You just need to ensure you aren't using a hosting mechanism or configuration that would cause the processes to be thrown away on every request or too often. Don't use CGI for example.
How should I tune Apache?
Without knowing anything about your application or the system you are hosting it on one can't give easy guidance as how you need to set up Apache. This is because throughput, duration of requests, amount of memory used, amount of memory available, number of processors and much much more come into play. If you haven't even written your application yet then you are simply jumping the gun here because until you know more about your application and production load you can't optimally tune Apache.
A few simple suggestions though.
Don't host PHP in same Apache.
Use Apache worker MPM.
Use mod_wsgi daemon mode and NOT embedded mode.
This alone will save you from causing too much grief for yourself to begin with.
If you are genuinely needing to better tune your complete stack, ie., application and web server, and not just prematurely optimising because you think you are going to have the next FaceBook even though you haven't really written any code yet, then you need to start looking at performance monitoring tools to work out what your application is doing. Your application and database are going to be the real source of your problems and not the web server.
The sort of performance monitoring tool I am talking about is something like New Relic. Even then though, if you are very early days and haven't got anything deployed even, then that itself would be a waste of time. In other words, just get your code working first before worrying about how to run it optimally.

Django redundancy and replication over two VPS accounts

I'm slowly getting into the position where one of my Django sites needs some robustness behind it. I'd currently running on a single VPS on a SQLite database with memcached.. It's about as un-scaled as things can get.
If I bought another VPS account, what would I want to do?
Move to MySQL/PostgreSQL with replication? What's easiest? Does replication protect me from one server exploding? Are there concurrency downsides?
How do I load-balance between the two servers?
I'd put memcached on the new server too. If I put both IPs into the configuration, would that keep a copy of data on both servers? (I'm thinking of what happens to session data - currently stored in memcached)
I'm currently using Cherokee as the httpd - I'm sure this has its own set of issues. If you've any tips, let me know.
Am I going at this the wrong way? Is there an easier way to have faster, more robust django sites?
First step: switch from SQLite to a real production database (I like Postgres). This should happen long before you even think about a second VPS. SQLite essentially does not support concurrency at all. Personally, I wouldn't even consider deploying a live site on SQLite in the first place.
If your site is running on SQLite and is functioning, my guess is you are still quite a long ways from actually outgrowing your single VPS (unless it's already heavily loaded otherwise).
If/when you do need to add a second server, how you configure things depends on where you're actually seeing a bottleneck. Chances are it'll be the database, in which case a good step might be simply moving the database onto its own server (presuming you can guarantee low latency between the two VPSes) and loading the database server with as much RAM as you can afford. In general disk performance suffers most in a VPS, so another step to consider might be putting the DB onto raw metal.
I'd probably look at those steps before I'd think about DB replication or multiple web-tier servers, but it really depends on profiling your actual case (and how you value performance vs reliability).
Watching the Django Deployment Workshop by Jacob Kaplan-Moss should give you a good overview.
MySQL supports Master-Slave and Master-Master setups I don't use PostgreSQL.
You can use nginx as your loadbalancer, HAProxy is an option, too (SO use it).
Memcached distributes the objects over the servers, If one crashes the data is lost.
I don't know Cherokee, but nginx is great.