How to prepare Django for a possible slashdotting?


I would like to prepare my website for a possible influx in traffic. This is my first time using Django as a framework, so I'm unsure of the modifications that should be made to assure that I'm ready and won't go down. What are some of the common things one can do to prepare a Django website for production-level traffic?
I'm also wondering what to expect in terms of traffic numbers. I'm currently hosted at Webfaction with 600GB/month of traffic. Will this quickly run out? Are there statistics on how big 'slashdotted' events are?

Use memcache and caching middleware.
Be sure to offload serving statics.
Use CDN for statics. This doesn't directly affect Django, but will reduce your network traffic.
For anything beyond that, read up on what others are using:
Scaling Django Web Apps By Mike Malone
Instagram Architecture
DISQUS Architecture
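As a rough illustration of the first point (memcache plus the caching middleware), a settings fragment might look like the sketch below. The memcached address, the 300-second timeout, and the "mysite" key prefix are assumptions chosen for illustration, and the PyMemcacheCache backend assumes a recent Django (3.2 or later) with the pymemcache package installed; older versions use a different memcached backend class.

```python
# settings.py fragment (a sketch, not a complete settings file).
# Host/port, timeout, and key prefix are illustrative assumptions.
CACHES = {
    "default": {
        # Available in Django 3.2+; requires the pymemcache package.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

MIDDLEWARE = [
    # UpdateCacheMiddleware goes first so it sees the final response.
    "django.middleware.cache.UpdateCacheMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... your other middleware would sit here ...
    # FetchFromCacheMiddleware goes last so it sees the processed request.
    "django.middleware.cache.FetchFromCacheMiddleware",
]

CACHE_MIDDLEWARE_ALIAS = "default"      # which CACHES entry to use
CACHE_MIDDLEWARE_SECONDS = 300          # how long each page is cached
CACHE_MIDDLEWARE_KEY_PREFIX = "mysite"  # avoids clashes on a shared memcached
```

Whole-site caching like this suits pages that look the same for every visitor; pages with per-user content usually need per-view or template-fragment caching instead.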

Since you are at Webfaction you have an easy answer for handling your statics:
Create a Static-only application. (Not the Static CGI/PHP app)
Add it under your current website.
Put all of your statics under it (or symlink to them, which is what I do).
This will serve all statics through their nginx frontend -- blindingly fast.
Regarding your bandwidth allocation:
You don't say what type of content you are offering. If it is anything even slightly vanilla you are unlikely to approach 600GB/mo. I have one customer who offers adult-oriented videos teaching tantric sex techniques and their video bandwidth (for both free & member-only videos) is about 400-450GB/mo. The HTML portion of the site (with tons of images) runs about 50-60GB/mo.
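To put the 600GB/month figure in perspective, here is a back-of-the-envelope calculation. The average page weights (1 MB and 250 KB) are purely assumptions for illustration; measure your own pages before relying on the numbers.

```python
def page_views_within_quota(quota_gb, avg_page_kb):
    """Return how many full page views fit in a monthly transfer quota.

    Uses decimal units (1 GB = 1000**3 bytes, 1 KB = 1000 bytes).
    avg_page_kb is an assumed average weight of one page view,
    including its images, CSS, and JavaScript.
    """
    quota_bytes = quota_gb * 1000**3
    page_bytes = avg_page_kb * 1000
    return quota_bytes // page_bytes


# Assumed heavy page of 1 MB per view:
heavy = page_views_within_quota(600, 1000)   # 600000 views per month
# Assumed lighter page of 250 KB per view:
light = page_views_within_quota(600, 250)    # 2400000 views per month

print(heavy, heavy // 30)  # monthly total and rough per-day average
print(light, light // 30)
```

Serving statics from a CDN or a separate host lowers the effective page weight counted against the quota, which is why the advice above matters for bandwidth as well as speed.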

Related

Do I need to use a caching technology like memcached or redis?

I am new to web development in general.
I am developing a social media website (very much like twitter) using django rest framework as backend and react as front end. I am going to deploy the app to Heroku.
Now, I just heard about these things called memcached and redis. So, what is the use case here? Should I use them, or are they just for high-traffic websites?

A cache here generally means an in-memory cache (like Memcached or Redis), which stores data primarily in memory and provides faster data access under heavy traffic.
Cache-database consistency has always been an issue, since you now have multiple data sources. There are some good solutions that improve it, but the two are still never perfectly in sync.
So it depends on your read/write traffic: if the database handles the traffic without performance issues, you don't need to consider a cache (most production databases, like MySQL or DynamoDB, also have caching of their own). If the database cannot handle your traffic, you should consider using a cache.
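The trade-off can be sketched with a tiny cache-aside wrapper. This is an illustration only: a plain dict stands in for both the database and the cache, and the class and method names are hypothetical, not the API of Memcached, Redis, or Django.

```python
import time


class CacheAside:
    """Read-through cache in front of a slower data store (here, a dict)."""

    def __init__(self, database, ttl_seconds=60.0):
        self.database = database      # stand-in for the real database
        self.ttl_seconds = ttl_seconds
        self._entries = {}            # key -> (value, expiry timestamp)
        self.db_reads = 0             # counts how often the database is hit

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]           # cache hit: the database is not touched
        self.db_reads += 1            # cache miss: fall back to the database
        value = self.database[key]
        self._entries[key] = (value, time.monotonic() + self.ttl_seconds)
        return value

    def update(self, key, value):
        # Write to the database, then invalidate so the next read is fresh.
        self.database[key] = value
        self._entries.pop(key, None)


db = {"user:1": "alice"}
cache = CacheAside(db)
cache.get("user:1")
cache.get("user:1")                   # second read is served from the cache
cache.update("user:1", "bob")         # goes through the cache: stays consistent
db["user:1"] = "carol"                # bypasses the cache: readers now see stale data
```

The last line shows the consistency problem in miniature: any write that does not invalidate the cache leaves readers with the old value until the entry expires.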

What does it take to scale Django? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Problem
So I've been Django-ing for a number of months*. I find myself in a position where, I'm able to code up a Django web app for whatever, but am terrified by my inability** to come up with solutions as to how to go about building a Django web app for a large (LARGE) audience. Good to know that Django scales, at least.
How I'm thinking about it
It seems like there would need to be a relatively large leap of knowledge to understand how to (let alone actually execute) scale a Django web app. I say this because my research has given me the impression that scaling (or, enabling scalability) is a process of fitting aftermarket solutions to the different components of your web app to enhance the performance of each of these components.
There'sjustsomanythings~~
So there's a ton of solutions, and a bunch of components. For instance, there's Elastic Beanstalk for hosting, Django's cache framework, Memcached and Varnish for caching, Cassandra, Redis and PostgreSQL for databases, and uWSGI, Nginx and Apache for deployment. If what I think is right, anyway. I'm still not sure.
What I Need
I crave that amazing response that becomes the canonical answer to the question, but would also appreciate leads on where to begin, or suggestions of an approach to take to solve the problem, or your approach to scale Django. Thank you in advance for your been-there-done-that words of wisdom. << Edit: SO disapproves :(
BRAND NEW & EXCLUSIVE: A QUESTION FOR STACKOVERFLOW!
What I need
What are the 3 most important/effective things I should do/implement to improve the preparedness for scaling of the Django web apps that I'm building? List the approach, and explaining how they help would be nice.
*I've been cheating. I deploy on Pythonanywhere and have only used Sqlite3 up till now. I have also managed to keep my hands clean of WSGI/Apache deployment stuff to date.
**With Django is when I first managed to create something of value through programming. Before, I had only used Pascal to cheat at Runescape and Java to make some shitty Android apps. Which could perhaps explain why I feel this is that large of a leap.

I really wouldn't worry too much about it initially. That said, here are some ideas for how you might want to think about scaling your Django apps.
Caching
Depending on what your application is, caching can be very useful indeed. Certainly for any application that has a high proportion of reads to writes, such as a blog or content management system, then implementing caching is a no-brainer. For other types of sites, you may have to be a bit more careful, however the Django caching framework makes it straightforward to customise how caching works for your application.
Memcached is easy to set up with the Django caching, and it's solid and reliable. It should probably be your default choice as the caching backend.
Celery
If your web app does any appreciable number of tasks in the background that need not be done during the same HTTP requests, then you should consider using Celery to carry them out in a separate task.
Case in point: on a Django app I built, there was the option to send an email to a client with a PDF copy of a report attached. Because the email need not be sent within the same HTTP request, I handed that task off to Celery. Now, when the app receives the HTTP request, it just pushes the request to send that email onto the messaging queue. The Celery process picks up this task and handles it separately.
In theory that task could be handled on an entirely separate machine when your web app gets big enough.
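The pattern itself can be sketched with the standard library alone. To be clear, this is a stand-in for the idea and not Celery's API: a queue.Queue plays the role of the message broker, a thread plays the role of the Celery worker, and every function name here is hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()   # stands in for the message broker
sent_reports = []            # records what the "worker" has processed


def send_report_email(client_email, report_id):
    # Placeholder for the slow work (rendering the PDF, talking to SMTP).
    sent_reports.append((client_email, report_id))


def worker():
    # Runs outside the request cycle, pulling tasks until told to stop.
    while True:
        task = task_queue.get()
        if task is None:              # sentinel: shut the worker down
            task_queue.task_done()
            break
        func, args = task
        try:
            func(*args)
        finally:
            task_queue.task_done()


def start_worker():
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def handle_request(client_email, report_id):
    # The "view": enqueue the job and return immediately.
    task_queue.put((send_report_email, (client_email, report_id)))
    return "202 Accepted"
```

With Celery the queue lives in a broker such as RabbitMQ or Redis rather than in process memory, which is what allows the worker to run on a different machine.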
Web server
It seems to be generally accepted that serving both static and dynamic content with Django is a bad idea. The solution I use seems to be fairly typical and employs two web servers:
Nginx runs on port 80. It serves all the static files and reverse proxies everything else to another port
Gunicorn runs on that other port and it serves the dynamic content, and Supervisor is used to run the Gunicorn process
There are variants of this general idea, but this kind of two server approach seems to be common. You could also consider using something like Amazon's S3 to host static files.
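Gunicorn reads its configuration from a Python file, so the Gunicorn half of this setup might look like the sketch below. The port (8000), the timeout, and the project name in the usage line are assumptions; the worker formula is only the commonly suggested starting point and should be tuned under real load.

```python
# gunicorn.conf.py (a sketch; values are illustrative assumptions)
import multiprocessing

# Listen only on localhost: Nginx on port 80 proxies dynamic requests here.
bind = "127.0.0.1:8000"

# Starting point of (2 x CPU cores) + 1 workers; adjust after measuring.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds before an unresponsive worker is killed and restarted.
timeout = 30

# "-" sends the access log to stdout, where Supervisor can capture it.
accesslog = "-"
```

It would be started with something like `gunicorn -c gunicorn.conf.py mysite.wsgi:application`, where `mysite` is a hypothetical project name, and Supervisor would be pointed at that command.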
It's also well worth your while taking the time to minify your static files to improve their performance. Using a tool like Grunt it's quite easy to concatenate and minify your JavaScript and CSS files so that only one of each need be downloaded, rather than including many files that need to be downloaded individually.
Database
Either MySQL or PostgreSQL will be fine. Both are solid databases that are used in production on many websites.
As I said higher up, scaling your app shouldn't really be too much of a concern early on. However, it helps to be familiar with the kind of strategies you'll need to use.

Sitecore performance enhancements

We need our Sitecore web application to process 60-80 web requests per second. We are using Sitecore 7.0. We have tried a 1 web server + 1 database server deployment, but it only processes 20-25 requests per second. The web server queues up all the other requests in memory, and as we increase the load, memory fills up. (We did all the recommended Sitecore performance enhancements.) We need 4X performance to reach the goal :).
Will it be possible to achieve this goal by upgrading the existing server, or do we have to add more web servers in the production environment?
Note: We are using Lucene indexing as well.

Here are some things you can consider without changing overall architecture of your deployment
CDN to offload media and static asset requests
This leaves your content delivery server available to handle important content queries and display logic.
Example www.cloudflare.com
Configure and use Sitecore's built-in caching
This is from the guide:
Investigation and configuration of the Sitecore Caches is broken down into multiple tasks. This way each task is more focused and simplified. The focus is on configuration and tuning of the Sitecore Database Caches (prefetch, data, and item caches.)
For configuration of the output rendering caching properties, the customer should be made aware of both the Sitecore Cache Configuration Reference and the Sitecore Presentation Component Reference as to how properly enable and the properties to expire these caches.
Check out the Sitecore Tuning Guide
Find Slow Queries or Controls
It sounds like your application follows Sitecore best practices, but I leave this note in for anyone that might find this answer. Use Sitecore's built-in Debug mode to identify the slowest running controls and sublayouts. Additionally, if you have Analytics set up there is a "Slow Pages" report that might give you some information on where your application is slowing down.
Those things being said, if you're prepared to provision additional servers and set up a load-balanced environment then read on.
Separate Content Delivery and Content Management
To me the first logical step before load-balancing content delivery servers is to separate the content management from the equation. This is pretty easy and the Scaling Guide walks you through getting the HistoryEngine set up to keep those Lucene indexes up to date.
Set up Load Balancer with 2 or more Content Delivery servers
Once you've done the first step this can be as easy as cloning your content delivery server and adding it to your load balancer "pool". There are a couple of things to consider here. Does your web application allow users to log in? If so, you'll need to worry about sticky sessions or machine keys. Does your web application use file media instead of blob media? I haven't had to deal with this, but I understand that's another consideration.
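For a first estimate of how many content delivery servers the pool needs, the figures from the question (20-25 requests/s per server, a 60-80 requests/s target) can be plugged into a simple calculation. This sketch assumes throughput scales linearly with server count, which rarely holds exactly once the shared SQL Server becomes the bottleneck; the headroom factor is also an assumption.

```python
import math


def servers_needed(target_rps, per_server_rps, headroom=1.0):
    """Estimate server count, assuming linear scaling across servers.

    headroom > 1.0 leaves spare capacity for traffic spikes.
    """
    return math.ceil(target_rps * headroom / per_server_rps)


# Worst case from the question: 80 req/s needed, 20 req/s per server.
print(servers_needed(80, 20))         # 4
# Best case: 60 req/s needed, 25 req/s per server.
print(servers_needed(60, 25))         # 3
# Worst case with an assumed 25% safety margin.
print(servers_needed(80, 20, 1.25))   # 5
```

Treat the result as a number to validate with a load test, not as a sizing guarantee.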
Scale your SQL solution
I've seen applications with up to four load balanced content delivery servers and the SQL Server did not have a problem - I think this will be unique to each case depending on a lot of factors: horsepower and tuning of SQL Server, content model of your application, complexity of your queries, caching configuration on content delivery servers, etc. Again, the Scaling Guide covers SQL Mirroring and Failover, so that is going to be your first stop on getting that going.
Finally, I would say contact Sitecore. These guys have probably seen more of what's gone right and what's gone wrong with installations and could get you on the right path. Good luck!

This answer written from a Sitecore developer perspective:
Bottom line: You need to figure out exactly where your performance bottleneck is. That is going to take some digging, but will be very worthwhile. You should definitely be able to serve 60-80 requests/s without any trouble... but of course that makes a lot of assumptions about the nature of your site and the requests.
For my site, I found Sitecore's caching implementation to be sub-par... I created some very simple and aggressive application-specific caches in my app and this made all the difference in the world. For instance, we have 900+ "Partner" items where our sites' advertisements live... and simply putting all these objects in an array in the Application object sped up page requests significantly. Finding an object in a Hashtable indexed by its Item.Name or ID is going to be a lot faster than Sitecore.Context.Database.GetItem("/itempath") or a SelectItems() call (at least, that's my experience). If your architecture and data set will allow this strategy, we've had good experience with it.
Another thing to watch out for is XSLT renderings. Personally, I avoid them completely in favor of ASP.NET UserControls. The XSLT rendering is just slow. As much as 10x slower than a native UserControl rendering the same HTML. So if you have a few of these... replace with some custom code and you'll see a world of difference.

How to evaluate the performance of web servers?

I'm planning to deploy a django powered site. But I feel confused about the choice of web servers, which includes apache, lighttpd, nginx and others.
I've read some articles about the performance of each of these choice. But it seems no one agrees. So I'm wondering why not test the performance by myself?
I can't find information about the best approach to performance testing web servers. So my questions are:
Is there any easy approach to test the performance without the production site?
Or can I have a method to simulate the heavy traffic to have a fair test?
How can I keep my test fair and close to production situation?
After the test, I want to figure out:
Why some people say nginx has better performance when serving static files.
The cpu and memory needs of each web server.
My best choice.

Tools like ab are commonly used to test how much load you can take from a battering of simultaneous requests. Alongside Cacti, Munin, or your system monitoring tool of choice, you can generate data on system load and requests/sec. The problem is that many people benchmarking don't realise they need to request many different URLs: different parts of your code execute for each one, and they take varying amounts of time. Profiling and benchmarking the code itself, not just requests, is also important; plenty of folk have already done so for django, and benchrun is not a bad tool either.
The other issue is how many HTTP requests each page view takes. Fewer requests, processed more quickly, are the key to a website that can sustain a high amount of traffic: the quicker you can finish and close connections, the quicker you free resources for new ones.
In terms of general speed of web servers, it goes without saying that a proxy server (running as a reverse proxy at your end) will always perform faster than a webserver with static content. As for Apache vs nginx in regards to your django app, it seems that mod_python is indeed faster than nginx/lighty + FastCGI, but that's no surprise because CGI, regardless of any speed-ups, is still slow. Executing and caching code at the webserver and letting it manage it is always faster (mod_perl vs use CGI, mod_php vs CGI, etc.) if you do it right.

Apache JMeter is an excellent tool for stress-testing web applications. It can be used with any web server, not just Apache.

You need to set up the web server + website of your choice on a machine somewhere, preferably a physical machine with similar hardware specs to the one you will eventually be deploying to.
You then need to use a load testing framework, for example The Grinder (free), to simulate many users using your site at the same time.
The load testing framework should be on separate machine(s) and you should monitor the network and CPU usage of those machines as well to make sure that the limiting factor of your testing is in fact the web server and not your load injectors.
Other than that, it's just about altering the content and monitoring response times, throughput, memory and CPU use, etc., to see how they change depending on what web server you use and what sort of content you are hosting.
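To make the idea of a load injector concrete, here is a minimal sketch using only the Python standard library. It is nowhere near as capable as ab, JMeter, or The Grinder, and it breaks the advice above by running the server and the injector on the same machine; the throwaway local server exists only so the example is self-contained. All function names and the request counts are illustrative assumptions.

```python
import statistics
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any proxy environment variables so local requests go direct.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class HelloHandler(BaseHTTPRequestHandler):
    """Trivial target; in a real test this is your deployed site."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet during the test


def timed_get(url):
    start = time.perf_counter()
    with OPENER.open(url, timeout=10) as response:
        response.read()
        status = response.status
    return status, time.perf_counter() - start


def run_load_test(url, total_requests=50, concurrency=5):
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_get, [url] * total_requests))
    elapsed = time.perf_counter() - started
    latencies = [seconds for _, seconds in results]
    return {
        "requests": len(results),
        "ok": sum(1 for status, _ in results if status == 200),
        "requests_per_second": len(results) / elapsed,
        "median_latency": statistics.median(latencies),
        "max_latency": max(latencies),
    }


def demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        return run_load_test(url)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns a small report dictionary. For a fair comparison between web servers, point `run_load_test` at each candidate from a separate machine and request a realistic mix of URLs rather than a single page.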

I've got a django site with a good deal of javascript but my clients have terrible connectivity - how to optimize?

We're hosting a django service for some clients using really really poor and intermittent connectivity. Satellite and GPRS connectivity in parts of Africa that haven't benefited from the recent fiber cables making landfall.
I've consolidated the javascripts and used minified versions, tried to clean up the stylesheets, and what not...
Like a good django implementer, I'm letting apache serve up all the static information like css and JS and other static media. I've enabled apache modules deflate (for gzip) and expires to try to minimize retransmission of the javascript packages (mainly jQuery's huge cost). I've also enabled django's gzip middleware (but that doesn't seem to do much in combination with apache's deflate).
Main question - what else is there to do to optimize bandwidth utilization?
Are there django optimizations in headers or what not to make sure that "already seen data" will not travel over the network?
The django caching framework seems to be tailored towards server optimization (minimize hitting the database) - how does that translate to actual bandwidth utilization?
what other tweaks on apache are there to make sure that the browser won't try to get data it already has?

Some of your optimizations are important for wringing better performance out of your server, but don't confuse them with optimizing bandwidth utilization. In other words gzip/deflate are relevant but Apache serving static content is not (even though it is important).
Now, for your problem you need to look at three things: how much data is being sent, how many connections are required to get the data, and how good are the connections.
You mostly have the first area covered by using deflate/gzip, expires, minification of javascript etc., so I can only add one or two things you might not know about. First, you should upgrade to Django 1.1, if you haven't already, because it has better support for ETags/Expires headers for your Django views. You probably already have those headers working properly for static data from Apache, but if you're using older Django they (probably) aren't being set properly on your dynamic views.
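To get a feel for what gzip buys on text assets, the following sketch compresses a synthetic payload with Python's gzip module. The payload is an artificial, highly repetitive stand-in for a JavaScript file, so real files will compress less dramatically; measure your own assets.

```python
import gzip


def compressed_sizes(payload):
    """Return (original size, gzip-compressed size) in bytes."""
    return len(payload), len(gzip.compress(payload))


# Synthetic stand-in for a JavaScript file (an assumption for illustration).
script = b"function add(a, b) { return a + b; }\n" * 500

original, compressed = compressed_sizes(script)
print("original: %d bytes, gzipped: %d bytes" % (original, compressed))
```

Running the same function over your real CSS, JavaScript, and HTML responses shows how many bytes actually cross the slow link once deflate is working end to end.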
For the next area, number of connections, you need to consolidate your javascript and css files into as few files as possible to reduce the number of connections. Also very helpful can be consolidating your image files into a single "sprite" image. There are a few Django projects to handle this aspect: django-compress, django-media-bundler (which is the only one that will create image sprites), and you can also see this SO answer.
For the last area of how good are the connections you should look at global CDN as suggested by Alex, or at the very least host your site at an ISP closer to your users. This could be tough for Africa, which in my experience can't even get decent connectivity into European ISP's (at least southern Africa... northern Africa might be better).

You could delegate jQuery to a CDN which may have better connectivity with Africa, e.g., google (and, it's a free service!-). Beyond that I recommend anything ever written (or spoken on video, there's a lot of that!-) by Steve Souders -- while his talks and books and essays are invaluable to EVERY web developer I think they're particularly precious to ones serving a low-bandwidth audience (e.g., one of his tips in his latest books and talks is about a substantial fraction of the world's browsers NOT getting compression benefits from deflate or gzip -- it's not so much about the browsers themselves, but about proxies and firewalls doing things wrong, so "manual compression" is STILL important then!).

This is definitely not an area I've had a lot of experience in, but looking into Django's ConditionalGetMiddleware might prove useful. I think it might help you solve the first of your bullet points.
EDIT: This might be a good place to start: http://docs.djangoproject.com/en/dev/topics/conditional-view-processing/
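The mechanism behind conditional GETs can be shown in a few lines. This is a sketch of the HTTP exchange itself, not Django's implementation: the function names are hypothetical, and in Django the same behaviour comes from ConditionalGetMiddleware or the decorators in django.views.decorators.http.

```python
import hashlib


def make_etag(body):
    # A strong ETag derived from the response body, quoted as HTTP requires.
    return '"%s"' % hashlib.sha256(body).hexdigest()


def respond(body, if_none_match=None):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The browser already holds this exact content: send headers only.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


page = b"<html><body>Report for today</body></html>"

# First visit: full body travels over the network.
status, headers, payload = respond(page)

# Repeat visit: the browser sends back the ETag it was given.
status_again, _, payload_again = respond(page, if_none_match=headers["ETag"])
```

On the repeat visit only the status line and headers cross the link, which is exactly the saving that matters on satellite or GPRS connections. Note that the server still has to generate the page to compute the ETag unless it can derive one more cheaply.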
