Django Performance / Memory usage [closed] - django

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I am running an alpha version of my app on an EC2 Small instance (1.7 GB RAM) together with Postgres and Apache (mod_wsgi in embedded mode, not daemon mode).
Performance is alright, but it could be better. I am also worried about memory usage if too many test users join.
Is it wise to switch from Apache to nginx? Has any Django developer done that and been happier with the results? Any other tips are also welcome.
Thanks

We are using nginx in front of our Django app, which runs in a gunicorn server. The performance is quite good so far, but I have not done any direct comparisons with an Apache setup. Memory usage is quite small: nginx takes about 10 MB and gunicorn about 150 MB (but it also serves more than one app). Of course this may vary from app to app.
I would suggest simply giving it a try. It should be quite easy to set up by following tutorials on the web and/or on the gunicorn website. Also set up a comparable test case and use monitoring software such as munin to see changes over time.
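For anyone trying this, a gunicorn config file is plain Python, so a minimal sketch might look like the following. The port, the worker formula (a rule of thumb from the gunicorn docs) and the recycle limits are assumptions to adjust for your own app, not settings taken from the setup described above.

```python
# gunicorn.conf.py: a hypothetical starting point, not a tuned configuration.
import multiprocessing

# Listen on localhost only; nginx is assumed to proxy requests to this port.
bind = "127.0.0.1:8000"

# Rule of thumb: (2 x cores) + 1 worker processes.
# Each worker holds its own copy of the app, so memory grows with this number.
workers = multiprocessing.cpu_count() * 2 + 1

# Recycle a worker after roughly this many requests to limit slow memory growth.
max_requests = 1000
max_requests_jitter = 50

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 30
```

It would be started with something like `gunicorn -c gunicorn.conf.py myproject.wsgi:application`, where `myproject` is a placeholder for your own project name. On a small instance, lowering `workers` is the first knob to try if memory is tight.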

Why aren't you using daemon mode of mod_wsgi? If you are using embedded mode you are setting yourself up for memory issues if you aren't careful with how you set up Apache.
Go have a read of:
http://blog.dscpl.com.au/2012/10/why-are-you-using-embedded-mode-of.html
and also watch my PyCon talk at:
http://lanyrd.com/2012/pycon/spcdg/
Also amend your question and indicate which Apache MPM you are using and what the MPM settings are.
As to using alternatives such as gunicorn or uWSGI: for a comparable configuration, the memory requirements aren't going to be much different. The underlying server isn't what dictates how much memory is used; your specific Python web application running on top of it is. It is a common misconception that gunicorn or uWSGI somehow magically solves all the problems and that Apache can't do as well. Set Apache up properly for a Python web application rather than relying on its defaults, and it is just as capable as other solutions and can provide a lot more flexibility depending on your requirements.
Very much suggest you get in place some monitoring to work out what the real issues and bottlenecks are.
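As a very rough first step before a proper monitoring tool is in place, per-request timings can be logged from the WSGI layer. This is only a sketch: `TimingMiddleware` is a made-up name, and it measures the time until the app returns its response iterable, not the time spent streaming the body.

```python
import time


class TimingMiddleware:
    """Wrap any WSGI app and log how long each request takes."""

    def __init__(self, app, log=print):
        self.app = app
        self.log = log  # any callable that accepts one string

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        try:
            return self.app(environ, start_response)
        finally:
            # Runs after the wrapped app has returned (or raised).
            elapsed_ms = (time.perf_counter() - start) * 1000
            method = environ.get("REQUEST_METHOD")
            path = environ.get("PATH_INFO")
            self.log(f"{method} {path} {elapsed_ms:.1f} ms")
```

In a Django project this would wrap the `application` object in `wsgi.py`, e.g. `application = TimingMiddleware(application)`. It works the same under mod_wsgi, gunicorn or uWSGI, which makes before/after comparisons between servers a bit fairer.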

I have had mixed results. When the app is fast and non-blocking, nginx performs well with a smaller memory footprint. The benefit is bigger with higher traffic.
I have a couple of GIS applications that are a bit slower, and in that context nginx fails miserably. My advice is: don't use nginx + wsgi on anything that can block for a few seconds.


What does it take to scale Django? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Problem
So I've been Django-ing for a number of months*. I find myself in a position where, I'm able to code up a Django web app for whatever, but am terrified by my inability** to come up with solutions as to how to go about building a Django web app for a large (LARGE) audience. Good to know that Django scales, at least.
How I'm thinking about it
It seems like there would need to be a relatively large leap of knowledge to understand how to (let alone actually execute) scale a Django web app. I say this because my research has given me the impression that scaling (or, enabling scalability) is a process of fitting aftermarket solutions to the different components of your web app to enhance the performance of each of these components.
There'sjustsomanythings~~
So there's a ton of solutions, and a bunch of components. For instance, there's Elastic Beanstalk for hosting, Django's cache framework, Memcached and Varnish for caching, Cassandra, Redis and PostgreSQL for databases, and uWSGI, Nginx and Apache for deployment. If what I think is right, anyway. I'm still not sure.
What I Need
I crave that amazing response that becomes the canonical answer to the question, but would also appreciate leads on where to begin, or suggestions of an approach to take to solve the problem, or your approach to scale Django. Thank you in advance for your been-there-done-that words of wisdom. << Edit: SO disapproves :(
BRAND NEW & EXCLUSIVE: A QUESTION FOR STACKOVERFLOW!
What I need
What are the 3 most important/effective things I should do/implement to better prepare the Django web apps I'm building for scaling? Please list each approach; an explanation of how it helps would be nice.
*I've been cheating. I deploy on Pythonanywhere and have only used Sqlite3 up till now. I have also managed to keep my hands clean of WSGI/Apache deployment stuff to date.
**With Django is when I first managed to create something of value through programming. Before, I had only used Pascal to cheat at Runescape and Java to make some shitty Android apps. Which could perhaps explain why I feel this is that large of a leap.
I really wouldn't worry too much about it initially. That said, here are some ideas for how you might want to think about scaling your Django apps.
Caching
Depending on what your application is, caching can be very useful indeed. Certainly for any application that has a high proportion of reads to writes, such as a blog or content management system, then implementing caching is a no-brainer. For other types of sites, you may have to be a bit more careful, however the Django caching framework makes it straightforward to customise how caching works for your application.
Memcached is easy to set up with the Django caching, and it's solid and reliable. It should probably be your default choice as the caching backend.
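To give an idea of how little is involved, here is a sketch of the settings fragment. The address and timeout are assumptions, and the backend class depends on your Django version (`PyMemcacheCache` ships with Django 3.2 and later; older releases used other memcached backend classes).

```python
# settings.py fragment (hypothetical values)
CACHES = {
    "default": {
        # Backend class available in Django 3.2+; requires the pymemcache package.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        # Assumes memcached is listening locally on its default port.
        "LOCATION": "127.0.0.1:11211",
        # Default lifetime of a cache entry, in seconds.
        "TIMEOUT": 300,
    }
}
```

With that in place, a read-heavy view can be wrapped with the `cache_page` decorator from `django.views.decorators.cache`, or individual values can be stored through `django.core.cache.cache`.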
Celery
If your web app does any appreciable amount of work that need not be done during the HTTP request itself, then you should consider using Celery to carry it out as a separate background task.
Case in point: on a Django app I built, there was the option to send an email to a client with a PDF copy of a report attached. Because the email need not be sent within the same HTTP request, I handed that task off to Celery. Now, when the app receives the HTTP request, it just pushes the request to send that email onto the message queue. The Celery process picks up this task and handles it separately.
In theory that task could be handled on an entirely separate machine when your web app gets big enough.
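The hand-off pattern itself can be shown without Celery. The sketch below uses only the standard library: a `queue.Queue` stands in for the message broker and a thread stands in for the Celery worker process. All names are made up for illustration; with real Celery the queue and worker would live outside the web process.

```python
import queue
import threading

task_queue = queue.Queue()  # stands in for the message broker
sent = []                   # records "sent" emails so the sketch is observable


def send_report_email(client, report_id):
    """Hypothetical slow job: build the PDF and send the email."""
    sent.append((client, report_id))


def worker():
    """Stands in for the Celery worker: pull tasks off the queue and run them."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel value: shut the worker down
            task_queue.task_done()
            break
        func, args = task
        func(*args)
        task_queue.task_done()


def handle_request(client, report_id):
    """Stands in for the view: enqueue the job and respond immediately."""
    task_queue.put((send_report_email, (client, report_id)))
    return "202 Accepted"


threading.Thread(target=worker, daemon=True).start()
status = handle_request("client@example.com", 42)
task_queue.join()  # demo only: wait until the worker has drained the queue
```

The point is that `handle_request` returns before the email is sent; the request never waits on the slow part.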
Web server
It seems to be generally accepted that serving both static and dynamic content through Django itself is a bad idea. The solution I use seems to be fairly typical and employs two web servers:
Nginx runs on port 80. It serves all the static files and reverse proxies everything else to another port
Gunicorn runs on that other port and it serves the dynamic content, and Supervisor is used to run the Gunicorn process
There are variants of this general idea, but this kind of two server approach seems to be common. You could also consider using something like Amazon's S3 to host static files.
It's also well worth your while taking the time to minify your static files to improve their performance. Using a tool like Grunt it's quite easy to concatenate and minify your JavaScript and CSS files so that only one of each need be downloaded, rather than including many files that need to be downloaded individually.
Database
Either MySQL or PostgreSQL will be fine. Both are solid databases that are used in production on many websites.
As I said higher up, scaling your app shouldn't really be too much of a concern early on. However, it helps to be familiar with the kind of strategies you'll need to use.

Django Performance Tuning Tips?

How do you tune Django for better performance? Is there some guide? I have the following questions:
Is mod_wsgi the best solution?
Is there some opcode cache like in PHP?
How should I tune Apache?
How can I set up my models, so I have fewer/faster queries?
Can I use Memcache?
Comments on a few of your questions:
Is mod_wsgi the best solution?
Apache/mod_wsgi is adequate for most people because they will never have enough traffic to cause problems even if Apache hasn't been set up properly. The web server is rarely the bottleneck.
Is there some opcode cache like in PHP?
Python caches compiled code in memory and the processes persist across requests. You thus don't need a separate opcode caching product like PHP as that is what Python does by default. You just need to ensure you aren't using a hosting mechanism or configuration that would cause the processes to be thrown away on every request or too often. Don't use CGI for example.
How should I tune Apache?
Without knowing anything about your application or the system you are hosting it on, one can't give easy guidance as to how you need to set up Apache. This is because throughput, duration of requests, amount of memory used, amount of memory available, number of processors and much more come into play. If you haven't even written your application yet, then you are simply jumping the gun, because until you know more about your application and production load you can't optimally tune Apache.
A few simple suggestions though.
Don't host PHP in same Apache.
Use Apache worker MPM.
Use mod_wsgi daemon mode and NOT embedded mode.
This alone will save you from causing too much grief for yourself to begin with.
If you genuinely need to better tune your complete stack, i.e., application and web server, and are not just prematurely optimising because you think you are going to have the next Facebook even though you haven't really written any code yet, then you need to start looking at performance monitoring tools to work out what your application is doing. Your application and database are going to be the real source of your problems, not the web server.
The sort of performance monitoring tool I am talking about is something like New Relic. Even then though, if you are very early days and haven't got anything deployed even, then that itself would be a waste of time. In other words, just get your code working first before worrying about how to run it optimally.

Django redundancy and replication over two VPS accounts

I'm slowly getting into the position where one of my Django sites needs some robustness behind it. I'm currently running on a single VPS with a SQLite database and memcached. It's about as un-scaled as things can get.
If I bought another VPS account, what would I want to do?
Move to MySQL/PostgreSQL with replication? What's easiest? Does replication protect me from one server exploding? Are there concurrency downsides?
How do I load-balance between the two servers?
I'd put memcached on the new server too. If I put both IPs into the configuration, would that keep a copy of data on both servers? (I'm thinking of what happens to session data - currently stored in memcached)
I'm currently using Cherokee as the httpd - I'm sure this has its own set of issues. If you've any tips, let me know.
Am I going at this the wrong way? Is there an easier way to have faster, more robust django sites?
First step: switch from SQLite to a real production database (I like Postgres). This should happen long before you even think about a second VPS. SQLite essentially does not support concurrency at all. Personally, I wouldn't even consider deploying a live site on SQLite in the first place.
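On the Django side the switch is mostly a settings change. A minimal sketch, assuming a PostgreSQL server is already running and the `psycopg` driver is installed; the environment variable names and default values are placeholders.

```python
# settings.py fragment (hypothetical names and defaults)
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "mysite"),
        "USER": os.environ.get("DB_USER", "mysite"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        # Pointing HOST at another machine later moves the DB to its own server.
        "HOST": os.environ.get("DB_HOST", "127.0.0.1"),
        "PORT": os.environ.get("DB_PORT", "5432"),
        # Reuse connections for up to 60 seconds instead of reconnecting per request.
        "CONN_MAX_AGE": 60,
    }
}
```

Existing SQLite data can usually be carried over with `manage.py dumpdata` and `manage.py loaddata`, though large or unusual datasets may need more care.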
If your site is running on SQLite and is functioning, my guess is you are still quite a long ways from actually outgrowing your single VPS (unless it's already heavily loaded otherwise).
If/when you do need to add a second server, how you configure things depends on where you're actually seeing a bottleneck. Chances are it'll be the database, in which case a good step might be simply moving the database onto its own server (presuming you can guarantee low latency between the two VPSes) and loading the database server with as much RAM as you can afford. In general disk performance suffers most in a VPS, so another step to consider might be putting the DB onto raw metal.
I'd probably look at those steps before I'd think about DB replication or multiple web-tier servers, but it really depends on profiling your actual case (and how you value performance vs reliability).
Watching the Django Deployment Workshop by Jacob Kaplan-Moss should give you a good overview.
MySQL supports Master-Slave and Master-Master setups. I don't use PostgreSQL.
You can use nginx as your load balancer. HAProxy is an option, too (Stack Overflow uses it).
Memcached distributes the objects across the servers rather than copying them to each one. If one server crashes, the data stored on it is lost.
I don't know Cherokee, but nginx is great.
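To make the memcached point concrete, here is a simplified sketch of how a client spreads keys over two servers. Real client libraries use their own hashing schemes (often consistent hashing), so the modulo scheme and the addresses below are assumptions for illustration only.

```python
import hashlib

SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211"]  # hypothetical addresses


def server_for(key, servers=SERVERS):
    """Pick exactly one server per key from a hash of the key."""
    digest = hashlib.sha256(key.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


# Place 1000 session keys, then see what a crash of the second server would cost.
keys = [f"session:{i}" for i in range(1000)]
placement = {key: server_for(key) for key in keys}
lost = [key for key, server in placement.items() if server == SERVERS[1]]
```

Every key lives on one server only, so `lost` holds roughly half of the sessions. If sessions matter, Django's `cached_db` session backend (`django.contrib.sessions.backends.cached_db`) keeps a copy in the database as well, which may be a safer fit than cache-only sessions.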

How to evaluate the performance of web servers?

I'm planning to deploy a Django-powered site. But I feel confused about the choice of web servers, which includes Apache, lighttpd, nginx and others.
I've read some articles about the performance of each of these choices, but no one seems to agree. So I'm wondering: why not test the performance myself?
I can't find information about the best approach to performance testing web servers. So my questions are:
Is there any easy approach to test the performance without the production site?
Or can I have a method to simulate the heavy traffic to have a fair test?
How can I keep my test fair and close to production situation?
After the test, I want to figure out:
Why some people say nginx has better performance when serving static files.
The cpu and memory needs of each web server.
My best choice.
Tools like ab are commonly used to test how much load you can take from a battering of simultaneous requests. Alongside cacti, munin, or your system monitoring tool of choice, you can generate data on system load and requests/sec. The problem is that many people who benchmark don't realise that they need to send a lot of different requests: different parts of your code execute and take varying amounts of time. Profiling and benchmarking code, and not just requests, is also important; plenty of folk have already done this for Django, and benchrun is not a bad tool either.
The other issue is how many HTTP requests each page view takes. Fewer requests, each processed more quickly, is the key to a website that can sustain a high amount of traffic: the quicker you can finish and close connections, the quicker you free up resources for new ones.
In terms of general speed of web servers, it goes without saying that a proxy server (running reverse at your end) will always perform faster than a webserver with static content. As for Apache vs nginx in regards to your django app, it seems that mod_python is indeed faster than nginx/lighty + FastCGI but that's no surprise because CGI, regardless of any speed ups is still slow. Executing and caching code at the webserver and letting it manage it is always faster (mod_perl vs use CGI, mod_php vs CGI, etc) if you do it right.
Apache JMeter is an excellent tool for stress-testing web applications. It can be used with any web server, not just Apache.
You need to set up the web server + website of your choice on a machine somewhere, preferably a physical machine with similar hardware specs to the one you will eventually be deploying to.
You then need to use a load testing framework, for example The Grinder (free), to simulate many users using your site at the same time.
The load testing framework should be on separate machine(s) and you should monitor the network and CPU usage of those machines as well to make sure that the limiting factor of your testing is in fact the web server and not your load injectors.
Other than that, it's just about varying the content and monitoring response times, throughput, memory and CPU use, etc. to see how they change depending on which web server you use and what sort of content you are hosting.
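If you want to see what such a load injector does before reaching for JMeter or The Grinder, a bare-bones one fits in a few lines of standard-library Python. This is a sketch, not a replacement for those tools: `run_load` and its helpers are made-up names, the URL list must not be empty, and the thread count is an arbitrary default.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_fetch(url):
    """Fetch one URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()
        return response.status


def timed(fetch, url):
    """Run one request; return (seconds taken, whether it succeeded)."""
    start = time.perf_counter()
    try:
        ok = fetch(url) == 200
    except Exception:
        ok = False  # timeouts, refused connections and HTTP errors count as failures
    return time.perf_counter() - start, ok


def run_load(urls, concurrency=10, fetch=http_fetch):
    """Request every URL in `urls` using `concurrency` worker threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda url: timed(fetch, url), urls))
    wall = time.perf_counter() - started
    latencies = sorted(seconds for seconds, _ in results)
    return {
        "requests": len(results),
        "errors": sum(1 for _, ok in results if not ok),
        "requests_per_sec": len(results) / wall,
        "median_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }
```

A call might look like `run_load(pages * 50, concurrency=20)`, where `pages` is a list of several different URLs on your test server, so that different code paths get exercised. Run it from a separate machine and watch that machine's CPU and network too, for the reason given above.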

Where to go to learn about web architecture? Youtube example? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 13 years ago.
I'm trying to build a web application that is similar to Youtube (it's not a knock off), but I guess I don't know how video is served on the internet very well.
I know how to build regular database driven web applications, but nothing like the scalability of Youtube. All of the applications I have built before have all been run on one server with the files stored on the same box as the web server.
How does one decouple the application server from the file storage from the media server?
I would more or less want 4 machines (clusters of machines)
1.) Application servers
-- Present the web page, handle user uploads, link the user's flash player to the correct media server etc.
2.) Database shards
-- Store user information, check favorites, etc.
3.) File storage
-- Store the media files
4.) Media servers
-- Serve the media files
How do I hook all of this together? Which technologies should I leverage? Where do I go to learn more about architecting this?
How does Youtube's embeddable flash stuff work? I want to embed my flash player on other websites and have it tie into my architecture.
Note I have looked into: http://highscalability.com/youtube-architecture
But I still don't get the overall picture of how this stuff ties together.
Can someone explain in high-level terms how all of this stuff works?
Are there dedicated client servers running internally to shuffle all of this around between the application servers, file storage, etc.? Is it all done via HTTP using JSON? What is going on here?
Thanks
Two books I'd recommend are:
Scalable Internet Architectures
Building Scalable Web Sites
The latter is by the director of engineering at Flickr. Not YouTube, but I think you'll find it enlightening.
Beyond that, the High Scalability blog is a good source of case studies and collected wisdom, all of which provide a good starting point for further exploration.
Start by hiring the right people; if you hire smart people, they'll be able to come up with answers to these questions, and more which will crop up.
Also, start at the scale that you plan to initially operate at. Don't plan for scalability you don't need. You aren't going to be making another Youtube - even if you're very successful within your field.
Scalability is expensive - very expensive - to develop and maintain. If you don't need it, it will drain your resources and restrict your developers needlessly. Just building a credible test environment for high performance systems tends to be a big job, and such a system would require several such environments.