What does it take to scale Django? [closed] - django

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Problem
So I've been Django-ing for a number of months*. I find myself in a position where, I'm able to code up a Django web app for whatever, but am terrified by my inability** to come up with solutions as to how to go about building a Django web app for a large (LARGE) audience. Good to know that Django scales, at least.
How I'm thinking about it
It seems like there would need to be a relatively large leap of knowledge to understand how to (let alone actually execute) scale a Django web app. I say this because my research has given me the impression that scaling (or, enabling scalability) is a process of fitting aftermarket solutions to the different components of your web app to enhance the performance of each of these components.
There'sjustsomanythings~~
So there's a ton of solutions, and a bunch of components. For instance, there's Elastic Beanstalk for hosting, Django's cache framework, Memcached and Varnish for caching, Cassandra, Redis and PostgreSQL for databases, and uWSGI, Nginx and Apache for deployment. If what I think is right, anyway. I'm still not sure.
What I Need
I crave that amazing response that becomes the canonical answer to the question, but would also appreciate leads on where to begin, or suggestions of an approach to take to solve the problem, or your approach to scale Django. Thank you in advance for your been-there-done-that words of wisdom. << Edit: SO disapproves :(
BRAND NEW & EXCLUSIVE: A QUESTION FOR STACKOVERFLOW!
What I need
What are the 3 most important/effective things I should do/implement to improve the preparedness for scaling of the Django web apps that I'm building? List the approach, and explaining how they help would be nice.
*I've been cheating. I deploy on Pythonanywhere and have only used Sqlite3 up till now. I have also managed to keep my hands clean of WSGI/Apache deployment stuff to date.
**With Django is when I first managed to create something of value through programming. Before, I had only used Pascal to cheat at Runescape and Java to make some shitty Android apps. Which could perhaps explain why I feel this is that large of a leap.

I really wouldn't worry too much about it initially. That said, here are some ideas for how you might want to think about scaling your Django apps.
Caching
Depending on what your application is, caching can be very useful indeed. Certainly for any application that has a high proportion of reads to writes, such as a blog or content management system, then implementing caching is a no-brainer. For other types of sites, you may have to be a bit more careful, however the Django caching framework makes it straightforward to customise how caching works for your application.
Memcached is easy to set up with the Django caching, and it's solid and reliable. It should probably be your default choice as the caching backend.
 Celery
If your web app does any appreciable number of tasks in the background that need not be done during the same HTTP requests, then you should consider using Celery to carry them out in a separate task.
Case in point: on a Django app I built, there was the option to send an email to a client with a PDF copy of a report attached. Because the email need not be sent within the same HTTP request, then I handed that task off to Celery. Now, when the app receives the HTTP request, it just pushes the request to send that email onto the messaging queue. The Celery process picks up this task and handles it separately.
In theory that task could be handled on an entirely separate machine when your web app gets big enough.
Web server
It seems to be generally accepted that serving static content and dynamic content with Django is a bad idea. The solution I use seems to be fairly typical and employs two web servers:
Nginx runs on port 80. It serves all the static files and reverse proxies everything else to another port
Gunicorn runs on that other port and it serves the dynamic content, and Supervisor is used to run the Gunicorn process
There are variants of this general idea, but this kind of two server approach seems to be common. You could also consider using something like Amazon's S3 to host static files.
It's also well worth your while taking the time to minify your static files to improve their performance. Using a tool like Grunt it's quite easy to concatenate and minify your JavaScript and CSS files so that only one of each need be downloaded, rather than including many files that need to be downloaded individually.
Database
Either MySQL or Postgresql will be fine. Both are solid databases that are used in production on many websites.
As I said higher up, scaling your app shouldn't really be too much of a concern early on. However, it helps to be familiar with the kind of strategies you'll need to use.

Related

Django Performance / Memory usage [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I am running an alpha version of my app on a EC2 Small instance (1.7 GB RAM) with postgres and apache (wsgi-mod not as daemon but directly) on it.
Performance is alright, but it could be better. I am also worried about memory usage if too many test users would join.
Is it wise to switch from Apache to nginx server? Has any Django developer done that and is happier with the results? Any other tips on the way are also welcome.
Thanks
We are using nginx together with our Django app in a gunicorn server. The performance is quite good so far, but I have not done any direct comparisons with an Apache setup. Memory usage is quite small, nginx takes about 10MB memory and gunicorn about 150MB (but it also servers more than one app). Of course this may vary from app to app.
I would suggest to simply give it a try, it should be quite easy to set up following some tutorials on the web and/or on the gunicorn website. Also get some comparable test case and use some kind of monitoring software like munin to see changes over time.
Why aren't you using daemon mode of mod_wsgi? If you are using embedded mode you are setting yourself up for memory issues if you aren't careful with how you set up Apache.
Go have a read of:
http://blog.dscpl.com.au/2012/10/why-are-you-using-embedded-mode-of.html
and also watch my PyCon talk at:
http://lanyrd.com/2012/pycon/spcdg/
Also amend your question and indicate which Apache MPM you are using and what the MPM settings are.
As to using alternatives such as gunicorn or uWSGI, for a comparable configuration, the memory requirements aren't doing to be much different as the underlying server isn't going to be what dictates how much memory is used, it is going to be your specific Python web application running on top of it. It is a common misconception that gunicorn or uWSGI somehow magically solves all the problems and that Apache can't do as well. Set Apache up properly for a Python web application and don't rely on its defaults and it is just as capable as other solutions and can provide a lot more flexibility depending on your requirements.
Very much suggest you get in place some monitoring to work out what the real issues and bottlenecks are.
I have mixed results. When the app is fast, non-blocking, nginx performs well with a smaller memory footprint. The benefit is bigger with a higher traffic.
I have a couple GIS applications that are a bit slower, in this context nginx fails miserably. My advice is: don't use nginx + wsgi on anything that can block for a few seconds.

How to prepare Django for a possible slashdotting?

I would like to prepare my website for a possible influx in traffic. This is my first time using Django as a framework, so I'm unsure of the modifications that should be made to assure that I'm ready and won't go down. What are some of the common things one can do to prepare a Django website for production-level traffic?
I'm also wondering what to expect in terms of traffic numbers. I'm currently hosted at Webfaction with 600GB/month of traffic. Will this quickly run out? Are there statistics on how big 'slashdotted' events are?
Use memcache and caching middleware.
Be sure to offload serving statics.
Use CDN for statics. This doesn't directly affect Django, but will reduce your network traffic.
Anything beyond that — read up what others are using:
Scaling Django Web Apps By Mike Malone
Instagram Architecture
DISQUS Architecture
Since you are at Webfaction you have an easy answer for handling your statics:
Create a Static-only application. (Not the Static CGI/PHP app)
Add it under you current website.
Put all of your statics under it (or symlink to them, which is what I do).
This will serve all statics through their nginx frontend -- blindingly fast.
Regarding your bandwidth allocation:
You don't say what type of content you are offering. If it is anything even slightly vanilla you are unlikely to approach 600GB/mo. I have one customer who offers adult-oriented videos teaching tantric sex techniques and their video bandwidth (for both free & member-only videos) is about 400-450GB/mo. The HTML portion of the site (with tons of images) runs about 50-60GB/mo.

Django hosting on ep.io

is there someone who has expirience in hosting django applications on ep.io?
Waht are the pros/cons on it?
I'm currently using ep.io, I'm still in development with my app but I have an app deployed and running.
When you use a service like this you go into it knowing that it isn't going to be the perfect solution for every case. Knowing the pros and cons before hand will help set your expectations so that you aren't disappointed later on.
ep.io is still very young and I believe still in beta, and isn't available to the general public. To be totally fair to them, it is still a work in progress and some of these pros and cons may change as they roll out new features. I will try and come back and update this post as the new versions become available, and my experience with the service continues.
So far I am really pleased with what they have, they took the most annoying part of developing an application and made it better. If you have a simple blog app, it should be a breeze to deploy it, and probably not cost that much to host.
Pros:
Server Management: You don't have to worry about your server setup at all, it handles everything for you. With a VPS, you would need to worry about making sure the server is up to date with security patches, and all that fun stuff, with this, you don't worry about anything, they take care of all that for you.
deployment: It makes deploying an app and having it up and running really quickly. deploying a new version of an app is a piece of cake, I just need to run one maybe two commands, and it handles everything for me.
Pricing: you are only charged for what you use, so if you have a very low traffic website, it might not cost you anything at all.
Scaling: They handle scaling and load balancing for you out of the box, no need for you to worry about that. You still need to write your application so that it can scale efficiently, but if you do, they will handle the rest.
Background tasks: They have support for cronjobs as well as background workers using celery.
Customer support: I had a few questions, sent them an email, and had an answer really fast, they have been great, so much better then I would have expected. If you run your own VPS, you really don't have anyone to talk to, so this is a major plus.
Cons:
DB access: You don't have direct access to the database, you can get to the psql shell, but you can't connect an external client gui. This makes doing somethings a little more difficult or slow. But you can still use the django admin or fixtures to do a lot of things.
Limited services available: It currently only supports Postgresql and redis, so if you want to use MySQL, memcached, mongodb,etc you are out of luck.
low level c libs: You can't install any dependencies that you want, similar to google app engine, they have some of the common c libs installed already, and if you want something different that isn't already installed you will need to contact them to get it added. http://www.ep.io/docs/runtime/#python-libraries
email: You can't send or recieve email, which means you will need to depend on a 3rd party for that, which is probably good practice anyway, but it just means more money.
file system: You have a more limited file system available to you, and because of the distributed nature of the system you will need to be very careful when working from files. You can't (unless i missed it) connect to your account via (s)ftp to upload files, you will need to connect via the ep.io command line tool and either do an rsync or a push of a repo to get files up there.
Update: for more info see my blog post on my experiences with ep.io : http://kencochrane.net/blog/2011/04/my-experiences-with-epio/
Update: Epio closed down on May 31st 2012

Where to go to learn about web architecture? Youtube example? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 13 years ago.
Improve this question
I'm trying to build a web application that is similar to Youtube (it's not a knock off), but I guess I don't know how video is served on the internet very well.
I know how to build regular database driven web applications, but nothing like the scalability of Youtube. All of the applications I have built before have all been run on one server with the files stored on the same box as the web server.
How does one decouple the application server from the file storage from the media server?
I would more or less want 4 machines (clusters of machines)
1.) Application servers
-- Present the web page, handle user uploads, link the user's flash player to the correct media server etc.
2.) Database shards
-- Store user information, check favorites, etc.
3.) File storage
-- Store the media files
4.) Media servers
-- Serve the media files
How do I hook all of this together? Which technologies should I leverage? Where do I go to learn more about architecting this?
How does Youtube's embeddable flash stuff work? I want to embed my flash player on other websites and have it tie into my architecture.
Note I have looked into: http://highscalability.com/youtube-architecture
But I still don't get the overall picture of how this stuff ties together.
If someone can explain in high level terms how all of this stuff works?
Are there dedicated client servers running internally to shuffle around all of this stuff between the application servers, file storage, etc. Is it all via HTTP using JSON, what is going on here!
Thanks
Two books I'd recommend are:
Scalable Internet Architectures
Building Scalable Web Sites
The latter is by the director of engineering at flickr. Not youtube, but I think you'll find it enlightening.
Beyond that, the High Scalability blog is a good source of case studies and collected wisdom, all of which provide a good starting point for further exploration.
Start by hiring the right people; if you hire smart people, they'll be able to come up with answers to these questions, and more which will crop up.
Also, start at the scale that you plan to initially operate at. Don't plan for scalability you don't need. You aren't going to be making another Youtube - even if you're very successful within your field.
Scalability is expensive - very expensive - to develop and maintain. If you don't need it, it will drain your resources and restrict your developers needlessly. Just building a credible test environment for high performance systems tends to be a big job, and such a system would require several such environments.

Has anyone built web-apps that can run totally off-line? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Improve this question
I'm building an app that authors would (hopefully) use to help them, uh.. author things.
Think of it like a wiki but just for one person, but cooler. I wish to make it as accessible as possible to my (potential) adoring masses, and so I'm thinking about making it a web-app.
It certainly doesn't have to be, there is no integration with other sites, no social features. It involve typing information into forms however, so for rapid construction the web would probably be the best.
However, I don't really want to host it myself. I couldn't afford it for one, but it's mostly that people who use this may not want their data stored elsewhere. This is private information about what they are writing and I wouldn't expect them to trust me with it, and so I'm thinking about making it a thick-client app.
And therein lies the problem, how to make a application that focuses mainly on form data entry available easily to potential users (yay web apps) but also offline so they know they are in full control of their data (yay thick-client apps).
I see the following solutions:
Build it as a thick-client Java app and run a cutdown version on the net as an applet that people can play with before downloading the full thing.
Build it as a Flex app for online and an Air app for offline (same source different build scripts basically).
Build it as a standard web-app (HTML, JS etc) but have a downloadable version that somehow runs the site totally on their computer. It wouldn't touch the net at all.
Ignoring 1 and 2 (I'm looking into them separately), I think 3 would involve:
Packaging up an install that contains a tiny webserver that has my code on it, ready to run.
Remapping the DB from something like mySQL to something like SQLite.
Creating some kind of convience app that ran the server and opened your browser to the right location, possibly using something like Prism to hide the whole broswer thing.
So, have you ever done something like this before?
If so, what problems did you encounter?
Finally, is there another solution I haven't thought of?'
(also, Joyent Slingshot was a suggestion on another question, but it's RoR (which I have no experience in) and I'm 99% sure it doesn't run under linux, so It's not right for me.)
I think you should look at tiddlywiki for inspiration.
It's a wiki written in JavaScript entirely self-contained in a single html file. You load it into your browser as a file:/// URL, so there is no need for a server.
I use it as a personal wiki to keep notes on various subjects.
Google Gears is used to offer a few of the google apps offline (Google Reader, Gmail, Docs and more).
What is Google Gears?
Gears is an open source browser extension that lets developers create
web applications that can run offline.
Gears provides three key features:
A local server, to cache and serve application resources (HTML,
JavaScript, images, etc.) without
needing to contact a server
A database, to store and access data from within the browser
A worker thread pool, to make web applications more
responsive
by performing expensive operations in
the background
Gears is currently an early-access developers' release. It is not yet intended for use by real users in production applications at this time.
If you're a developer interested in using Gears with your application, visit the Gears Developer Page.
If you wish to install Gears on your computer, visit the Gears Home Page. Please note, however, that Gears is not yet intended for general use.
But as you read it's still in early stages.
There is an additional option, and that is to use the new HTML5 offline application features, namely the Application Cache, Client-Side Databases, and Local Storage APIs.
Currently I believe that Safari is the only shipping browser to support any of these, and i believe it only supports the client side databases and local storage parts. The webkit nightlies support all of these features, the firefox nightlies support many of them (maybe all now?)
[Edit (olliej): Correction, Firefox 3 supports the Application cache, but alas not the client side DB]
We are using something similar to your third option to test our websites locally. Works just fine.
Our packaged webserver is not small enough to accomplish what you need, but then again we've not been trying to keep it small either. If you can package your webserver code into a small enough package I don't see why this approach would'nt work.
I think AIR is the way to go..
Have you checked into google gears?
Some pointers for solution 3:
for the GUI part, ExtJS seems really nice.
for the storage part, there is a nice javascript library that abstracts different storage backends: PersistJS.
Supported backends for PersistJS:
flash: Flash 8 persistent storage.
gears: Google Gears-based persistent storage.
localstorage: HTML5 draft storage.
whatwg_db: HTML5 draft database storage.
globalstorage: HTML5 draft storage (old spec).
ie: Internet Explorer userdata behaviors.
cookie: Cookie-based persistent storage.
Also, I think the moin moin wiki software has a desktop version that includes its own webserver. This stuff is easy in python, since batteries are included.
You might want to check out how they do it?
You could make a dedicated client using Webkit or Firefox's backbone. Some games use that solution for UI for example.
Or you could make a little webserver (I have a little webserver in Lua that I use for similar purposes, just a few megas with libaries and all). However if you take this route the biggest issue to consider is you don't want your webserver to depend on environmental variables, you want it to be totally autonomous. You should try to isolate all variables t o a config file and be done with it (bundle style)
Or you could use a Java client application to display the webpage
Or GoogleGears, but that's the same (almost) as Flex+Air. so choose Flex+Air if that's what you are familiar with
You didn't specify a language but I looked at Karigell a few years ago. It's Python web framework, similar to Django or TurboGears, but it doesn't have the overhead of those frameworks.
From my messing around with it, it seems like it would work for your purposes. It has a built-in web server (though you can use pretty much any server you want) and you can use any database that Python supports.
Plus, Python works well with Linux. :)
If you made the app a regular web app heavily reliant on client-side technologies (using DHTML and the likes of Google Gears to store data offline as already suggested) so once opened, there wasn't much interaction with the server, you could probably host the thing on a basic shared hosting account which wouldn't cost that much. That might be your easiest starting point as you wouldn't have to worry about all the issues with desktop apps such as compatibility with different operating systems, packaging up an install etc, yet you wouldn't need massive server resources behind it either.
You can use HTML, JS and whatever else in Adobe AIR and you'll have plenty of options of saving data locally, too.
in java world you could use jetty for a server, implement web app using your favorite framework and use hsqldb as a database - it lives entirely in your container (jetty). you can deploy preview app on the web and package downloadable offline version.
There's a portable distribution of Apache/MySQL/PHP (to place on USB keys):
http://portableapps.com/apps/development/xampp
This should be easily adapted to your needs.
You could also consider using XULRunner or Prism
They're the opensource technology that FireFox, Thunderbird and Joost are built on, and allows you to develop apps in XML and javascript essentially against the same rich api that FireFox itself has. And of course this is cross platform too, so it'd work on Mac/Linux/Windows...
Check here for more info:
https://developer.mozilla.org/en/XULRunner
I was thinking of doing something like this myself. My plan was to write app using django and write script that starts django's testing server and opens default browser on specified port. My plan was to use SQLite...
Also, it would be nice to pack it into one package, so users without django installed can run app without any dependecies...
My suggestion, as you pointed above, is to use a Wiki system to solve your problem. Now the question could be: Wich one?
You can use Trac, it is very simple and you can customize its GUI. But, if you prefer something more advanced please use MoinMoin. I used it for years, and IMO it is a very good and strong wiki system.
Depiste wich wiki you will choose, forget to write your web-app from scratch. According to yor question the best approach is to pick something that works and customize/modify it to fit your needs.