What is the best deployment configuration for Django?

I will be deploying my Django project on a server. For that purpose I plan on doing the following optimizations.
What I would like to know is: am I missing something?
How can I do it in a better manner?
Front-end:
django-static (for compressing static media)
Running jQuery from a CDN
Cache-Control headers
Indexing the Django DB (for certain models)
Server side:
uWSGI and nginx
Memcached (for certain queries)
Putting the media and database on separate servers

These are some optimizations I use on a regular basis:
Frontend:
Use a JS loading library like labjs, requirejs or yepnope. You should still compress/merge your JS files, but in most use cases it seems better to make several requests to several JS files and run them in parallel than to have one huge JS file to run on each page. I always split them up into groups that make sense, to balance requests and parallel loading. Some also allow for conditional loading and failovers (i.e. if, for some reason, your CDN'd jQuery is not there anymore).
Use sprites where possible.
Backend:
configure django-compressor (django-static is fine)
Enable gzip compression in nginx.
If you are using postgresql (which is the recommended sql database), use something like pgbouncer or pgpool2.
Use and configure a cache (I use Redis).
(Already mentioned: use Celery for everything that might take longer.)
Small database work: use indexes where they're needed, and look out for making too many queries (common when not using select_related where you are supposed to) or slow queries (enable slow-query logging in your DB). Always use select_related with arguments; see the sketch after this list.
If implementing search, I always use a standalone search engine. (elasticsearch/solr)
Now comes profiling the app and looking for code-specific improvements; the points above are the main things to keep an eye on.
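For the select_related and indexing points above, a minimal sketch (the models and field names are made up for illustration):
# Hypothetical models used only to illustrate the two points above.
from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)

class Article(models.Model):
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    title = models.CharField(max_length=200)
    published = models.DateTimeField(db_index=True)  # index a column you filter on a lot

# Without select_related, touching article.author in a loop issues one extra
# query per row; with it, the author is fetched in the same SQL join.
articles = Article.objects.select_related("author").filter(published__year=2023)
for article in articles:
    print(article.title, article.author.name)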

An option may be installing Celery if you need to support asynchronous and periodic tasks. If you do so, consider installing Redis instead of Memcached: with Redis you can manage sessions and run Celery, as well as do caching.
Take a look here: http://unfoldthat.com/2011/09/14/try-redis-instead.html
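If you go that route, a minimal sketch of wiring Celery to Redis might look like this (the project name proj and the task are placeholders):
# proj/celery.py -- hypothetical project layout
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

app = Celery("proj", broker="redis://localhost:6379/0")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

@app.task
def send_report_email(user_id):
    # placeholder for anything too slow to run inside the request/response cycle
    ...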

Related

Why do we need to set up AWS and a Postgres DB when we deploy our app using Heroku?

I'm building a web API by following the YouTube video below, and up until the AWS S3 bucket setup I understood everything fine. But he first deploys everything locally, then after making sure everything works he transfers all the static files to AWS, and for the DB he switches from SQLite3 to PostgreSQL.
django portfolio
I still don't understand this part: why do we need to put our static files on AWS and create a PostgreSQL database when there is already the default SQLite3 database from Django? I'm thinking that if I'm the only admin, just connecting my GitHub repo to Heroku should be enough, and any time I change something in the API I just need to push those changes to the GitHub master branch, and that should be it.
Why do we need to use AWS to set up the static file location, set up an RDS (relational database service), and do all these things from the beginning? Still not getting it!
Can anybody help explain this?
Thanks
Databases
There are several reasons a video guide would encourage you to switch from SQLite to a database server such as MySQL or PostgreSQL:
SQLite is great, but it doesn't scale well if you're expecting a lot of traffic
SQLite doesn't work if you want to distribute your app across multiple servers. Going back to Heroku, if you serve your app with multiple dynos, you'll have a problem, because each dyno will use a distinct SQLite database. If you edit something through the admin, it will happen on one of these databases, at random, leading to inconsistencies
Some Django features aren't available on SQLite
SQLite is the default database in Django because it works out of the box, and is extremely fast and easy to use in local/development environments for prototyping.
However, it is usually not suited for production websites. Additionally, while it can be tempting to store your sqlite.db file along with your code, for instance in a git repository, it is considered a bad practice because your database can contain sensitive data (such as passwords, usernames, emails, etc.). Hence, a strict separation between your code and data is a good practice.
Another way to put it is that your code and your data have different lifecycles. You want to be able to edit data in your database without redeploying your code, and update your code without touching your database.
Even if you can remove public access to some files through GitHub, this is not a good practice, because when you work in a team with multiple developers, developers may have access to the code but not the production data, since it's usually sensitive. If you work with 5 people and each one of them has a copy of your database, it means the risk of losing it or having it stolen is 5x higher ;)
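For reference, moving from the default SQLite configuration to PostgreSQL is mostly a settings change; a rough sketch, assuming a psycopg driver is installed and with placeholder credentials:
# settings.py -- the development default that ships with a new project
from pathlib import Path
import os

BASE_DIR = Path(__file__).resolve().parent.parent

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    }
}

# settings.py -- production, pointing at a PostgreSQL server (Heroku Postgres, AWS RDS, ...)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",
        "USER": "myapp",
        "PASSWORD": os.environ["DATABASE_PASSWORD"],  # keep secrets out of the repo
        "HOST": "db.example.com",
        "PORT": "5432",
    }
}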
Static files
When you work locally, Django's built-in runserver command handles the serving of static assets such as CSS, Javascript and images for you.
However, this server is not designed for production use either. It works great in development, but will start to fail very fast on a production website, which has to handle far more requests than your local version.
Because of that, you need to host these static files somewhere else, and AWS is one place where you can do that. AWS will serve those files for you, in a very efficient way. There are other options available, for instance configuring a reverse proxy with Nginx to serve the files for you, if you're using a dedicated server.
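As a rough illustration, offloading static files to S3 is commonly done with the django-storages package; a minimal sketch with placeholder bucket details (exact setting names can vary with the django-storages version):
# settings.py -- hypothetical S3 static file setup via django-storages
INSTALLED_APPS += ["storages"]

AWS_STORAGE_BUCKET_NAME = "my-portfolio-static"   # placeholder bucket
AWS_S3_REGION_NAME = "eu-west-1"                  # placeholder region
AWS_S3_CUSTOM_DOMAIN = f"{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com"

STATIC_URL = f"https://{AWS_S3_CUSTOM_DOMAIN}/static/"
STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# After this, `python manage.py collectstatic` uploads the files to the
# bucket instead of copying them to a local STATIC_ROOT directory.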
As far as I can tell, the progression you describe from the video takes you from a local development environment to a more efficient and scalable production setup. That is to be expected, because it's less daunting to start with something really simple (SQLite, Django's built-in runserver) and move on to more complex and abstract topics and tools later on.

Why is serving static files in production using Django discouraged?

I have developed a web application that uses (obviously) some static files. In order to deploy it, I've chosen to serve the files with the WSGI interpreter, using gunicorn behind a firewall and a reverse proxy.
My application uses WhiteNoise to serve static files. Everything works fine and I don't have any issues regarding performance... but I really can't understand WHY the practice of serving those static files directly through the WSGI interpreter is discouraged (LINK), which says:
This is not suitable for production use! For some common deployment strategies...
I mean, my service is a collection of microservices: DB, frontend, services, etc. If I need to scale them, I can do so without any problem, and in addition, using this philosophy, I'm not worried about the footprint of my microservices. To me this seems logical, but maybe, for the rest of the world, this is a completely out-of-mind strategy.
You've misinterpreted that documentation. It's fine to use Whitenoise to serve static files; that is entirely what it's for. What is not a good idea is to use that internal Django function to do so, since it is inefficient.
Three reasons why I personally serve static files from a CDN:
1- You are using up bandwidth on your app server and losing time serving these static files, instead of throwing that load onto a CDN to handle it all. (WhiteNoise should mitigate that, though.)
2- Some hosting services like AWS will charge you for extra traffic in/out, while you can use cheaper services like CloudFront and an S3 bucket.
3- I like to keep my app servers for app purposes only and utilize each service for its own job only; this helps me with debugging and reduces my failure points.
On the other hand, serving static files from the app server with something like WhiteNoise is much, much easier than configuring your CDN.
Hope this helps!
It's quite OK when you use WhiteNoise because:
WhiteNoise is made exactly for this purpose and is therefore efficient
It'll set the HTTP response headers correctly so clients cache the files.
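For context, the usual WhiteNoise setup is just a middleware plus a storage backend; a minimal sketch following the WhiteNoise docs (exact setting names can differ between versions):
# settings.py -- serving static files from the app server with WhiteNoise
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",  # placed right after SecurityMiddleware
    # ... the rest of the middleware stack ...
]

# BASE_DIR as defined at the top of settings.py
STATIC_ROOT = BASE_DIR / "staticfiles"
# Hashed file names plus compression, so responses can be cached aggressively.
STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"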
But think of it this way: instead of serving 1 or 2 requests per web page, you'll often get 10x more requests (usually web pages will request a bunch of images, one or more CSS files, a couple of JS files...). That means you have to scale your application server to serve, on average, 10x more traffic than if you leave that job to a CDN.
By the way, I've written a tutorial on this topic which may help.

Recommended way to set up a Django FastCGI configuration with multiple domains

I'm creating a Django project that will be used by multiple domains, and the functionality will be slightly different depending on the domain. I'm looking for advice on the proper way to set this up.
The sites framework seems like it would be a good fit for doing some of the customizations once processing has reached the point where it's executing the Django code. But I'm trying to determine what the setup should be before we reach that point (relating to the nginx, flup, FastCGI config).
Here is my current understanding:
It seems like multiple Django settings files are appropriate, each with a different SITE_ID. Then two virtual hosts would be set up in the nginx configuration, pointing to two different sockets. Two 'manage.py runfcgi' processes would then listen on those two sockets, and each process would reference a different settings module:
./manage.py --settings=settings.site1 runfcgi method=prefork socket=/home/user/mysite1.sock pidfile=django1.pid
./manage.py --settings=settings.site2 runfcgi method=prefork socket=/home/user/mysite2.sock pidfile=django2.pid
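For illustration, each of those settings modules would mostly just override SITE_ID; a hypothetical layout:
# settings/site1.py -- hypothetical per-domain settings module
from .base import *  # shared settings live in settings/base.py

SITE_ID = 1
ALLOWED_HOSTS = ["site1.example.com"]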
However, it seems like this could get messy if you add more domains, since it would require a new 'manage.py runfcgi' process for every domain that is added. Is there a way to support multiple sites like this without running a separate process for each?
What are your experiences with hosting multiple domains with Django?
Any advice is much appreciated. Thank you for reading.
Joe
If you are going to have a lot of domains running, one process per domain might get quite expensive. The sites framework was originally made with another use case in mind: being able to easily create "duplicate" content on several news sites. When trying to use the sites framework for other uses you run into several difficulties.
One possibility is to move the domain processing into a middleware and have Django handle the multi-domain part. It's not trivial though, especially if you have to tweak apps to be domain-aware, and also URLconfs, templates, etc. A quick Google search turned up:
http://djangosnippets.org/snippets/1119/
Might help as a starting point.
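A rough sketch of that middleware approach, with a purely illustrative host-to-site lookup:
# middleware.py -- hypothetical host-based site selection
from django.contrib.sites.models import Site

class CurrentSiteByHostMiddleware:
    """Attach the Site matching the request's host so views and templates can branch on it."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        host = request.get_host().split(":")[0]
        try:
            request.site = Site.objects.get(domain=host)
        except Site.DoesNotExist:
            request.site = None  # or fall back to a default Site
        return self.get_response(request)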

Which Django cache backend to use?

I have been using cmemcache + memcached for a while with positive results.
However, cmemcache lately hasn't been tagging along well, and I also found that it's no longer recommended. I have now installed python-memcached and it's working well. As I have decided to change, I would like to try some other cache backend; any recommendations?
I have also come across pylibmc and python-libmemcached; any others?
Has anyone tried the nginx memcached module?
Thanks
Only cmemcache and python-memcached are supported by the Django memcached backend. I don't have experience with either of the two libraries you mentioned, but if you want to use them you will need to write a new cache backend.
The nginx memcached module is something completely different. It doesn't really seem like it would work well with Django's caching, but it might with enough hacking. At any rate, I wouldn't use it, since it is much better if you control what gets cached and retrieved from cache from the Python side.
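For reference, a memcached-backed cache is configured and used roughly like this (the backend path depends on your Django version and client library):
# settings.py
CACHES = {
    "default": {
        # Backend for python-memcached on older Django versions; newer Django
        # ships PyMemcacheCache / PyLibMCCache backends instead.
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# Anywhere in your code -- this is the "control it from the Python side" part:
from django.core.cache import cache

cache.set("expensive_result", 42, timeout=300)
value = cache.get("expensive_result")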

Django cluster deployment

I have five nodes behind a load balancer and I'm trying to determine the optimal configuration for a Django based site.
Each node has access to Postgres, mod_wsgi, Apache, Lighttpd, memcached, pgpool2 (for database replication) and GlusterFS (for media file replication), and is running Ubuntu 8.04 LTS.
So far, the setup is four nodes running Apache/Lighttpd/memcached/pgpool2 all reading/writing to one master node that is running the "master" Postgresql. Each of the four web nodes is also running Postgres for replication from the master via pgpool.
So, my question is: How would you configure this setup and/or what would you change so that there is no single point of failure, if possible?
This sounds like a good setup, although it's hard to know exactly what your setup looks like in terms of memory, etc., and what traffic you expect to handle.
You might want to consider using Django's multi-db support and having a read-only Postgres instance (use DB routing to direct reads to the read-only instance for certain apps); a sketch follows below. This can offer some quite nice speed improvements, and at the moment you could have a potential bottleneck at the single Postgres instance, depending on how heavy your database work is.
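A minimal sketch of such a router (the alias names and module path are placeholders; Django's multi-db docs cover the full set of hooks):
# routers.py -- route reads to a replica alias, writes to the primary
class ReadReplicaRouter:
    def db_for_read(self, model, **hints):
        return "replica"   # a read-only alias defined in DATABASES

    def db_for_write(self, model, **hints):
        return "default"   # the primary

    def allow_relation(self, obj1, obj2, **hints):
        return True        # both aliases point at the same underlying data

# settings.py
DATABASE_ROUTERS = ["myproject.routers.ReadReplicaRouter"]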
As #ashwoods suggested, it might be worth looking into gunicorn and nginx. I guess at the moment you use Apache only to run mod_wsgi, and Lighttpd for the static files? With nginx, you can use it with a number of WSGI servers, and it's great for static files too.
The setup looks pretty good to me. I would consider using gunicorn/uWSGI + nginx. I would also benchmark using pgbouncer, although pgpool2 offers more out of the box.