The documentation states that one should not serve static files on the same machine as the Django project, because static content will kick the Django application out of memory. Does this problem also arise from having multiple Django projects on one server? Should I combine all my website projects into one very large Django project?
I'm currently serving Django along with PHP scripts from Apache with mod_wsgi. Does this also cause a loss of efficiency?
Or is the warning just meant for static content, because the problem arises when serving hundreds of files, while serving 20-30 different PHP/Django projects is OK?
I would say that this setup is completely OK. Of course it depends on the hardware, the load and the other projects, but here you can just try it and monitor the usage and performance.
The suggestion to use separate server(s) for static files makes sense, as it uses resources more efficiently. But as long as one server performs well enough, I don't see a reason to use a second one.
Another question, which has less to do with performance than with ease of use and configuration, is whether you really want to run everything on the same server.
For one setup with a bunch of smaller sites (as well as some legacy PHP) we use one machine with four virtual servers:
webhead running nginx (and varnish)
database
simple apache2/php server
django server using gunicorn + supervisord
nginx handles all the sites, either proxying to the application server or serving static content (via NAS). I like this setup, as it is very easy to install and handle, and it makes it simple to scale out one piece if needed.
If the documentation says "one should not serve static files on the same machine as the Django project, because static content will kick the Django application out of memory", then the documentation is very misleading and arguably plain wrong.
The one suggestion I would make if using PHP on the same system is to ensure you are using mod_wsgi daemon mode for running the Python web application, and even a separate daemon process for each Python web application.
Do not run the Python web application in embedded mode, because that means it runs in the same processes as mod_php, and because PHP, including its extensions, is not really multithread safe, you have to use the prefork MPM. Running Python web applications embedded in Apache with the prefork MPM is a bad idea unless you know very well how to set up Apache properly for it. If Apache is not set up right, you get issues like those described in:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
The short of it is that the Apache configuration for PHP and for Python needs to be quite different. You can get around that, though, by using mod_wsgi daemon mode for the Python web application.
In the Django docs for setting up mod_wsgi, the tutorial notes:
Django doesn’t serve files itself; it leaves that job to whichever Web
server you choose.
We recommend using a separate Web server – i.e., one that’s not also
running Django – for serving media. Here are some good choices:
Nginx
A stripped-down version of Apache
I understand this might be due to wasted resources when Apache spawns new processes to serve each static file, which Nginx avoids. However, Apache's (newish?) event MPM seems to act similarly to an Nginx instance handing off requests to an Apache worker MPM. Therefore I'd like to ask: instead of setting up Nginx as a reverse proxy for Apache, would using the Apache event MPM be sufficient for serving static files in Apache?
Apache doesn't spawn a new process for each static file. Apache keeps persistent processes to handle concurrent and subsequent requests, just like nginx. The difference is that nginx uses a fully async model, whereas Apache relies on processes and/or threads for concurrency, although the event MPM now uses an async model for initial request acceptance and keep-alive connections. For the majority of people, Apache alone is still a more than acceptable solution. So don't get ahead of yourself if you are just starting out and think you need a Google/Facebook scale solution from the outset.
More important than a separate web server is that, if using Apache/mod_wsgi, you serve the static files under a different host name. That way you avoid heavyweight cookie information being sent with every static file request. You can do this using virtual hosts in Apache. Also ensure you are using the daemon mode of mod_wsgi for running the Django application, as that is a better architecture and provides many more options for setting timeouts, so your application can recover from various situations which might otherwise cause the server to lock up when overloaded.
For a system which provides a better out of the box configuration and experience than using Apache/mod_wsgi directly and configuring it yourself, look at using mod_wsgi-express.
https://pypi.python.org/pypi/mod_wsgi
http://blog.dscpl.com.au/2015/04/introducing-modwsgi-express.html
http://blog.dscpl.com.au/2015/04/using-modwsgi-express-with-django.html
http://blog.dscpl.com.au/2015/04/integrating-modwsgi-express-as-django.html
The advice about separating the web servers has two advantages. One is clearly outlined by Graham. The other is "predictable resource consumption".
The number of resources per HTML page differs. Leaving one web server to serve the application and the other to serve static resources has the advantage that you know exactly how many concurrent visitors you can serve: the MaxClients setting of Apache.
If this slows down the loading of images, scaling is cheap: static web servers need very few modules and no measurable amount of CPU power, so a one-core machine with SSD disks is all you need.
As Graham indicates, it starts with a STATIC_URL that has a different hostname. Run it on the same server at the start. When scaling up, tie that hostname to a reverse proxy that serves from several image server backend machines.
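As a minimal sketch of that starting point: STATIC_URL is the Django setting named above, while the hostname static.example.com and the helper static_asset_url are made up for illustration. The helper only shows how asset paths end up under the separate host; in a real project Django's own static file handling does this for you.

```python
from urllib.parse import urljoin

# In settings.py. The hostname is a placeholder (assumption); at first it can
# point at the same machine as the application and be moved later.
STATIC_URL = "https://static.example.com/"


def static_asset_url(relative_path):
    """Build the full URL of a static asset under the separate host name.

    Hypothetical helper for illustration only.
    """
    return urljoin(STATIC_URL, relative_path.lstrip("/"))
```

Because the browser requests these URLs from a different host name, cookies set for the application's host are not sent along with them.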
Are there good alternatives to the Django development server (runserver) that are more performant, especially in concurrency and static file serving, and that have the auto-reload function, without having to set up a full-blown production environment?
I'm working on Windows, so gunicorn cannot be used.
You can install and use the rungevent command. It has an auto-reload function and is more performant than thread-based servers (it is greenlet-oriented). The only caveat is static file serving: you must install a web server or proxy like nginx for that.
Are you running such high-volume tests on your dev server that you suffer from this, especially regarding static files? If so, then you must emulate, as said, a production environment (just have a correctly configured nginx pointing to the address:port you use for your rungevent command).
If static files are not your problem, install a rungevent command and see how it works.
No. Since dev sites are made to handle limited requests, runserver runs fine on a machine that can match the requirements of your app.
If you are dealing with a large-scale dev project which your system cannot tolerate, then it's time to either reproduce a production environment or upgrade.
I find it difficult to believe that your application is that bad in terms of performance. Again, if you are trying to test the behavior of a full production site (in terms of DB entries etc.), then it's time to emulate the production environment.
If that is not the case, then I would start checking the underlying models / code of the project.
Well, if you don't want to use the Django dev server you will have to spend some time on setup anyway. But the good part is that you only have to do it once. Subsequent deployments will take very little time.
Not long ago I switched from FastCGI to uWSGI and it made my life much easier.
uWSGI is awesome! It has autoreload (which works both in daemon mode and when launched directly in a terminal). When launched in a terminal you can use a debugger (e.g. pdb) during a request, just like you do with the Django dev server. And of course you can debug with print in simple cases.
I'm using it with nginx, which serves the static files and passes requests to uWSGI, but of course it can be any server.
The most useful feature for me in this configuration is that you use the same thing both for dev and production. For simple projects after developing you just turn off autoreload and a few other options and it's ready.
First of all, let me be clear that I am a Windows user and very new to the web world. For the past months I have been learning both Python and Django, and it has been a great experience for me. Now I have created a small project that I would like to deploy to a production server. Since Django has its built-in development server, there was no problem for me so far. But now that I have to deploy it to a production server, I googled around and found Nginx + uWSGI or Nginx + Gunicorn to be the best options. And as uWSGI and Gunicorn are incompatible with Windows, I think I should adopt Ubuntu or another Unix system.
So my questions are:
Just to be clear, as I will have to work with one of the above, please explain to me why I need two servers.
If I have to adopt the Ubuntu environment, do I have to learn Ubuntu shell scripting, SSH and other stuff? Or will the hosting provider help me with that?
Please let me know what else I need for the above.
Thank you so much for your time, and please pardon me if my question was a lame one. Hoping for helpful answers.
A typical configuration involves two server processes (which can be run together on the same actual hardware or virtual server) so that the proxy server in front can buffer slow clients. For instance: a slow client will connect to nginx with a request. Nginx will pass the request on to Gunicorn and Gunicorn will respond. Nginx will then consume the Gunicorn response immediately, freeing up the Gunicorn resources right away. At that point, the slow client can take as much time as it wants to consume the response from Nginx without tying up much in the way of server resources. Alternatives to the two-server-process model are to use async workers with Gunicorn and put Gunicorn itself in front, or to use an async-sync combo like Waitress. Nginx in front has the added benefit of doubling as a ready-to-use statics server, though.
Note that "slow clients" can describe: mobile phones that lose their connection and leave the TCP socket hanging until timeout mid-request; mobile phones that are just slow; unreliable connections of all types; hostile denial-of-service clients who are deliberately trying to use server resources; sometimes any old connection that has a hiccup or malfunction for any reason. So this is a problem that will affect nearly any site.
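The buffering effect described above can be put into rough numbers. The following is a back-of-the-envelope sketch, not a measurement: the function names, the example figures, and the simplification that the proxy reads the response from Gunicorn in negligible time are all assumptions made for illustration.

```python
def worker_busy_seconds(generate_s, response_bytes, client_bytes_per_s, buffered):
    """Estimate how long one sync worker is tied up by a single request.

    generate_s: time the application needs to produce the response.
    buffered:   True if a proxy such as nginx sits in front and absorbs the
                response (assumed to take negligible time over the local link).
    """
    if buffered:
        return generate_s
    # Without a buffering proxy the worker also waits for the slow client.
    return generate_s + response_bytes / client_bytes_per_s


def requests_per_second(workers, busy_s):
    """Upper bound on throughput if every request ties up a worker for busy_s."""
    return workers / busy_s
```

With made-up figures of 0.05 s to generate a 100 kB response and a client reading at 20 kB/s, a worker is busy for about 5 seconds per request without buffering and for 0.05 seconds with it. For four workers that is the difference between less than one and about 80 requests per second.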
You won't need shell scripting per se but getting used to Ubuntu will take some time. There is a lot to learn even outside of scripting, like how to use the package manager, how to configure packages once they're installed in ways that won't confound future updates, etc. And you will definitely have to learn to use SSH; it is one of the most fundamental server administration tools in the *nix world.
An alternative to learning to use Ubuntu or another server platform is to use a Platform-as-a-Service option like Heroku, as PaaS hosting providers really will take care of all of that stuff for you. I recommend this approach. That having been said, even though I think PaaS is a good option for people who want to focus on development and not server admin regardless of their level of skill, it's also true that a little bit of experience with Linux server platforms goes a long way in helping you to understand the environment that your code runs in. So even if you go with PaaS, you would still benefit from tinkering with Ubuntu a little (or a lot).
Another benefit from a PaaS is that normally their infrastructure handles the Nginx part of the deal (buffering of slow requests via proxy). This is the case with Heroku, for instance. So you won't have to worry about that part of the infrastructure at all.
This part of the question is too broad to answer, but let me know in the comments if you need clarification.
I'm doing it almost exactly as in this tutorial: http://michal.karzynski.pl/blog/2013/06/09/django-nginx-gunicorn-virtualenv-supervisor/
Nginx is my proxy to the Django app running on Gunicorn and also serves the static files, virtualenv holds my Python environment, and supervisor keeps my app running.
It's possible you will run into some errors if you're not using PostgreSQL; ask and I will help (I used MySQL in the past, now it's PostgreSQL).
Firstly, there's no need to use Ubuntu if you're happier with Windows. I don't know if nginx works on Windows, but I'd be very surprised if it doesn't (in fact, here are the nginx docs for installing on Windows). Apache, meanwhile, definitely does work on Windows. The Django documentation has a full explanation of how to set up Apache/mod_wsgi to serve Django.
You don't need two servers. I'm not sure why you think you do: the usual reason for that is to have the static assets on a separate server, but you don't mention that as a reason. Since you're only talking about a small site, though, you don't even need to do that. One server configured to serve both Django and the static assets will do fine. Again, the docs explain exactly how to do that.
I have been storing some information in global vars in my Django views. This information can be accessed by every thread in the Python Django process. However, I am wondering how Django behaves in production. Does a production Django process fork() multiple times to handle requests? If so, this data would not be the same across processes. Does anyone know if Django forks?
I'm sure that it depends on your deployment, but if you are running it under FastCGI or WSGI, then yes, it generally pre-forks a number of server processes to handle incoming requests.
I don't know about running under mod_python, but I think that is being discouraged these days in favour of WSGI.
I'm not an expert in this field so I'm answering based only on the grep-ing I've just done.
The fastcgi server seems to be able to fork, depending on configuration settings:
http://code.djangoproject.com/browser/django/tags/releases/1.2.3/django/core/servers/fastcgi.py#L171
http://code.djangoproject.com/browser/django/tags/releases/1.2.3/django/utils/daemonize.py
As for WSGI, I believe the Django side goes straight to request processing:
http://code.djangoproject.com/browser/django/tags/releases/1.2.3/django/core/handlers/wsgi.py#L217
and forking is configured in mod_wsgi: http://code.google.com/p/modwsgi/ - embedded mode vs daemon mode - and/or in Apache (worker vs prefork builds).
For mod_wsgi, read:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
It explains the various models and gives guidelines on the use of common data across threads and processes. The situation isn't much different for other hosting systems.
I'm developing a Django site. I'm making all my changes on the live server, just because it's easier that way. The problem is, every now and then it seems to like to cache one of the *.py files I'm working on. Sometimes if I hit refresh a lot, it will switch back and forth between an older version of the page, and a newer version.
My set up is more or less like what's described in the Django tutorials: http://docs.djangoproject.com/en/dev/howto/deployment/modwsgi/#howto-deployment-modwsgi
I'm guessing it's doing this because it's firing up multiple instances of the WSGI handler, and depending on which handler the HTTP request gets sent to, I may receive different versions of the page. Restarting Apache seems to fix the problem, but it's annoying.
I really don't know much about WSGI or "MiddleWare" or any of that request handling stuff. I come from a PHP background, where it all just works :)
Anyway, what's a nice way of resolving this issue? Will running the WSGI handler in "daemon mode" alleviate the problem? If so, how do I get it to run in daemon mode?
Running the process in daemon mode will not help. Here's what's happening:
mod_wsgi is spawning multiple identical processes to handle incoming requests for your Django site. Each of these processes has its own Python interpreter, and can handle an incoming web request. These processes are persistent (they are not brought up and torn down for each request), so a single process may handle thousands of requests one after the other. mod_wsgi is able to handle multiple web requests simultaneously since there are multiple processes.
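One way to observe this is a tiny diagnostic WSGI application that reports which process answered. This is only a sketch: the counter and the call_once helper are made up for illustration, and call_once simply invokes the application directly with a test environment from the standard library instead of going through Apache.

```python
import itertools
import os
from wsgiref.util import setup_testing_defaults

# Per-process counter; every server process gets its own copy.
_request_counter = itertools.count(1)


def application(environ, start_response):
    """Report the process id and how many requests this process has handled."""
    seen = next(_request_counter)
    body = "pid={} requests_seen_by_this_process={}\n".format(os.getpid(), seen)
    payload = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def call_once():
    """Call the application directly, without a web server (illustration only)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    text = b"".join(application(environ, start_response)).decode("utf-8")
    return captured["status"], text
```

Served by several processes, repeated refreshes would show different pid values, each with its own independent request count.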
Each process's Python interpreter will load your modules (your custom Python files) whenever an "import module" is executed. In the context of Django, this will happen when a new views.py is needed due to a web request. Once the module is loaded, it resides in memory, and so any changes you make to the file will not be reflected in that process. As more web requests come in, the process's Python interpreter will simply use the version of the module that is already loaded in memory. You are seeing inconsistencies between refreshes since each web request you make can be handled by a different process. Some processes may have loaded your Python modules during earlier revisions of your code, while others may have loaded them later (since those processes had not yet received a web request).
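The stale-module behaviour can be reproduced in a single process with nothing but the standard library. In this sketch the module name demo_views_sketch and its VERSION attribute are invented; the point is only that a second import returns the copy already in memory, and that the edit becomes visible only after an explicit reload (or, in the server's case, a fresh process).

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_stale_module():
    """Return VERSION as seen at first import, after an edit, and after reload."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep a cached .pyc from masking the edit
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "demo_views_sketch.py"  # hypothetical module
        source.write_text("VERSION = 'old'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_views_sketch")
            first = module.VERSION

            # Edit the file on disk, as when changing a view on the server.
            source.write_text("VERSION = 'newer'\n")

            # Importing again returns the module object already in memory.
            after_edit = importlib.import_module("demo_views_sketch").VERSION

            # Only an explicit reload re-reads the source file.
            importlib.reload(module)
            after_reload = module.VERSION
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_views_sketch", None)
            sys.dont_write_bytecode = old_flag
    return first, after_edit, after_reload
```

Calling demo_stale_module() returns ('old', 'old', 'newer').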
The simple solution: Anytime you modify your code, restart the Apache process. Most times that is as simple as running as root from the shell "/etc/init.d/apache2 restart". I believe a simple reload works as well, which is faster, "/etc/init.d/apache2 reload"
The daemon solution: If you are using mod_wsgi in daemon mode, then all you need to do is touch (unix command) or modify your wsgi script file. To clarify scrompt.com's post, modifications to your Python source code will not result in mod_wsgi reloading your code. Reloading only occurs when the wsgi script file has been modified.
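If you would rather trigger that from Python than from the shell, for example in a deploy script, updating the file's modification time is all that touch does. A small sketch, with touch_wsgi_script as a made-up helper name and the script path left to the caller:

```python
import os
from pathlib import Path


def touch_wsgi_script(script_path):
    """Set the WSGI script file's access and modification times to now.

    Equivalent in effect to the Unix touch command on an existing file.
    Returns the new modification time.
    """
    path = Path(script_path)
    if not path.is_file():
        raise FileNotFoundError("WSGI script not found: {}".format(path))
    os.utime(path, None)  # None means "use the current time"
    return path.stat().st_mtime
```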
Last point to note: I only spoke about mod_wsgi as using processes for simplicity. mod_wsgi actually uses thread pools inside each process. I did not feel this detail to be relevant to this answer, but you can find out more by reading about mod_wsgi.
Because you're using mod_wsgi in embedded mode, your changes aren't being automatically seen. You're seeing them every once in a while because Apache starts up new handler instances sometimes, which catch the updates.
You can resolve this by using daemon mode, as described here. Specifically, you'll want to add the following directives to your Apache configuration:
WSGIDaemonProcess example.com processes=2 threads=15 display-name=%{GROUP}
WSGIProcessGroup example.com
Read the mod_wsgi documentation rather than relying on the minimal information for mod_wsgi hosting contained on the Django site. In particular, read:
http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode
This tells you exactly how source code reloading works in mod_wsgi, including a monitor you can use to implement the same sort of source code reloading that Django's runserver does. Also see the following posts, which talk about how to apply that to Django:
http://blog.dscpl.com.au/2008/12/using-modwsgi-when-developing-django.html
http://blog.dscpl.com.au/2009/02/source-code-reloading-with-modwsgi-on.html
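The core idea behind such a monitor is to remember the modification time of every loaded source file and to notice when one changes. The following is a simplified sketch of that idea only, not the monitor from the mod_wsgi documentation: the function names are invented, and the real monitor also runs in a background thread and restarts the process when it detects a change.

```python
import os
import sys


def loaded_module_files():
    """List the source files of the modules currently loaded in this process."""
    files = []
    for module in list(sys.modules.values()):
        path = getattr(module, "__file__", None)
        if path:
            files.append(path)
    return files


def snapshot_mtimes(paths):
    """Record the current modification time of each existing file."""
    return {path: os.stat(path).st_mtime for path in paths if os.path.isfile(path)}


def changed_files(snapshot):
    """Return tracked files whose mtime differs from the snapshot, or that vanished."""
    changed = []
    for path, old_mtime in snapshot.items():
        try:
            current = os.stat(path).st_mtime
        except OSError:
            changed.append(path)  # file was removed or became unreadable
            continue
        if current != old_mtime:
            changed.append(path)
    return changed
```

A monitor built on this would take snapshot_mtimes(loaded_module_files()) at startup, call changed_files() periodically, and trigger a restart as soon as the result is non-empty.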
You can resolve this problem by not editing your code on the live server. Seriously, there's no excuse for it. Develop locally using version control, and if you must, run your server from a live checkout, with a post-commit hook that checks out your latest version and restarts Apache.