Are Django settings shared across uwsgi workers?

I have a Django app with a setting (in my settings.py file) that's populated dynamically in my App Config's ready() function. I.e. in settings.py I have:

POPULATE_THIS = None

and then in apps.py, in ready(), I have:

from django.conf import settings

def ready(self):
    if settings.POPULATE_THIS is None:
        settings.POPULATE_THIS = ... some code which instantiates an object I need that's effectively a singleton ...
This seems to work OK. But rather than just running the dev server locally (i.e. python manage.py runserver), I'm now running the Django app through uWSGI (proxied behind nginx), with uWSGI configured to run 10 worker processes (i.e. my uWSGI ini file has processes = 10 and threads = 1).
I'm seeing evidence that even though there are 10 uWSGI processes, ready() is still called exactly once on app startup, and the value of POPULATE_THIS is the same across all workers (calling str() on it gives the same memory address in every worker).
My question: How is that value shared across the uWSGI processes, given that I thought separate processes are distinct and do not share any memory? And am I correct in assuming that ready() is called once per app startup (i.e. when uWSGI itself spins up), and not once per uWSGI worker process startup?
This answer (Multiple server processes using nginx and uWSGI) on a different question seems to indicate that some data is shared across workers, but I can't seem to find any official docs that indicate what exactly is shared and how, specifically with respect to Django settings, so some explanation/details would be much appreciated.
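For reference, a minimal way to observe this behaviour (a sketch; the app name and logging are illustrative) is to log the process ID from ready() and count how many lines appear when uWSGI starts:

import logging
import os

from django.apps import AppConfig
from django.conf import settings

logger = logging.getLogger(__name__)

class MyAppConfig(AppConfig):
    name = "myapp"  # illustrative app name

    def ready(self):
        # One log line per process that runs ready(): with the default
        # uWSGI fork model you should see a single line (the master).
        logger.warning("ready() in pid %s, POPULATE_THIS at %s",
                       os.getpid(), hex(id(settings.POPULATE_THIS)))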

Exactly.
By default, uWSGI loads the Django application once in its master process and only then fork()s the worker processes from it. ready() runs during that initial load, so it is called exactly once, before the fork. Each worker then inherits a copy-on-write copy of the parent's memory; that is why str() reports the same virtual address in every worker. The object is not genuinely shared: each process has its own copy at the same address, and a mutation in one worker will not be visible in the others. (If you enable uWSGI's lazy-apps option, the app is instead loaded inside each worker, and ready() runs once per worker.)
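One practical consequence: because workers only receive copies made at fork time, anything inside such a singleton that must not survive a fork (open sockets, database connections, locks) should be rebuilt per worker. uWSGI ships a hook for exactly this; a sketch (uwsgidecorators is only importable when running under uWSGI, and build_singleton() is a hypothetical factory standing in for the "..." above):

from django.conf import settings
from uwsgidecorators import postfork  # available only when running under uWSGI

@postfork
def rebuild_singleton():
    # Runs once in every worker immediately after the fork, so each
    # worker gets its own fresh object instead of a forked copy.
    settings.POPULATE_THIS = build_singleton()  # hypothetical factory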

Related

What is the lifetime of a Django WSGI application object when served using Gunicorn or uWSGI?

I'm using Django with Gunicorn. Since I don't know much about the Gunicorn internals, I'm wondering what the lifetime of a WSGI application object is. Does it live forever, is it created for every request, or does it live as long as the worker lives?
Similarly, uWSGI seems to invoke the application callable once per request, as per this. So does this mean that the application object lives for a while, and uWSGI invokes the same object for every request?
This might be stupid, but I'm trying to figure this out while trying to cache some data (say, in global or module-level variables) at the application level, to avoid cache (Redis/Memcached) or db calls. If the application object lives for at least some time, it may be worthwhile to cache data in-process and refresh it at regular intervals, rather than making a cache request (which is a network request, after all) every time.
Please help me understand this.
You seem to be missing an important distinction: WSGI is the name of the protocol, as specified by PEP-333(3), and gunicorn/uwsgi are implementations of that protocol.
what is the lifetime of a WSGI application object? Is it forever, created for every request, or does it live as long as the worker lives?
Django has a wsgi.py file that exposes the WSGI application object, named application, and all WSGI servers use that (you need to pass the location of this callable). The basic requirement is that the application object takes two arguments:
the WSGI environment
a callable to start the response
The application wraps everything for the layers below, generating the request and other necessary metadata from the environment, and when it's time to send the response it calls the provided callable with the status code and headers. The response body is then returned as an iterable.
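For illustration, a minimal application object of the shape PEP-3333 describes (a bare sketch, not Django's actual handler):

def application(environ, start_response):
    # environ: a dict of CGI-style variables describing the request
    status = "200 OK"
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    start_response(status, headers)  # the callable the server provided
    return [b"Hello, WSGI\n"]        # the response body, as an iterable of bytes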
So as you can see, a WSGI server can get the application object while starting up and call it each time a request comes in, throughout the lifetime of the server process. I've mostly used uwsgi, so I can tell you that it does exactly that.
To give you more context, uwsgi has the concept of a master process starting worker processes and spawning worker threads inside each of those processes. gunicorn (and others) presumably have a similar concept, if not exactly the same.
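This also bears on the caching idea in the question: module-level state survives from request to request for the whole lifetime of the worker process, so an in-process cache with periodic refresh is feasible, keeping in mind that every worker holds its own independent copy. A sketch (names and TTL are illustrative):

import time

_cache = {}   # module level: lives as long as this worker process
_TTL = 30.0   # seconds between refreshes (illustrative)

def cached(key, loader):
    """Return a cached value, calling loader() again once the TTL expires."""
    entry = _cache.get(key)
    now = time.monotonic()
    if entry is None or now - entry[0] > _TTL:
        entry = (now, loader())
        _cache[key] = entry
    return entry[1]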

Does Django run in single thread by default?

By reading the code, it seems that Django runs in a single thread by default.
However, when I use sleep(15) in my view function and open two browser tabs to request that view, they return their responses almost at the same time!
So I do not understand why this happens.
My Django version is 1.9.
Django itself does not determine whether it runs in one or more threads. This is the job of the server running Django.
The development server used to be single-threaded, but in recent versions it has been made multithreaded. Other servers such as Apache/mod_wsgi, gunicorn, or uwsgi have their own defaults and can be configured in a number of ways; often they use multiple processes rather than threads.
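A quick way to see this for yourself (a sketch; the URL wiring is omitted): request the view below from two tabs at once. On the multithreaded dev server both finish together; started with python manage.py runserver --nothreads, the second request waits for the first.

import time
from django.http import HttpResponse

def slow(request):
    # Holds the worker (thread or process) for 15 seconds.
    time.sleep(15)
    return HttpResponse("done")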

uWSGI + nginx for django app avoids pylibmc multi-thread concurrency issue?

Introduction
I encountered a very interesting issue this week; let me start with some facts:
pylibmc is not thread safe: when used as the Django memcached backend, multiple Django instances started directly in a shell crash when hit with concurrent requests.
if deployed with nginx + uWSGI, this pylibmc problem magically disappears.
switching the Django cache backend to python-memcached also solves the problem, but this question isn't about that.
Elaboration
Starting with the first fact, this is how I reproduced the pylibmc issue:
The failure of pylibmc
I have a Django app which does a lot of memcached reading and writing, and my deployment strategy was to start multiple Django processes in a shell, bound to different ports (8001, 8002), and use nginx to load-balance between them.
I initiated two separate load tests against these two Django instances, using Locust, and this is what happened:
Both instances crashed and reported exactly the same issue, something like this:
Assertion "ptr->query_id == query_id +1" failed for function "memcached_get_by_key" likely for "Programmer error, the query_id was not incremented.", at libmemcached/get.cc:107
uWSGI to the rescue
So from the above we learned that multi-threaded concurrent requests to memcached via pylibmc can cause trouble, yet this somehow doesn't bother uWSGI with multiple worker processes.
To prove that, I started uWSGI with the following settings included:
master = true
processes = 2
This tells uWSGI to start two worker processes. I then told nginx to serve any Django static files and route non-static requests to uWSGI, to see what happens. With the server started, I launched the same Locust test against Django on localhost, making sure there were enough requests per second to cause concurrent requests against memcached. Here's the result:
In the uWSGI console there was no sign of dead worker processes, and no worker was re-spawned, yet the load-test stats showed there were certainly concurrent requests (5.6 req/s).
The question
I'm extremely curious about how uWSGI makes this problem go away, and I couldn't learn it from their documentation. To recap, the question is:
How did uWSGI manage worker process, so that multi-thread memcached requests didn't cause django to crash?
In fact I'm not even sure it's the way uWSGI manages worker processes that avoids this issue, or whether some other magic that comes with uWSGI is doing the trick. I've seen something called a memcached router in their documentation that I didn't quite understand; does that relate?
Isn't it because you actually have two separate processes managed by uWSGI? As you set the processes option, you should have multiple uWSGI processes (I'm assuming a master plus two workers, given the config you used). Each of those processes has its own loaded pylibmc, so no state is shared between threads (you haven't configured threads in uWSGI, after all).
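For completeness, if you ever do enable threads in uWSGI, a common way to use a non-thread-safe client like pylibmc is to keep one client per thread; a sketch (server address illustrative):

import threading
import pylibmc

_local = threading.local()

def get_client():
    # Each thread lazily creates and reuses its own pylibmc client,
    # so no client object is ever shared between threads.
    if not hasattr(_local, "client"):
        _local.client = pylibmc.Client(["127.0.0.1:11211"])
    return _local.client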

how to run Apache with mod_wsgi and django in one process only?

I'm running Apache with Django and mod_wsgi enabled, and it spawns 2 different processes.
I read that the second process is an on-change listener for reloading code on change, but for some reason the ready() function of my AppConfig class is being executed twice. This function should only run once.
I understand that running Django's runserver with the --noreload flag resolves the problem in development mode, but I cannot find a solution for production mode on my Apache web server.
I have two questions:
How can I run with only one process in production, or at least make only one process run the ready() function?
Is there a way to make the ready() function run non-lazily? By this I mean: execute on server startup, not on the first request.
For further explanation, I am experiencing a scenario as follows:
The ready() function creates a folder listener (using pyinotify, for example). That listener watches a folder on my server and enqueues a task on any change.
I am seeing this listener executed twice on any change to a single file in the monitored directory. This leads me to believe that both processes are running my listener.
No, the second process is not an on-change listener; I don't know where you read that. That happens with the dev server, not with mod_wsgi.
You should not try to prevent Apache from serving multiple processes. If you do, the speed of your site will be massively reduced: it will only be able to serve a single request at a time, with others queued until the first finishes. That's no good for anything other than a toy site.
Instead, you should fix your AppConfig. Rather than blindly spawning a listener, you should check to see if it has already been created before starting a new one.
You shouldn't prevent spawning multiple processes, because it's a good thing, especially in a production environment. Consider using some external tool, separate from Django, or add a check for whether the folder listener is already running (for example, by maintaining a PID file and checking its contents).
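A sketch of the PID-file/lock idea (path and names illustrative): take a non-blocking exclusive lock in ready(); only the first process acquires it and starts the listener, the others skip it.

import fcntl
import os

_LOCK_PATH = "/tmp/myapp-listener.lock"  # illustrative location
_lock_file = None  # module level, so the lock is held for the process lifetime

def should_start_listener():
    """Return True in exactly one process; the others fail to get the lock."""
    global _lock_file
    _lock_file = open(_LOCK_PATH, "w")
    try:
        fcntl.flock(_lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False  # another process already holds the lock
    _lock_file.write(str(os.getpid()))
    _lock_file.flush()
    return True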

Django and Celery - How to Distribute?

I'm trying to distribute Django and Celery.
I've created a small project with Django and Celery. Django will request a Celery Worker to work on some data on the database. Then the data is passed back to Django.
My idea is that:
Django stack installed on one server
Message queue (RabbitMQ) on one server
Celery worker on one server
Hence 3 Servers in Total
However, the problem is that Celery has to use some of Django's code, for example the models, because it accesses them. Hence, it also needs the settings.py file to know where the servers are.
Does this mean that for #3, I would need to install both Django and Celery on the server, but disable Django and only run Celery (for example celery -A PROJECT_NAME worker -l INFO), without an Apache server for Django?
If you want your celery workers to operate on a different server, you need to make sure that all the resources required by the worker are accessible from that server.
For example, if you have a simple task, you can copy only the code required for that task to the server. If your worker needs any other resources, like other code, files, or a database, you need to make sure it has access to them.
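In practice the usual pattern is to install the same Django project (or at least the task code, models, and settings) on the worker server, point the settings at the shared broker and database by hostname, and run only the Celery worker there; no Apache needed. A sketch, following the standard Celery-with-Django bootstrap (project name and hostnames are illustrative):

# proj/celery.py
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

app = Celery("proj")
# Reads CELERY_* settings from Django's settings.py, e.g.
# CELERY_BROKER_URL = "amqp://user:pass@rabbitmq-host:5672//"  (server 2)
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# On server 3, start only the worker (no web server):
#   celery -A proj worker -l INFO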
Really, if you want to have two servers working on the same tasks, you will have to use a simple web interface (such as Flask) to communicate between the servers (and extend the functionality of your queue). Then, you will have to ensure they are both using the same data source.
Consider hosting your database remotely, or have the remote server access the database remotely. Either way, any workers running on a server will need access to the database and all source code necessary to complete the task. Then, you must simply have the two servers share a messaging queue.
Source: how to configure and run celery worker on remote system