I am aware that Django follows a request/response cycle and that Django Channels is different; my question is not about that.
We know that uWSGI/gunicorn creates worker processes and can be configured to execute each request in a thread, so a single uWSGI worker process with 10 threads can serve 10 requests "concurrently" (not in parallel).
Now let's assume each web client wants to open a websocket using Django Channels. From my limited understanding, a vanilla implementation processes each message in a single thread, which means that to handle x connections concurrently you need x channel worker processes. I know someone will suggest increasing the number of processes; I am not here to debate that.
My question is simply: are there any existing libraries that do a similar job to uWSGI/gunicorn and execute consumer functions in threads?
I think you are asking about Daphne. It is mentioned in the Channels documentation itself.
Daphne provides an option to scale processes using a shared FD. Unfortunately, it does not work as expected.
Right now, a better alternative is uvicorn. You can run multiple workers with
$ uvicorn project.asgi:application --workers 4
I have been using this in production and it seems good enough.
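If you are already deploying with gunicorn, a roughly equivalent setup is to run uvicorn workers under gunicorn's process manager (project.asgi:application below is a placeholder for your actual ASGI module):

$ gunicorn project.asgi:application -k uvicorn.workers.UvicornWorker --workers 4

That way gunicorn handles worker supervision and restarts, while each worker still runs an asyncio event loop for the consumers.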
Related
I was reading about the CONN_MAX_AGE setting and the documentation says:
Since each thread maintains its own connection, your database must support at least as many simultaneous connections as you have worker threads.
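For context, the setting is configured per database entry in DATABASES; for example (the engine and database name here are placeholders):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        # keep connections open for up to 60 seconds instead of opening one per request
        'CONN_MAX_AGE': 60,
    }
}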
So I wonder: on uWSGI, how does a Django process maintain its own threads? Does it spawn a new thread for each request and kill it at the end of the request?
If so, how does a terminated thread maintain the connection?
Django is not in control of any threads (well... maybe in the development server, but that's pretty simple); uWSGI is. uWSGI will spawn some threads, depending on its configuration, and in each thread it will run Django's request handling.
Thread spawning may be static or dynamic: it can be strictly 4 threads, or anywhere from 2 to 12 depending on load.
And no, there is no new thread for each request, because that would let someone kill your server by opening so many concurrent connections that the resulting threads would overwhelm any machine.
Requests are handled one at a time on each thread; the main uWSGI process round-robins requests between threads. If there are more requests than threads, some of them wait until others are finished.
uWSGI also has workers - independent processes that can spawn their own threads, so load can be spread better.
You can also have multiple uWSGI servers and tell your HTTP server (Apache, a proxy) to spread requests between them. That way you can even run your uWSGI instances on different machines, and from the outside it will all look like one big server.
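As a rough illustration, a uWSGI configuration combining workers and threads might look like this (the module path and counts are placeholders to adapt to your deployment):

[uwsgi]
# hypothetical Django WSGI entry point
module = project.wsgi:application
master = true
# 4 worker processes x 2 threads each = up to 8 requests handled at once
processes = 4
threads = 2

With this kind of setup, the CONN_MAX_AGE note applies directly: the database must allow at least processes x threads simultaneous connections.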
I have been using Flask, and some of my route handlers start computations that can take several minutes to complete. Using Flask's development server, I can use app.run(threaded=True) and my server will continue to respond to other requests while it's off performing these multi-minute computations.
Now I've started using Flask-SocketIO and I'm not sure how to do the equivalent thing. I understand that I can explicitly spawn a separate thread in Python any time one of these computations starts. Is that the only way to do it, or is there something equivalent to threaded=True for Flask-SocketIO? (Or, more likely, am I just utterly confused?)
Thanks for any help.
The idea of the threaded mode in Flask/Werkzeug is to enable the development server to handle multiple requests concurrently. In the default mode, the server can handle one request at a time, if a client sends a request while the server is already processing a previous request, then the second request has to wait until that first request is complete. In threaded mode, Werkzeug spawns a thread for each incoming request, so multiple requests are handled concurrently. You obviously are taking advantage of the threaded mode to have requests that take very long to return, while keeping the server responsive to other requests.
Note that this approach is hard to scale properly when you move out of the development web server and into a production web server. For a worker based server you have to pick a fixed number of workers, and that gives you the maximum number of concurrent requests you can have.
The alternative approach is to use a coroutine based server, such as gevent, which is fully supported by Flask. For gevent there is a single worker process, but in it there are multiple lightweight (or "green") threads, that cooperatively allow each other to run. The key to make things work under this model is to ensure that these green threads do not abuse the CPU time they get, because only one can run at a time. When this is done right, the server can scale much better than with the multiple worker approach I described above, and you can easily have hundreds/thousands of clients handled in this fashion.
So now you want to use Flask-SocketIO, and this extension requires the use of gevent. In case the reason for this requirement isn't clear, unlike HTTP requests, SocketIO uses the WebSocket protocol, which requires long-lived connections. Using gevent and green threads makes it possible to have a potentially large number of constantly connected clients, something that would be impossible to do with multiple workers.
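For reference, a minimal way to run Flask-SocketIO on gevent looks roughly like this (assuming the gevent and gevent-websocket packages are installed):

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
# picks gevent when it is installed; recent versions also accept async_mode='gevent'
socketio = SocketIO(app)

if __name__ == '__main__':
    # socketio.run() wraps the gevent WSGI/WebSocket server for you
    socketio.run(app, host='0.0.0.0', port=5000)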
The problem is your long calculation, which is not friendly to the gevent type of server. To make it work, you need to ensure your calculation function yields often, so that other threads get a chance to run and don't starve. For example, if your calculation function has a loop in it, you can do something like this:
import gevent

def my_long_calculation():
    while some_condition:
        # do some work here
        # let other threads run
        gevent.sleep()
The sleep() function will basically halt your thread and switch to any other threads that need CPU. Eventually control will be given back to your function, and at that point it'll move on to the next iteration. You need to make sure the sleep calls are not too far apart (as that will make the rest of the application unresponsive) nor too close together (as that may slow down your calculation).
So to answer your question, as long as you yield properly in your long calculation, you do not need to do anything special to handle concurrent requests, as this is the normal operating mode of gevent.
If for any reason the yield approach is not possible, then you may need to think about offloading the CPU intensive tasks to another process. Maybe use Celery to have these done as a job queue.
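A minimal sketch of that route, with tasks.py, the Redis broker URL, and do_the_heavy_work as placeholders:

# tasks.py
from celery import Celery

celery = Celery('tasks', broker='redis://localhost:6379/0')

@celery.task
def long_calculation(data):
    # runs in a separate Celery worker process, so it cannot starve the gevent server
    return do_the_heavy_work(data)  # placeholder for the real computation

The SocketIO handler would then call long_calculation.delay(data) and push the result to the client whenever it becomes available.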
Sorry for the long-winded answer. Hope this helps!
I'm trying to build a Django application with an asynchronous part: websockets. Just as a little challenge, I want to mount everything in the same process. I tried Socket.IO but couldn't manage to get actual websockets instead of long-polling (which killed my browser several times, until I gave up).
What I then tried was a not-so-maintained library based on gevent-websocket. However, it had many errors and was not easy to debug.
Now I am trying a Tornado approach, but AFAIK (please correct me if I'm wrong) integrating async code with a regular Django app wrapped by WSGIContainer (websockets going through Tornado, regular connections through Django) will be a true server killer if a view is heavy or the Django ORM gets slow on heavy operations.
I was thinking of moving to Twisted/Cyclone. Before I move from one architecture with such an issue to ANOTHER architecture with the same issue, I'd like to ask:
Does Tornado (and/or Twisted) schedule tasks the way gevent does? (I mean: when a greenlet "blocks", control switches to other greenlets until the operation finishes.) I'm asking because (please correct me if I'm wrong) a regular Django view is not suitable for things like @inlineCallbacks and will cause the whole server to block (including the websockets).
I'm new to async programming in Python, so there's a huge chance I'm misinformed about more than one concept. Please help me clarify this before I switch.
Neither Tornado nor Twisted have anything like gevent's magic to run (some) blocking code with the performance characteristics of asynchronous code. Idiomatic use of either Tornado or Twisted will be visible throughout your app in the form of callbacks and/or Futures/Deferreds.
In general, since you'll need to run multiple python processes anyway due to the GIL, it's usually best to dedicate some processes to websockets with Tornado/Twisted and other processes to Django with the WSGI container of your choice (and then put nginx or haproxy in front so it looks like a single service to the outside world).
If you still want to combine Django and an asynchronous service in the same process, the next best solution is to use threads. If you want the two to share one listening port, the listener must be a websocket-aware HTTP server that can spawn other threads for WSGI requests. Tornado does not yet have a solution for this, although one is planned for version 4.1 (https://github.com/tornadoweb/tornado/pull/1075). I believe Twisted's WSGI container does support running the WSGI workers in threads, but I don't have any experience with it myself. If you need them in the same process but do not need to share the same port, then you can simply run the IOLoop or Reactor in one thread and the WSGI container of your choice in another (with its associated worker threads).
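A rough sketch of the separate-ports variant (Tornado serving websockets on one port, the Django WSGI app served from a background thread on another; wsgiref and the project.wsgi import are stand-ins for a real WSGI container and your actual project):

import threading
from wsgiref.simple_server import make_server

import tornado.ioloop
import tornado.web
import tornado.websocket

from project.wsgi import application as django_app  # hypothetical Django project


class EchoSocket(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        # trivial example handler; a real one would do asynchronous work here
        self.write_message(message)


def serve_wsgi():
    # single-threaded reference server, only to illustrate the split
    make_server('127.0.0.1', 8001, django_app).serve_forever()


threading.Thread(target=serve_wsgi, daemon=True).start()

tornado.web.Application([(r'/ws', EchoSocket)]).listen(8000)
tornado.ioloop.IOLoop.current().start()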
I'm writing a web application with Django where users can upload files with statistical data.
The data needs to be processed before it can be properly used (each dataset can take up to a few minutes of processing time). My idea was to use a Python thread for this and offload the data processing into a separate thread.
However, since I'm using uwsgi, I've read about a feature called "Spoolers". The documentation on that is rather short, but I think it might be what I'm looking for. Unfortunately the -Q option for uwsgi requires a directory, which confuses me.
Anyway, what are the best practices to implement something like worker threads which don't block uwsgi's web workers so I can reliably process data in the background while still having access to Django's database/models? Should I use threads instead?
All of the offloading subsystems need some kind of 'queue' to store the 'things to do'.
The uWSGI spooler uses a printer-like approach where each file in the directory is a task; when the task is done, the file is removed. Other systems rely on heavier/more advanced servers like RabbitMQ and so on.
Finally, do not use the spooler's low-level API directly; rely on the decorators instead:
http://projects.unbit.it/uwsgi/wiki/Decorators
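A rough sketch with the uwsgidecorators helpers (the task body and dataset_id argument are placeholders; spooler arguments are passed around as a dict of strings):

# e.g. myapp/tasks.py, imported by the Django app (hypothetical path)
from uwsgidecorators import spool

@spool
def process_dataset(arguments):
    # runs inside the spooler process, not inside a web worker
    dataset_id = arguments['dataset_id']
    run_processing(dataset_id)  # placeholder for the real work

A view then enqueues the work with process_dataset.spool(dataset_id=str(obj.pk)) and returns immediately; the uWSGI configuration needs a spooler directory (the same one the -Q option expects), e.g. spooler = /var/spool/myapp.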
I am running a Django-based webservice with Gunicorn behind nginx as a reverse proxy.
My webservice provides a Django view which performs calculations using an external instance of MATLAB. Because the MATLAB startup takes some seconds on its own, even requests incurring only very simple MATLAB calculations require this amount of time to be answered.
Moreover, due to the MATLAB sandboxing done in my code, it is important that only one MATLAB instance runs at a time per webserver process. (Therefore, I am currently using the Gunicorn sync worker model, which implements a pre-forking webserver but does not use any multithreading.)
To improve user experience, I now want to eliminate the waiting time for MATLAB startup by keeping some (e.g. 3-5) "ready" MATLAB instances running and using them as requests come in. After a request has been serviced, the MATLAB process would be terminated and a new one would be started immediately, to be ready for another request.
I have been evaluating two ways to do this:
1. Continue using the Gunicorn sync worker model and keep one MATLAB instance per webserver process.
The problem with this seems to be that incoming requests are not distributed to the webserver worker processes in a round-robin fashion. Therefore, it could happen that all computationally-intensive requests hit the same process and the users still have to wait because that single MATLAB instance cannot be restarted as fast as necessary.
2. Outsource the MATLAB computation to a backend server which does the actual work and is queried by the webserver processes via RPC.
In my conception, there would be a number of RPC server processes running, each hosting a running MATLAB process. After a request has been processed, the MATLAB process would be restarted. Because the RPC server processes are queried round-robin, a user would never have to wait for MATLAB to start (except when there are too many requests overall, but that is inevitable).
Because of the issues described with the first approach, I think the RPC server (approach 2) would be the better solution to my problem.
I have already looked at some RPC solutions for Python (especially Pyro and RPyC); however, I cannot find an implementation that uses a pre-forking server model for the RPC server. Remember, due to the sandbox, multithreading is not possible, and if the server only forks after a connection has been accepted, I would still need to start MATLAB after that, which would defeat the whole idea.
Does anybody know a better solution to my problem? Or is the RPC server actually the best solution? But then I would need a pre-forking RPC server (= fork some processes and let them all spin on accept() on the same socket), or at least an RPC framework that can be easily modified (monkey-patched?) to be pre-forking.
Thanks in advance.
I have solved the problem by making my sandbox thread-safe. Now I can use any single-process webserver and use a Queue to get spare MATLAB instances that are spawned in a helper thread.
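For anyone curious, the pattern is roughly this (start_matlab_instance and the evaluate/terminate methods are placeholders for whatever launches and drives the sandboxed MATLAB process):

import queue
import threading

POOL_SIZE = 3
warm_instances = queue.Queue(maxsize=POOL_SIZE)

def keep_pool_filled():
    # helper thread: put() blocks once the pool is full and immediately
    # starts a replacement whenever an instance is taken out
    while True:
        warm_instances.put(start_matlab_instance())  # placeholder

threading.Thread(target=keep_pool_filled, daemon=True).start()

def run_calculation(request_data):
    matlab = warm_instances.get()          # pre-warmed instance, no startup delay
    try:
        return matlab.evaluate(request_data)   # hypothetical API
    finally:
        matlab.terminate()                 # one instance per request, then discard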