Running an async background task on Tornado

Running an async background task on Tornado - python-2.7

I'm using Tornado Async framework for the implementation of a REST Web-Server.
I need to run a high-CPU-load periodic task on the background of the same server.
It is a low-priority periodic task. It should run all the time on all idle cores, but I don't want it to affect the performance of the Web-Server (under a heavy HTTP-request load, it should take lower priority).
Can I do that with the Tornado IOLoop API?
I know I can use tornado.ioloop.PeriodicCallback to call a periodic background task. But if this task is computational-heavy I may cause performance issues to the web service.

Tornado (or asyncio in python 3) or any other single-process-event-loop based solution is not meant to be used for CPU intensive tasks. You should use it only for IO intensive tasks.
The word "background" means, that you're not waiting for a result (I call it sometimes unattended task). More over, if a background task blocks, the rest of a application have to wait, the same way as a request handler blocks, the other parts, including the background, are blocked.
You might be thinking of use of threads, but in python this is not a solution either, due to GIL.
The right solutions are:
decouple worker from the webserver,
use multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor

Related

threading=True with flask-socketio

I have been using flask, and some of my route handlers start computations that can take several minutes to complete. Using flask's development server, I can use app.run(threaded=True) and my server will continue to respond to other requests while it's off performing these multi-minute computation.
Now I've starting using Flask-SocketIO and I'm not sure how to do the equivalent thing. I understand that I can explicitly spawn a separate thread in python any time it starts one of these computations. Is that the only way to do it? Or is there something equivalent to threaded=True for flask-socketio. (Or, more likely, am I just utterly confused.)
Thanks for any help.

The idea of the threaded mode in Flask/Werkzeug is to enable the development server to handle multiple requests concurrently. In the default mode, the server can handle one request at a time, if a client sends a request while the server is already processing a previous request, then the second request has to wait until that first request is complete. In threaded mode, Werkzeug spawns a thread for each incoming request, so multiple requests are handled concurrently. You obviously are taking advantage of the threaded mode to have requests that take very long to return, while keeping the server responsive to other requests.
Note that this approach is hard to scale properly when you move out of the development web server and into a production web server. For a worker based server you have to pick a fixed number of workers, and that gives you the maximum number of concurrent requests you can have.
The alternative approach is to use a coroutine based server, such as gevent, which is fully supported by Flask. For gevent there is a single worker process, but in it there are multiple lightweight (or "green") threads, that cooperatively allow each other to run. The key to make things work under this model is to ensure that these green threads do not abuse the CPU time they get, because only one can run at a time. When this is done right, the server can scale much better than with the multiple worker approach I described above, and you can easily have hundreds/thousands of clients handled in this fashion.
So now you want to use Flask-SocketIO, and this extension requires the use of gevent. In case the reason for this requirement isn't clear, unlike HTTP requests, SocketIO uses the WebSocket protocol, which requires long-lived connections. Using gevent and green threads makes it possible to have a potentially large number of constantly connected clients, something that would be impossible to do with multiple workers.
The problem is your long calculation, which is not friendly to the gevent type of server. To make it work, you need to ensure your calculation function yields often, so that other threads get a chance to run and don't starve. For example, if your calculation function has a loop in in, you can do something like this:
def my_long_calculation():
while some_condition:
# do some work here
# let other threads run
gevent.sleep()
The sleep() function will basically halt your thread and switch to any other threads that need CPU. Eventually control will be given back to your function, and at that point it'll move on to the next iteration. You need to make sure the sleep calls are not too spaced out (as that will make the rest of the application unresponsive) or not too closer (as that may slow down your calculation).
So to answer your question, as long as you yield properly in your long calculation, you do not need to do anything special to handle concurrent requests, as this is the normal operating mode of gevent.
If for any reason the yield approach is not possible, then you may need to think about offloading the CPU intensive tasks to another process. Maybe use Celery to have these done as a job queue.
Sorry for the long winded answer. Hope this helps!

Django and Websockets: How to create, in the same process, an efficient project with both WSGI and websockets?

I'm trying to do a Django application with an asynchronous part: Websockets. Just as a little challenge, I want to mount everything in the same process. Tried Socket.IO but couldn't manage to actually use sockets, instead of longpolling (which killed my browser several times, until I gave up).
What I then tried was a not-so-maintained library based on gevent-websocket. However, had many errors and was not easy to debug.
Now I am trying a Tornado approach but AFAIK (please correct me if I'm wrong) integrating async with a regular django app wrapped by WSGIContainer (websockets would go through Tornado, regular connections through Django) will be a true server killer if a resource is heavy or, somehow, the Django ORM goes slow into heavy operations.
I was thinking on moving to Twisted/Cyclone. Before I move from one architecture with such issue to ANOTHER architecture with such issue, i'd like to ask:
Does Tornado (and/or Twisted) have an architecture of scheduling tasks in the same way Gevent does? (this means: when certain greenlets "block", they schedule themselves to other threads, at least until the operation finishes). I'm asking this because (please correct me if I'm wrong) a regular django view will not be suitable for stuff like #inlineCallbacks, and will cause the whole server to be blocked (incl. the websockets).
I'm new to async programming in python, so there's a huge change I have misinformation about more than one concept. Please help me clarifying this before I switch.

Neither Tornado nor Twisted have anything like gevent's magic to run (some) blocking code with the performance characteristics of asynchronous code. Idiomatic use of either Tornado or Twisted will be visible throughout your app in the form of callbacks and/or Futures/Deferreds.
In general, since you'll need to run multiple python processes anyway due to the GIL, it's usually best to dedicate some processes to websockets with Tornado/Twisted and other processes to Django with the WSGI container of your choice (and then put nginx or haproxy in front so it looks like a single service to the outside world).
If you still want to combine django and an asynchronous service in the same process, the next best solution is to use threads. If you want the two to share one listening port, the listener must be a websocket-aware HTTP server that can spawn other threads for WSGI requests. Tornado does not yet have a solution for this, although one is planned for version 4.1 (https://github.com/tornadoweb/tornado/pull/1075). I believe Twisted's WSGI container does support running the WSGI workers in threads, but I don't have any experience with it myself. If you need them in the same process but do not need to share the same port, then you can simply run the IOLoop or Reactor in one thread and the WSGI container of your choice in another (with its associated worker threads).

Not sure if I should use celery

I have never used celery before and I'm also a django newbie so I'm not sure if I should use celery in my project.
Brief description of my project:
There is an API for sending (via SSH) jobs to scientific computation clusters. The API is an abstraction to the different scientific job queue vendors out there. http://saga-project.github.io/saga-python/
My project is basically about doing a web GUI for this API with django.
So, my concern is that, if I use celery, I would have a queue in the local web server and another one in each of the remote clusters. I'm afraid this might complicate the implementation needlessly.
The API is still in development and some of the features aren't fully finished. There is a function for checking the state of the remote job execution (running, finished, etc.) but the callback support for state changes is not ready. Here is where I think celery might be appropriate. I would have one or several periodic task(s) monitoring the job states.
Any advice on how to proceed please? No celery at all? celery for everything? celery just for the job states?

I use celery for similar purpose and it works well. Basically I have one node running celery workers that manage the entire cluster. These workers generate input data for the cluster nodes, assign tasks, process the results for reporting or generating dependent tasks.
Each cluster node is running a very small python server which takes a db id of it's assigned job. It then calls into the main (http) server to request the data it needs and finally posts the data back when complete. In my case, the individual nodes don't need to message each other and run time of each task is very long (hours). This makes the delays introduced by central management and polling insignificant.
It would be possible to run a celery worker on each node taking tasks directly from the message queue. That approach is appealing. However, I have complex dependencies that are easier to work out from a centralized control. Also, I sometimes need to segment the cluster and centralized control makes this possible to do on the fly.
Celery isn't good at managing priorities or recovering lost tasks (more reasons for central control).
Thanks for calling my attention to SAGA. I'm looking at it now to see if it's useful to me.

Celery is useful for execution of tasks which are too expensive to be executed in the handler of HTTP request (i.e. Django view). Consider making an HTTP request from Django view to some remote web server and think about latencies, possible timeouts, time for data transfer, etc. It also makes sense to queue computation intensive tasks taking much time for background execution with Celery.
We can only guess what web GUI for API should do. However Celery fits very well for queuing requests to scientific computation clusters. It also allows to track the state of background task and their results.
I do not understand your concern about having many queues on different servers. You can have Django, Celery broker (implementing queues for tasks) and worker processes (consuming queues and executing Celery tasks) all on the same server.

cpu load and django application that makes long-response-time requests to external API

I'm developing a web application in python for which each user request makes an API call to an external service and takes about 20 seconds to receive response. As a result, in the event of several concurrent requests being made, the CPU load goes crazy (>95%) with several idle processes.
The server consists of a 1.6 GHz dual core Atom 330 with 2GB RAM.
The web app is developed in python which is served through Apache with mod_wsgi
My question is the following. Will a non-blocking webserver such as Tornado improve CPU load and thus handle more concurrent users (I'm also interested why) ? Can you suggest any other scalable solution ?

This really doesn't have anything to do with blocking; it does, but it doesn't. The 20 sec request is blocking one thread, so another has be utilized for the next request. Whereas with quick requests, the threads basically round-robin.
However, this really shouldn't be spiking your CPU output. Webservers have an upward limit of "workers" that get spawned and when they're all tied up, they're all tied up. It won't extend past the limit, so unless you've set or the default setting is higher than the box you have is capable of running, it shouldn't push your CPU that high.
Regardless, all that is merely informational, and doesn't really solve your problem. With such a long running request though, you should be offloading this from your webserver as quick as possible. The webserver should merely hand off the request to another process that can asyncronously handle it and then employ polling to notify the client when the response is ready. Node.js is used a lot in similar scenarios, but I really don't have enough experience with it to give you any real guidance beyond that.

You should look into using message queues to offload tasks so that your user requests are not blocked.
You could look into python libs kombu and celery to handle messages and tasks.

You are likely using prefork MPM with Apache and mod_wsgi embedded mode. This is a bad combination by default because Apache is setup for PHP and not fat Python web applications. Read:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
which explains this exact sort of issue.
Use mod_wsgi daemon mode at the minimum, and preferably also change to worker MPM for Apache.

System architecture: simple approach for setting up background tasks behind a web application -- will it work?

I have a Django web application and I have some tasks that should operate (or actually: be initiated) on the background.
The application is deployed as follows:
apache2-mpm-worker;
mod_wsgi in daemon mode (1 process, 15 threads).
The background tasks have the following characteristics:
they need to operate in a regular interval (every 5 minutes or so);
they require the application context (i.e. the application packages need to be available in memory);
they do not need any input other than database access, in order to perform some not-so-heavy tasks such as sending out e-mail and updating the state of the database.
Now I was thinking that the most simple approach to this problem would be simply to piggyback on the existing application process (as spawned by mod_wsgi). By implementing the task as part of the application and providing an HTTP interface for it, I would prevent the overhead of another process that is holding all of the application into memory. A simple cronjob can be setup that sends a request to this HTTP interface every 5 minutes and that would be it. Since the application process provides 15 threads and the tasks are quite lightweight and only running every 5 minutes, I figure they would not be hindering the performance of the web application's user-facing operations.
Yet... I have done some online research and I have seen nobody advocating this approach. Many articles suggest a significantly more complex approach based on a full-blown messaging component (such as Celery, which uses RabbitMQ). Although that's sexy, it sounds like overkill to me. Some articles suggest setting up a cronjob that executes a script which performs the tasks. But that doesn't feel very attractive either, as it results in creating a new process that loads the entire application into memory, performs some tiny task, and destroys the process again. And this is repeated every 5 minutes. Does not sound like an elegant solution.
So, I'm looking for some feedback on my suggested approach as described in the paragraph before the preceeding paragraph. Is my reasoning correct? Am I overlooking (potential) problems? What about my assumption that application's performance will not be impeded?

All are reasonable approaches depending on your specific requirements.
Another is to fire up a background thread within the process when the WSGI script is loaded. This background thread could simply sleep and wake up occasionally to perform required work and then go back to sleep.
This method necessitates though that you have at most one Django process which the background thread runs in to avoid different processing doing the same work on any database etc.
Using daemon mode with a single process as you are would satisfy that criteria. There are potentially other ways you could achieve that though even in a multiprocess configuration.

Note that celery works without RabbitMQ as well. It can use a ghetto queue (SQLite, MySQL, Postgres, etc, and Redis, MongoDB), which is useful in testing or for simple setups where RabbitMQ seems overkill.
See http://ask.github.com/celery/tutorials/otherqueues.html
(Using Celery with Redis/Database as the messaging queue.)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js