Django and Celery - How to Distribute?

I'm trying to distribute Django and Celery.
I've created a small project with Django and Celery. Django asks a Celery worker to process some data in the database, and the result is then passed back to Django.
My idea is that:
Django stack installed on one server
Message queue (RabbitMQ) on one server
Celery worker on one server
Hence three servers in total
However, the problem is that Celery has to use some of the Django code, for example the models, because the worker accesses them. Hence, it would also need the settings.py file to know where the other servers are.
Does this mean that for server #3 I would need to install both Django and Celery, but never serve Django and only run Celery, for example with celery -A PROJECT_NAME worker -l INFO and no Apache server for Django?

If you want your celery workers to operate on a different server, you need to make sure that all the resources required by the worker are accessible from that server.
For example, if you have a simple task, you can copy only the code required for that task to the server. If your worker needs any other resources, such as other code, files or a database, you need to make sure it has access to them.
Really, if you want to have two servers working on the same tasks, you will have to use a simple web interface (such as Flask) to communicate between the servers (and extend the functionality of your queue). Then you will have to ensure they are both using the same data source.
Consider hosting your database remotely, or have the remote server access the database over the network. Either way, any worker running on a server needs access to the database and to all the source code necessary to complete the task. Then the two servers simply need to share the same message queue.
Source: how to configure and run celery worker on remote system
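As an illustration, a minimal sketch of what the worker box might look like, assuming the project code is copied to it, the standard CELERY_-namespaced settings integration, and placeholder host names (rabbit-host, db-host):

# settings.py on the worker server -- placeholder hosts and credentials, adjust to your setup
CELERY_BROKER_URL = "amqp://user:password@rabbit-host:5672//"  # the RabbitMQ server
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "secret",
        "HOST": "db-host",  # the database both Django and the worker read/write
        "PORT": "5432",
    }
}

# On this machine you then start only the worker -- no Apache/uWSGI for Django:
#   celery -A PROJECT_NAME worker -l INFO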

Related

The scheduler seems to be running under uWSGI, but threads have been disabled. You must run uWSGI with the --enable-threads option for the scheduler to work

I'm deploying a Django app to PythonAnywhere, where I use APScheduler to automatically send an expiry email whenever a subscription's end date has passed.
I don't know how to enable threads so that my web app runs properly on PythonAnywhere.
On hosting platforms like PythonAnywhere, there might be multiple copies of your site running at different times, in order to serve the traffic that you get. So you should not use an in-process scheduler to perform periodic tasks; instead, you should use the platform's built-in scheduled tasks function.
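One hedged way to follow that advice is to move the mailing logic into a management command and let the platform's scheduled-task feature invoke it; the app, model and field names below (yourapp, Subscription, end_date, notified) are assumptions for illustration:

# yourapp/management/commands/send_expiry_mail.py
from django.core.management.base import BaseCommand
from django.core.mail import send_mail
from django.utils import timezone

from yourapp.models import Subscription  # assumed model


class Command(BaseCommand):
    help = "Email users whose subscription has expired (run from a platform scheduled task)."

    def handle(self, *args, **options):
        expired = Subscription.objects.filter(end_date__lt=timezone.now(), notified=False)
        for sub in expired:
            send_mail(
                "Your subscription has expired",
                "Please renew to keep using the service.",
                "noreply@example.com",
                [sub.user.email],
            )
            sub.notified = True
            sub.save(update_fields=["notified"])

# Scheduled task entry on the host, instead of an in-process scheduler:
#   python /home/youruser/yourproject/manage.py send_expiry_mail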

Does Django Channels work as intended as a WSGI app?

I am trying to implement Django Channels because I need to have users receive notifications when another user does something, and I am completely confused by this part:
http://channels.readthedocs.io/en/stable/deploying.html
Deploying applications using channels requires a few more steps than a
normal Django WSGI application, but you have a couple of options as to
how to deploy it and how much of your traffic you wish to route
through the channel layers.
Firstly, remember that it’s an entirely optional part of Django. If
you leave a project with the default settings (no CHANNEL_LAYERS),
it’ll just run and work like a normal WSGI app.
The problem is that I have quite limited rights on the shared hosting that I am using and therefore, I can't use the runworker command.
The quote above says that this part is "optional" and that without it, it'll work like a normal WSGI app. But can I use Django Channels with a normal WSGI app? If not, then doesn't that mean that it's not optional at all?
So my question is: if I skip this part, will Channels still work, and will I be able to use the things shown on this page (routing, sending messages, etc.): http://channels.readthedocs.io/en/stable/getting-started.html ?
From reading the docs, what I get is that you normally need a backend to run the channel layer, e.g. Redis (with optional sharding), and then run "runworker". Since that's not an option for you, have a look at this: http://channels.readthedocs.io/en/stable/backends.html
"The in-memory layer is only useful when running the protocol server and the worker server in a single process; the most common case of this is runserver, where a server thread, this channel layer, and worker thread all co-exist inside the same python process."
So by avoiding a third-party backend you can use the in-memory ASGI layer: just run 'runserver' and the channel layer is set up. Look for the in-memory subtopic at that link.
And if you keep CHANNEL_LAYERS empty, Django will work as a WSGI app, but what we need is an ASGI app, and ASGI is required for Channels.
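As a sketch of that in-memory setup, assuming the Channels 1.x series those docs describe and a hypothetical routing module at myproject.routing:

# settings.py -- in-memory channel layer (Channels 1.x style); only suitable for runserver
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "asgiref.inmemory.ChannelLayer",        # no Redis, no separate runworker
        "ROUTING": "myproject.routing.channel_routing",    # hypothetical routing module
    },
}

# python manage.py runserver then hosts the interface server, this channel layer and the
# worker threads inside a single process.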

Remote Django application sending messages to RabbitMQ

I'm starting to get familiar with the RabbitMQ lingo, so I'll try my best to explain. I'll be going into a public beta test in a few weeks and this is the set-up I am hoping to achieve. I would like Django to be the producer, producing messages to a remote RabbitMQ box, with another Celery box listening on the RabbitMQ queue for tasks. So in total there would be three boxes: Django, RabbitMQ and Celery. So far, following the Celery docs, I have successfully been able to run Django and Celery together on one machine and RabbitMQ on another. Django simply calls the task in the view:
add.delay(3, 3)
And the message is sent over to RabbitMQ. RabbitMQ sends it back to the same machine that the task was sent from (since Django and celery share the same box) and celery processes the task.
This is great for development purposes. However, having Django and Celery running on the same box isn't a great idea since both will have to compete for memory and CPU. The whole goal here is to get clients in and out of the HTTP Request cycle and have celery workers process the tasks. But the machine will slow down considerably if it is accepting HTTP requests and also processing tasks.
So I was wondering whether there is a way to make this all separate from one another: have Django send the tasks, RabbitMQ forward them, and Celery process them (producer, broker, consumer).
How can I go about doing this? Really simple examples would help!
What you need is to deploy your application's code on the third machine as well, and run only the command that handles the tasks (the Celery worker) there; the code has to be present on that machine too.
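A minimal sketch of that three-box split, assuming the standard Django/Celery integration and placeholder project and host names:

# proj/celery.py -- part of the codebase deployed to both the Django box and the Celery box
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

app = Celery("proj")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# settings.py -- both boxes point at the RabbitMQ box (placeholder host/credentials):
#   CELERY_BROKER_URL = "amqp://user:password@rabbitmq-host:5672//"

# Django box: the view only enqueues, e.g. add.delay(3, 3), then returns the HTTP response.
# Celery box: run only the worker, no web server:  celery -A proj worker -l INFO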

Celery and Twitter streaming api with Django

I'm having a really hard time conceptualising how I can connect to the Twitter streaming API and process tweets via an admin interface provided by Django.
The main problem is starting a daemon from Django and having the ability to stop/start it, plus making sure there is provision for monitoring. I don't really want to use upstart for this purpose because I want to try and keep the project as self contained as possible.
I'm currently attempting the following and am unsure if it's the wrong way to go about things (a rough sketch of this idea follows below):
Start a Celery task from Django which establishes a persistent connection to the streaming API
The above task creates subtasks which will process the tweets and store them
Because celeryd runs as a daemon, it will automatically run the first task again if the connection breaks and the task fails - does this mean I don't need any additional monitoring?
Does the above make sense or have I misunderstood how celery works?
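For what it's worth, a rough sketch of that first idea, assuming the older pre-4.0 Tweepy streaming API, placeholder credentials, and a hypothetical Tweet model; this is only an illustration of the shape of the approach, not a vetted design:

# tasks.py -- a long-running task holds the stream, subtasks store the tweets
import tweepy
from celery import shared_task

from myapp.models import Tweet  # hypothetical model with a text/JSON payload field


@shared_task
def store_tweet(raw_data):
    # Subtask: persist a single tweet so storage work is spread across workers.
    Tweet.objects.create(payload=raw_data)


class QueueingListener(tweepy.StreamListener):
    def on_data(self, raw_data):
        store_tweet.delay(raw_data)  # hand each incoming tweet off to a subtask
        return True


@shared_task(bind=True, max_retries=None)
def consume_stream(self, track_terms):
    # Long-running task: keeps the persistent connection open; if the stream drops,
    # the task retries itself rather than relying on outside monitoring.
    try:
        auth = tweepy.OAuthHandler("API_KEY", "API_SECRET")       # placeholder credentials
        auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
        tweepy.Stream(auth, QueueingListener()).filter(track=track_terms)
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)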

Django and Celery Confusion

After reading a lot of blog posts, I decided to switch from crontab to Celery for my mid-scale Django project. There are a few things I didn't understand:
1- I'm planning to start a micro EC2 instance dedicated to RabbitMQ. Would this be sufficient for small-to-medium task loads (such as dispatching periodic e-mails via Amazon SES)?
2- Does the computation of tasks occur on the Django server or on the RabbitMQ server (assuming RabbitMQ is on a separate server)?
3- When I need to grow my system and have two or more application servers behind a load balancer, do these Celery machines need to connect to the same RabbitMQ vhost? Assume the application servers are carbon copies of each other, the tasks are the same, and everything is in sync at the database level.
I don't know the answer to the first question, but you can definitely configure the worker to be suitable (e.g. use -c 1 for a single-process worker to avoid using much memory, or the eventlet/gevent pools); see also the --autoscale option. The choice of broker transport also matters here: the ones that are not polling are more CPU-efficient (rabbitmq/redis/beanstalk).
Computing happens on the workers; the broker is only responsible for accepting, routing and delivering messages (and persisting them to disk when necessary).
To add additional workers, these should indeed connect to the same virtual host. You would only use separate virtual hosts if you wanted separate applications to have separate message buses.
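To tie answers 1 and 3 together, a hedged sketch with placeholder credentials, host and vhost names, assuming the CELERY_-namespaced Django settings integration:

# settings.py on every application server -- all workers share one broker vhost
CELERY_BROKER_URL = "amqp://myuser:mypassword@rabbitmq-host:5672/myvhost"

# Worker memory/CPU use is tuned when starting the worker on each box, e.g.:
#   celery -A proj worker -c 1                   # single worker process
#   celery -A proj worker --autoscale=10,1       # scale between 1 and 10 processes
#   celery -A proj worker -P eventlet -c 100     # eventlet pool for I/O-bound tasks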