Decouple and Dockerize Django and Celery - django

What is the best way to decouple Celery from Django so that the two parts can be dockerized and run as docker swarm services? Typically one starts the celery workers and celery beat using commands that reference their Django application:
celery -A my_app worker
celery -A my_app beat
From this I believe celery picks up its config from the settings file and a celery.py file, which are easy to move to a microservice. What I don't fully understand is how the tasks would use the Django ORM. Or is that not really the microservices mantra, and should Celery instead make GET/POST calls to a Django REST Framework API for the data it needs to complete a task?

I use a setup where the code for both the django app and its celery workers is the same (as in a single repository).
When deploying I make sure to have the same code release everywhere, to avoid any surprises with the ORM, etc...
Celery starts with a reference to the django app, so that it has access to the models, etc...
Communication between the workers and the main app happens either through the messaging queue (rabbitmq or redis...) or via the database (as in, the celery worker works directly in the db, since it knows the models, etc...).
I'm not sure if that follows the microservices mantra, but it does work :)

Celery's .send_task or .signature might be helpful:
https://www.distributedpython.com/2018/06/19/call-celery-task-outside-codebase/

Related

Django celery and heroku

I have configured celery to be deployed on heroku and all is working well; in fact, my heroku logs show that celery is ready to handle tasks. Unfortunately celery doesn't pick up my tasks. I feel there is some disconnection; can I get some help?
If celery isn't picking up tasks, that means nothing is talking to its broker. Make sure that the task producer is talking to the same broker url as the celery worker (the broker url will appear in the first 10-15 lines of the celery logs).
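One way to make that comparison less error-prone is a small helper like the sketch below. The function name same_broker is made up for illustration; it ignores credentials because the worker's startup banner masks the password.

```python
from urllib.parse import urlsplit


def same_broker(producer_url, worker_url):
    """Return True if both URLs share scheme, host, port and vhost/db path."""
    a, b = urlsplit(producer_url), urlsplit(worker_url)
    return (a.scheme, a.hostname, a.port, a.path) == (
        b.scheme,
        b.hostname,
        b.port,
        b.path,
    )
```

For example, a producer on redis://localhost:6379/1 and a worker on redis://localhost:6379/0 would compare as different, which is exactly the kind of mismatch that leaves tasks sitting unconsumed.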

Can you daemonize Celery through your django site?

Reading the daemonization Celery documentation, if you're running Celery on Linux with systemd, you set it up with two files:
/etc/systemd/system/celery.service
/etc/conf.d/celery
I'm using Celery in a Django site with django-celery-beat, and the documentation is a little confusing on this point:
Example Django configuration
Django users now uses [sic] the exact same template as above, but make sure that the module that defines your Celery app instance also sets a default value for DJANGO_SETTINGS_MODULE as shown in the example Django project in First steps with Django.
The docs don't come out and say whether you can put your daemonization settings in settings.py and have it all work. Judging by another SO post, this user seems to have run into the same confusion, where the Django instructions imply you should use the init.d method.
Bonus points if you can answer whether it's possible to run Celery and RabbitMQ both configured from, and alongside, the Django instance (if that makes sense).
I'm thinking not for Celery, if only because the daemon variables are prefixed with CELERYD_, while First steps with Django says: "...all Celery configuration options must be specified in uppercase instead of lowercase, and start with CELERY_"

Django-Celery in production?

I've been trying to figure out how to make scheduled tasks. I found Celery and have been able to make simple scheduled tasks. To do this I need to open up a command line and run celery -A proj beat for the tasks to happen. This works fine in a development environment, but it will be an issue in production.
So how can I get celery to work without using the command line? When my production server is online, how can I make sure my scheduler comes up with it? Can Celery do this, or do I need to take another approach?
We use Celery in our production environment, which happens to be on Heroku. We are in the process of moving to AWS. In both environments, Celery hums along nicely.
It would be helpful to understand what your production environment will look like. I'm slightly confused as to why you would be worried about turning off your computer, as using Django implies that you are serving up a website... Are you serving your website from your laptop??
Anyway, assuming that you are going to run your production server from a cloud platform, all you have to do is send whatever command lines you need to run Django AND the command lines for Celery (as you have already noted in your question).
In terms of configuration, you say that you have 'scheduled' tasks, so that implies you have set up a beat schedule in your config.py file. If not, it should look something like this (assuming you have a module called tasks.py which holds your celery task definitions):
from celery.schedules import crontab

beat_schedule = {
    'task1': {
        'task': 'tasks.task_one',
        'schedule': 3600,  # every 3600 seconds (hourly)
    },
    'task2': {
        'task': 'tibController.tasks.update_old_retail',
        # weekdays at 12:00
        'schedule': crontab(hour=12, minute=0, day_of_week='mon-fri'),
    },
}
Then, in your tasks.py, load the config file like this:
from celery import Celery
import config
app = Celery('tasks')
app.config_from_object(config)
You can find more on crontab in the docs. You can also check out this repo for a simple Celery example.
In summary:
Create a config file that identifies which tasks to run when
Load the config file into your Celery app
Get a cloud platform to run your code on.
Run celery exactly like you have already identified
Hope that helps.

How to process unfinished tasks in django celery

I am working on a project in which zip/gzip files are uploaded by the user and then unzipped and processed using Celery. The website is based on Django.
The problem I am facing is that a few files were uploaded while Celery was not running. Is there any way that I can re-run celery for such unprocessed files? If so, how?
Thanks.
You have to track down those tasks manually and start them from the Django shell. Celery creates a number of tables to keep track of tasks. It's been a while since I used celery, but my guess would be:
djcelery_crontabschedule
djcelery_taskstate
There might be something about this in the celery documentation too: http://docs.celeryproject.org/en/2.3/getting-started/first-steps-with-celery.html
You can also ask your query on celery mailing list.
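The manual re-run from the Django shell could be sketched roughly as below. This is only an illustration under stated assumptions: the record shape (dicts with "id" and "processed" keys) and the enqueue callable are hypothetical. In a real project the records might come from something like a queryset's .values("id", "processed") and enqueue might be a task's .delay method.

```python
def requeue_unprocessed(uploads, enqueue):
    """Enqueue every upload not yet marked processed; return the queued ids."""
    queued = []
    for upload in uploads:
        if not upload["processed"]:
            # enqueue is assumed to schedule one processing task per upload id.
            enqueue(upload["id"])
            queued.append(upload["id"])
    return queued
```

This relies on the application recording a processed flag per upload, which the question does not confirm exists.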

Does django's runserver option provide a hook for running other restart scripts?

I've recently been playing around with django and celery. One annoying thing during development is the fact that I have to restart the celery daemon each time I modify a task. When I'm developing, I usually like to use 'manage.py runserver' which automatically reloads the django framework on modifications to my apps.
Is there a way to add a hook to the reloading process that runserver does so that it automatically restarts the celery daemon I have running?
Alternatively, does celery have a similar monitor-and-reload-on-change mode that I should be using for development?
Django-supervisor works very well for this purpose. You can have it start the Django server, Celery, and anything else you need, and have different configurations for development and production servers. It also knows to reload the celery daemon when your code changes.
https://github.com/rfk/django-supervisor
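I believe you can set CELERY_ALWAYS_EAGER to True. Tasks then run synchronously in the calling process, so runserver's autoreload picks up task changes and there is no separate worker to restart.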
Yes. Django provides an autoreload hook, which can be used to restart other scripts.
Here is a simple management command which prints a message on reload:
from django.core.management.base import BaseCommand
from django.utils import autoreload

def reload():
    # Runs in the child process each time the autoreloader (re)starts it.
    print('Code changed. Auto reloading...')

class Command(BaseCommand):
    def handle(self, *args, **options):
        # autoreload.main() is the pre-2.2 Django API; later versions
        # provide autoreload.run_with_reloader() instead.
        autoreload.main(reload)
Now you can save this as reload.py in an app's management/commands directory and run it with python manage.py reload. A management command to reload celery workers is available here.
Celery doesn't have a feature to reload code or to restart automatically when the code changes, so you have to restart it manually.
There isn't a way to add a hook, and I don't think it's worth editing Django's source code just to perform a restart.
Personally, while developing I prefer to watch celery's colorized shell output instead of tailing the logs; it is more readable.
Celery 2.5 has an experimental runtime option, --autoreload, that could be used for this purpose too. There is more detail in the release notes. That being said, I think django-supervisor (via @Lee Semel) looks like the better way of doing things. I am posting this alternative here in case other readers do not want to configure another app for asynchronous processing.