How to configure multiple brokers in Django and Celery?

Requirement: Django making use of RabbitMQ (internal) and SQS/Kafka.
Both sets of tasks share common DB/Django models.
Django settings support only one broker configuration as of Oct 2016.
How can I have shared tasks with different queue configurations and broker settings?

I used the code below to specify multiple brokers. If I remember correctly, Celery tries them in the given order and uses the first one that works.
Note: the IPs and hostnames in the example below are for illustration only and need to be adapted to your environment.
from __future__ import absolute_import
import os
from celery import Celery
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'backend.settings')
app = Celery('proj', broker=["redis://redis:6379/0", "redis://192.168.99.100:6379/0", "redis://192.168.99.102:6379/0"])
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
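If you would rather keep the broker list in Django settings instead of passing it to the Celery() constructor, the broker URL setting also accepts a list of failover URLs (and, if I recall correctly, a semicolon-separated string works too). A minimal sketch, reusing the same illustrative Redis hosts as above:
# settings.py (sketch; hosts are illustrative, as above)
CELERY_BROKER_URL = [
    "redis://redis:6379/0",
    "redis://192.168.99.100:6379/0",
    "redis://192.168.99.102:6379/0",
]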

Related

Django with Celery on Digital Ocean

The Objective
I am trying to use Celery in combination with Django. The objective is to set up Celery on a Django web application (deployed test environment) to send scheduled emails. The web application already sends emails. The ultimate objective is to add functionality to send out emails at a user-selected date-time. However, before we get there, the first step is to invoke the delay() function to prove that Celery is working.
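(For reference, once delay() works, the user-selected date-time part can presumably be handled with apply_async() and its eta argument; a sketch, assuming the send_email_task shown further below:)
from datetime import datetime, timedelta, timezone
# Sketch: schedule the task roughly 24 hours from now; eta is a documented
# apply_async argument, but this exact flow is an assumption, not what the app does today.
send_email_task.apply_async(eta=datetime.now(timezone.utc) + timedelta(hours=24))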
Tutorials and Documentation Used
I am new to Celery and have been learning through the following resources:
First Steps With Celery-Django documentation: https://docs.celeryq.dev/en/stable/django/first-steps-with-django.html#using-celery-with-django
A YouTube video on sending email from Django through Celery via a Redis broker: https://www.youtube.com/watch?v=b-6mEAr1m-A
The Redis/Celery droplet was configured per the following tutorial https://www.digitalocean.com/community/tutorials/how-to-install-and-secure-redis-on-ubuntu-20-04
I have spent several days reviewing existing Stack Overflow questions on Django/Celery and have tried a number of suggestions. However, I have not found a question specifically describing this effect in the Django/Celery/Redis/Digital Ocean context. The current situation is described below.
What Is Currently Happening?
The current outcome, as of this post, is that the web application times out, suggesting that the Django app is not successfully connecting with Celery to send the email. Please note that towards the bottom of the post is the output of the Celery worker being started successfully, manually, from within the Django app's console, including a listing of the expected tasks.
The Stack In Use
Python 3.11 and Django 4.1.6: Running on the Digital Ocean App platform
Celery 5.2.7 and Redis 4.4.2 on Ubuntu 20.04: Running on a separate Digital Ocean Droplet
The Django project name is "Whurthy".
Celery Setup Code Snippets
The following snippets are primarily from the Celery-Django documentation: https://docs.celeryq.dev/en/stable/django/first-steps-with-django.html#using-celery-with-django
Whurthy/celery.py
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'Whurthy.settings')
app = Celery('Whurthy')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
Whurthy/__init__.py
from .celery import app as celery_app
__all__ = ('celery_app',)
Application Specific Code Snippets
Whurthy/settings.py
CELERY_BROKER_URL = 'redis://SNIP_FOR_PRIVACY:6379'
CELERY_RESULT_BACKEND = 'redis://SNIP_FOR_PRIVACY:6379'
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = TIME_ZONE
I have replaced the actual IP with the string SNIP_FOR_PRIVACY for obvious reasons. However, if this were incorrect I would not get the output below.
I have also commented out the bind and requirepass redis configuration settings to support troubleshooting during development. This makes the URL as simple as possible and rules out either the incoming IP or password as being the cause of this problem.
events/tasks.py
from celery import shared_task
from django.core.mail import send_mail
@shared_task
def send_email_task():
    send_mail(
        'Celery Task Worked!',
        'This is proof the task worked!',
        'notifications@domain.com',
        ['my_email@domain.com'],
    )
    return
For privacy reasons I have changed the to and from email addresses. However, please note that this function works before adding .delay() to the following snippet. In other words, the Django app sends an email up until I add .delay() to invoke Celery.
events/views.py (extract)
from .tasks import send_email_task
from django.shortcuts import render
def home(request):
    send_email_task.delay()
    return render(request, 'home.html', context)
The above is just the relevant extract of a larger file to show the specific line of code calling the function. The Django web application is working until delay() is appended to the function call, and so I have not included other Django project file snippets.
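One thing worth trying (a sketch, not what the view currently does): retry and retry_policy are documented apply_async options, and bounding the publish retries should make the view fail fast with an error when the broker is unreachable, rather than appearing to hang. The values below are arbitrary:
# Sketch: bound broker-publish retries so an unreachable broker surfaces quickly.
send_email_task.apply_async(
    retry=True,
    retry_policy={
        'max_retries': 3,
        'interval_start': 0,
        'interval_step': 0.2,
        'interval_max': 0.5,
    },
)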
Output from Running celery -A Whurthy worker -l info in the Digital Ocean Django App Console
Ultimately, I want to Dockerize this command, but for now I am running it manually. Below is the output within the Django app console; it appears consistent with the tutorial and other examples of what a successfully configured Celery instance looks like.
<SNIP>
-------------- celery@whurthy-staging-b8bb94b5-xp62x v5.2.7 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-4.4.0-x86_64-with-glibc2.31 2023-02-05 11:51:24
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: Whurthy:0x7f92e54191b0
- ** ---------- .> transport: redis://SNIP_FOR_PRIVACY:6379//
- ** ---------- .> results: redis://SNIP_FOR_PRIVACY:6379/
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. Whurthy.celery.debug_task
. events.tasks.send_email_task
This appears to confirm that the Digital Ocean droplet is starting up a Celery worker successfully (suggesting that the code snippets above are correct) and that the Redis configuration is correct. The two tasks listed when starting Celery are consistent with expectations. However, I am clearly missing something, and cannot rule out that the way Digital Ocean runs droplets is getting in the way.
The baseline test is that the web application sends out an email through the function call. However, as soon as I add .delay() the web page request times out.
I have endeavoured to replicate all that is relevant. I welcome any suggestions to resolve this issue or constructive criticism to improve this question.
Troubleshooting Attempts
Attempt 1
Through the D.O. app console I ran python manage.py shell
I then entered the following into the shell:
>>> from events.tasks import send_email_task
>>> send_email_task
<@task: events.tasks.send_email_task of Whurthy at 0x7fb2f2348dc0>
>>> send_email_task.delay()
At this point the shell hangs/does not respond until I keyboard interrupt.
I then tried the following:
>>> send_email_task.apply()
<EagerResult: 90b7d92c-4f01-423b-a16f-f7a7c75a545c>
AND, the task sends an email!
So, the connection between Django, Redis and Celery appears to work. However, invoking delay() causes the web app to time out and the email NOT to be sent.
So either delay() isn't putting the task in the queue, or it is getting stuck. In either case, this does not appear to be a connection issue. However, because apply() runs the task in the caller's thread, it doesn't resolve the issue.
This does suggest it may be an issue with the broker, which in turn may be an issue with settings...
Made minor changes to broker settings in settings.py
CELERY_BROKER_URL = 'redis://SNIP_FOR_PRIVACY:6379/0'
CELERY_RESULT_BACKEND = 'redis://SNIP_FOR_PRIVACY:6379/1'
delay() still hangs in the shell.
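A quick check I can run from the same shell to see whether the broker is reachable at all from the app's environment (a sketch using redis-py; the host is a placeholder for the value in CELERY_BROKER_URL):
import redis
# Replace the host with the one used in CELERY_BROKER_URL.
r = redis.Redis(host='SNIP_FOR_PRIVACY', port=6379, db=0, socket_connect_timeout=5)
print(r.ping())  # True means the broker is reachable; an exception points to networking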
Attempt 2
I discovered that in Digital Ocean the public IPv4 address does not work when used for the broker URL. By replacing it with the private IP in the CELERY_BROKER_URL setting I was able to get delay() working within the Django app's shell.
However, while I can now get delay() working in the shell, returning to the original objective still fails. In other words, the web application hangs when loading the respective view.
I am currently researching other approaches; any suggestions are welcome. Given that I can now get Celery to work through the broker in the shell but not in the web application, I feel like I have made some progress but am still without a solution.
As a side note, I am also trying to make this connection through a Digital Ocean Managed Redis DB, although that is presenting a completely different issue.
Ultimately, the answer I uncovered is a compromise: a workaround using a different Digital Ocean (D.O.) product. The workaround was to use a Managed Database (which simplifies things but gives you much less control) rather than a Droplet (which involves manual Linux/Redis installation and configuration, but gives you greater control). This isn't ideal for two reasons. First, it costs more ($15 base cost for the managed database versus $6 for a droplet). Second, I would have preferred to work out how to set up Redis manually (and thus maintain greater control). However, I'll take a working solution over no solution for a very niche issue.
The steps to use a D.O. Managed Redis DB are:
Provision the managed Redis DB
Use the Public Network Connection String (as the connection string includes the password, I store this in an environment variable)
Ensure that you have the appropriate SSL settings in the celery.py file (snippet below)
celery.py
import os
from celery import Celery
import ssl
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj_name.settings')
app = Celery(
    'proj_name',
    broker_use_ssl={'ssl_cert_reqs': ssl.CERT_NONE},
    redis_backend_use_ssl={'ssl_cert_reqs': ssl.CERT_NONE},
)
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
settings.py
REDIS_URI = os.environ.get('REDIS_URI')
CELERY_BROKER_URL = f'{REDIS_URI}/0'
CELERY_RESULT_BACKEND = f'{REDIS_URI}/1'
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = TIME_ZONE
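If I recall correctly, when the managed database gives you a rediss:// connection string, the certificate requirement can also be expressed as a query parameter on the URL rather than via broker_use_ssl/redis_backend_use_ssl. A sketch, assuming REDIS_URI holds a rediss:// connection string:
# settings.py sketch; REDIS_URI is assumed to be a rediss:// connection string.
CELERY_BROKER_URL = f'{REDIS_URI}/0?ssl_cert_reqs=CERT_NONE'
CELERY_RESULT_BACKEND = f'{REDIS_URI}/1?ssl_cert_reqs=CERT_NONE'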

print out celery settings for django project?

I'd like to know if the Celery settings specified in the settings.py file are actually being recognized.
How can I tell if Celery picked up the options?
You can inspect the _conf object on the Celery app instance, after it has been initialised and configured:
app = Celery("yourproject")
app.config_from_object("django.conf:settings", namespace="CELERY")
print(app._conf)
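As a follow-up note (not from the original answer), the public app.conf mapping should work just as well and lets you check individual keys; the celery -A yourproject report command also dumps the active configuration. A small sketch:
app = Celery("yourproject")
app.config_from_object("django.conf:settings", namespace="CELERY")
# Check individual keys via the public conf mapping.
print(app.conf.broker_url)
print(app.conf.task_serializer)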

why we use a local settings file when initializing the celery app

In most of the places I see:
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings.local') --> ??
app = Celery('mysite')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
What is the use of exporting local settings in the project? I have seen this in many projects; even in production we are using local settings, although local mostly inherits from base settings where all the Celery config is defined. But why not mysite.settings.production?
os.environ.setdefault will first look at the content of the DJANGO_SETTINGS_MODULE environment variable; only if it is not set will it fall back to the default value.
You don't want the hassle of setting the DJANGO_SETTINGS_MODULE environment variable on every development machine, but in production you will set this variable to the production config.
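A small sketch (not from the original answer) of why setdefault is safe here: the local default only applies when nothing is already set in the environment:
import os
# Simulate production, where the variable is exported before the process starts.
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings.production'
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings.local')
print(os.environ['DJANGO_SETTINGS_MODULE'])  # -> mysite.settings.production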

django on heroku: celery worker gets 403 forbidden when accessing s3 media to read and process media files

I'm really stuck on this one because I'm not sure where to start:
My Django project allows users to upload a spreadsheet and the app then processes and aggregates the uploaded data.
The file is uploaded to the MEDIA_URL using a standard form and Django model with a FileField.
Once it's uploaded a celery worker accesses the file and processes it, writing the output to another model.
This works fine locally, but is not working in production. I'm deploying to heroku, and using the cookiecutter-django project template. I've set up an s3 bucket and am using the django-storages library.
The files upload without a problem - I can access and delete them in the Django admin, and also in the s3 bucket.
However when the celery worker tries to read the file, I get an HTTP Error 403: Forbidden.
I'm not sure how to approach this problem, because I am not sure which part of the stack contains my mistake. Could it be my tasks.py module, the Heroku Redis add-on, or my settings.py module?
It's necessary to tell celery where to get its configuration from (which settings file to use). I wasn't updating the config to production settings when deploying.
This is my fixed celery_app.py
import os
from celery import Celery
# set the default Django settings module for the 'celery' program.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.production")
app = Celery("<project_name>")
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object("django.conf:settings", namespace="CELERY")
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
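As a usage note (not from the original answer), the hard-coded production module can be avoided by setting the variable on the dynos instead, e.g. with heroku config:set DJANGO_SETTINGS_MODULE=config.settings.production, and keeping a non-production default in code. A sketch; "config.settings.local" is a placeholder module name:
import os
from celery import Celery
# The environment variable set on the Heroku dynos wins; the default below
# ("config.settings.local" is a placeholder) only applies when nothing is set.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.local")
app = Celery("<project_name>")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()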

Django: Load production settings for celery

My Django project has multiple settings files for development, production and testing, and I am using supervisor to manage the Celery worker. My question is how to load the settings file for Celery based on the environment I am in.
By using environment variables. Let's say you have the following settings files at the root of your repository.
config.settings.development.py
config.settings.production.py
...
The recommended way to define your Celery instance is in a celery.py module inside your config package:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
# set the default Django settings module for the 'celery' program.
# os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings.production')
app = Celery('proj')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
Instead of setting the DJANGO_SETTINGS_MODULE variable within the module (I have commented that out) make sure that those are present in the environment at the time that supervisord is started.
To set those variables in your staging, testing, and production system you can execute the following bash command.
E.g. on your production system:
$ export DJANGO_SETTINGS_MODULE=config.settings.production
$ echo $DJANGO_SETTINGS_MODULE
I would also suggest you load them from an .env file. In my opinion that's more convenient. You could do that with, for example, python-dotenv.
Update
The .env file is usually unique to each of your systems and is usually not under source/version control. By unique I mean that for development you may have a more verbose LOG_LEVEL or a different SECRET_KEY, because you don't want those values to show up in your source control, or you want to be able to adjust them without modifying your source files.
So, in your base.py (which production.py and development.py inherit from) you can load the variables from the file with, for example:
import os
from dotenv import load_dotenv
load_dotenv()  # the .env file has to be in the same directory
# ...
# Anywhere after load_dotenv() the variable can be read back:
DJANGO_SETTINGS_MODULE = os.getenv("DJANGO_SETTINGS_MODULE")
print(DJANGO_SETTINGS_MODULE)
# ...
I personally don't use the package since I use Docker, which has a declarative way of defining an .env file, but the code above should give you an idea of how it could work. There are similar packages out there, like django-environ, which is featured in the book Two Scoops of Django; I would tend to use that instead of python-dotenv, but it's a matter of taste.
You likely want to configure different settings files. From here you have two options. You can use the django-admin --settings parameter at runtime:
django-admin runserver --settings=thecelery.settings
Alternatively, you can select the settings module in code. If you currently have a single settings file, this requires you to set up additional settings files and set environment variables on the instance. Then, in your base settings file, you can do something like this:
import os
your_env = os.environ["environment"]
if your_env == "celery":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "thecelerysettings")
else:
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "defaultsettings")