Periodic tasks in Django on Elastic Beanstalk (possibly with celery beat)

Periodic tasks in Django on Elastic Beanstalk (possibly with celery beat) - django

I'm trying to set up a daily task for my Django application on Elastic Beanstalk. There doesn't appear to be an accepted way to set this up, as celery beat is the go-to solution for periodic tasks in Django, but isn't great for load-balanced environments.
I've seen some solutions doing things like setting up celery beat with leader_only=True, to only run one instance, but that leaves a single point of failure. I've seen other solutions that allow many instances of celery beat and use locks to make sure only one task goes through, but wouldn't this still eventually fail completely unless the failed instances were restarted? Another suggestion I've seen is to have a separate instance for running celery beat, but this would still be a problem unless it had some way of restarting itself if it failed.
Are there any decent solutions to this problem? I would much rather not have to babysit a scheduler, as it would be pretty easy to not notice that my task was not being run until a while later.

If you're using redis as your broker, look into installing RedBeat as the celery beat scheduler: https://github.com/sibson/redbeat
This scheduler uses locking in redis to make sure only a single beat instance is running. With this you can enable beat on each node's worker process and remove the use of leader_only=True.
celery worker -B -S redbeat.RedBeatScheduler
Let's say you have Worker A with beat lock and Worker B. If Worker A dies, Worker B will attempt to acquire the beat lock after a configurable amount of time.

I would suggest making a management command that runs with cron.
Using this method, you have your full Django ORM, all methods, etc. to work with. Wrapping your script in a try/except, you have the option to log failures in any way that you wish - email notifications, external logging systems like Sentry, straight to the DB, etc.
I user supervisord to run cron and it works well. It relies on time-tested tools that won't let you down.
Finally, using a database singleton to keep track of if a batch job has been run or is currently running in an environment where you have multiple instances of Django running load-balanced isn't bad practice, even if you feel a little icky about it. The DB is a very reliable means of telling you if the DB is being processed.
The one annoying thing about cron is that it doesn't import environment variables you may need for Django. I solved this with a simple Python script.
It writes the crontab on startup with needed environment variables etc. included. This example is for Ubuntu on EBS but should be relevant.
#!/usr/bin/env python
# run-cron.py
# sets environment variable crontab fragments and runs cron
import os
from subprocess import call
from master.settings import IS_AWS
# read django's needed environment variables and set them in the appropriate crontab fragment
eRDS_HOSTNAME = os.environ["RDS_HOSTNAME"]
eRDS_DB_NAME = os.environ["RDS_DB_NAME"]
eRDS_PASSWORD = os.environ["RDS_PASSWORD"]
eRDS_USERNAME = os.environ["RDS_USERNAME"]
try:
eAWS_STAGING = os.environ["AWS_STAGING"]
except KeyError:
eAWS_STAGING = None
try:
eAWS_PRODUCTION = os.environ["AWS_PRODUCTION"]
except KeyError:
eAWS_PRODUCTION = None
eRDS_PORT = os.environ["RDS_PORT"]
if IS_AWS:
fto = '/etc/cron.d/stortrac-cron'
else:
fto = 'test_cron_file'
with open(fto,'w+') as file:
file.write('# Auto-generated cron tab that imports needed variables and runs a python script')
file.write('\nRDS_HOSTNAME=')
file.write(eRDS_HOSTNAME)
file.write('\nRDS_DB_NAME=')
file.write(eRDS_DB_NAME)
file.write('\nRDS_PASSWORD=')
file.write(eRDS_PASSWORD)
file.write('\nRDS_USERNAME=')
file.write(eRDS_USERNAME)
file.write('\nRDS_PORT=')
file.write(eRDS_PORT)
if eAWS_STAGING is not None:
file.write('\nAWS_STAGING=')
file.write(eAWS_STAGING)
if eAWS_PRODUCTION is not None:
file.write('\nAWS_PRODUCTION=')
file.write(eAWS_PRODUCTION)
file.write('\n')
# Process queue of gobs
file.write('\n*/8 * * * * root python /code/app/manage.py queue --process-queue')
# Every 5 minutes, double-check thing is done
file.write('\n*/5 * * * * root python /code/app/manage.py thing --done')
# Every 4 hours, do this
file.write('\n8 */4 * * * root python /code/app/manage.py process_this')
# etc.
file.write('\n3 */4 * * * root python /ode/app/manage.py etc --silent')
file.write('\n\n')
if IS_AWS:
args = ["cron","-f"]
call(args)
And in supervisord.conf:
[program:cron]
command = python /my/directory/runcron.py
autostart = true
autorestart = false

Related

Django Celery with Redis Issues on Digital Ocean App Platform

After quite a bit of trial and error and a step by step attempt to find solutions I thought I share the problems here and answer them myself according to what I've found. There is not too much documentation on this anywhere except small bits and pieces and this will hopefully help others in the future.
Please note that this is specific to Django, Celery, Redis and the Digital Ocean App Platform.
This is mostly about the below errors and further resulting implications:
OSError: [Errno 38] Function not implemented
and
Cannot connect to redis://......
The first error happens when you try run the celery command celery -A your_app worker --beat -l info
or similar on the App Platform. It appears that this is currently not supported on digital ocean. The second error occurs when you make a number of potential mistakes.

PART 1:
While Digital Ocean might remedy this in the future here is an approach that will offer a workaround. The problem is the not supported execution pool. Google "celery execution pools" if you want to know more and how they work. The default one is prefork. But what you need is either gevent or eventlet. I went with the former for my purposes.
Whichever you pick you will have to install it as it doesn't come with celery by default. In my case it was: pip install gevent (and don't forget adding it to your requirements as well).
Once you have that you can re-run the celery command but note that gevent and beat are not supported within a single command (will result in an error). Instead do the following:
celery -A your_app worker --pool=gevent -l info
and then separately (if you want to run beat that is) in another terminal/console
celery -A your_app beat -l info
In the first line you can also specify the concurrency like so: --concurrency=100. This is not required but useful. Read up on it what it does as that goes beyond the solution here.
PART 2:
In my specific case I've tested the above locally (development) first to make sure they work. The next issue was getting this into production. I use Redis as the db/broker.
In my specific setup I have most of my celery configuration in the_main_app/celery/__init__.py file but sometimes people put it directly into the_main_app/celery.py. Whichever it is you do make sure that the REDIS_URL is set correctly. For development it usually looks something like this:
YOUR_VAR_NAME = os.environ.get('REDIS_URL', 'redis://localhost:6379') where YOUR_VAR_NAME is then set to the broker with everything as below:
YOUR_VAR_NAME = os.environ.get('REDIS_URL', 'redis://localhost:6379')
app = Celery('the_main_app')
app.conf.broker_url = YOUR_VAR_NAME
The remaining settings are all documented on the "celery first steps with django" help page but are not relevant for what I am showing here.
PART 3:
When you setup your Redis Database on the App Platform (which is very simple) you will see the connection details as 'public network' and 'VPC network'.
The celery documentation says to use the following URL format for production: redis://:password#hostname:port/db_number. This didn't work. If you are not using a yaml file then you can simply copy paste the entire connection string (select from the dropdown!) from the Redis DB connection details and then setup an App-Level environment variable in your Digital Ocean project named REDIS_URL and paste in that entire string (and also encrypt it!).
The string should look like something like this (redis with 2 s!)
rediss://USER:PASS#URL.db.ondigitialocean.com:PORT.
You are almost done. The last step is to setup the workers. It was fine for me to run the PART 1 commands as console commands on the App Platform to test them but eventually I've setup a small worker (+ Add Component) for each line pasted them into the Run Command.
That is basically the process step by step. Good luck!

Removing all Celery results from a Redis backend

Is there any way in Celery to remove all previous task results via command line? Everything I can find references purge, but that doesn't seem to be for task results. Other solutions I have found include using a Celery beat which periodically removes it, but I'm looking for a one-off command line solution.
I use Celery 4.3.0.

Here's what you're looking for I think:
https://github.com/celery/celery/issues/4656
referencing
https://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-result_expires
I set this up as follows:
RESULT_EXPIRE_TIME = 60 * 60 * 4 # keep tasks around for four hours
...
celery = celery.Celery('tasks',
broker=Config.BROKER_URL,
backend=Config.CELERY_RESULTS_BACKEND,
include=['tasks.definitions'],
result_expires=RESULT_EXPIRE_TIME)

So based on this answer: How do I delete everything in Redis?
With redis-cli:
FLUSHDB - Removes data from your connection's CURRENT database.
FLUSHALL - Removes data from ALL databases.
Redis documentation:
flushdb
flushall
For example, in your shell:
redis-cli flushall
and try to purge celery as well.
From celery doc: http://docs.celeryproject.org/en/latest/faq.html?highlight=purge#how-do-i-purge-all-waiting-tasks

Can I review and delete Celery / RabbitMQ tasks individually?

I am running Django + Celery + RabbitMQ. After modifying some task names I started getting "unregistered task" KeyErrors, even after removing tasks with this key from the Periodic tasks table in Django Celery Beat and restarting the Celery worker.
It turns out Celery / RabbitMQ tasks are persistent. I eventually resolved the issue by reimplementing the legacy tasks as dummy methods.
In future, I'd prefer not to purge the queue, restart the worker or reimplement legacy methods. Instead I'd like to inspect the queue and individually delete any legacy tasks. Is this possible? (Preferably in the context of the Django admin interface.)

Celery inspect may help
To view active queues:
celery -A proj inspect active_queues
To terminate a process:
celery -A proj control invoke process_id
To see all availble inspect options:
celery inspect --help

Celery, route all tasks in a Django project to a specific queue

I have a machine which runs two copies of the same Django project, let's call them A and B, and I want to use Celery to process background tasks.
I set up supervisor to launch two workers, one for each project, but given the tasks have the same names in both projects, sometimes tasks are run by the wrong worker.
My next step was to use a different queue for each worker, using the -Q queueName parameter. Using rabbitmqctl list_queues I could see that both queues were created. The command I'm using to issue the workers is
python3 -m celery worker -A project -l INFO -Q q1 --hostname=celery1#ubuntu
and
python3 -m celery worker -A project -l INFO -Q q2 --hostname=celery2#ubuntu
The question is, how do I route all the tasks from project A to queue A, and all tasks from project B to queue B? Yes, I've seen you can add a parameter to the task decorator to select the queue, but I'm looking for a global setting or something like that.
Edit 1: I've tried using CELERY_DEFAULT_QUEUE but it doesn't work, the setting gets ignored. I've also tried creating a dumb Router, like this:
class MyRouter(object):
def route_for_task(self, task, args=None, kwargs=None):
return 'q1'
CELERY_ROUTES = (MyRouter(), )
And it works (obviously return different queues in each project), but I'm baffled, why is the CELERY_DEFAULT_QUEUE setting ignored?

At the end it was easier than I expected. I just had to set up both the default queue and the default routing key (and optionally the default exchange, as long as it's a direct exchange).
CELERY_DEFAULT_QUEUE = 'q2'
CELERY_DEFAULT_EXCHANGE = 'q2'
CELERY_DEFAULT_ROUTING_KEY = 'q2'
I had some concepts unclear, but after following the official RabbitMQ's tutorials they got much clearer and I was able to fix the problem.

You can define the task routing as
CELERY_ROUTES = {
'services.qosservice.*': {'queue': 'qos_celery'},
}
The * is the celery-supported wildcard.
Reference: http://docs.celeryproject.org/en/latest/userguide/routing.html#automatic-routing

I think you must define rouing in settings.py
CELERY_ROUTES = {
'services.qosservice.set_qos_for_vm': {'queue': 'qos_celery'},
'services.qosservice.qos_recovery': {'queue': 'qos_celery'},
'services.qosservice.qos_recovery_compute': {'queue': 'qos_celery_1'},
}
In my exmaple. task set_qos_for_vm will be routing to qos_celery queue and task qos_recovery_compute queue will be routing to qos_celery_1.
More detail: http://docs.celeryproject.org/en/latest/userguide/routing.html#id2
Hope it help you

Check if celery beat is up and running

In my Django project, I use Celery and Rabbitmq to run tasks in background.
I am using celery beat scheduler to run periodic tasks.
How can i check if celery beat is up and running, programmatically?

Make a task to HTTP requests to a Ping URL at regular intervals. When the URL is not pinged on time, the URL monitor will send you an alert.
import requests
from yourapp.celery_config import app
#app.task
def ping():
print '[healthcheck] pinging alive status...'
# healthchecks.io works for me:
requests.post("https://hchk.io/6466681c-7708-4423-adf0-XXXXXXXXX")
This celery periodic task is scheduled to run every minute, if it doesn't hit the ping, your beat service is down*, the monitor will kick in your mail (or webhook so you can zapier it to get mobile push notifications as well).
celery -A yourapp.celery_config beat -S djcelery.schedulers.DatabaseScheduler
*or overwhelmed, you should track tasks saturation, this is a nightmare with Celery and should be detected and addressed properly, happens frequently when the workers are busy with blocking tasks that would need optimization

Are you use upstart or supervison or something else to run celery workers + celery beat as a background tasks? In production you should use one of them to run celery workers + celery beat in background.
Simplest way to check celery beat is running: ps aux | grep -i '[c]elerybeat'. If you get text string with pid it's running. Also you can make output of this command more pretty: ps aux | grep -i '[c]elerybeat' | awk '{print $2}'. If you get number - it's working, if you get nothing - it's not working.
Also you can check celery workers status: celery -A projectname status.
If you intrested in advanced celery monitoring you can read official documentation monitoring guide.

If you have daemonized celery following the tutorial of the celery doc, checking if it's running or not can be done through
sudo /etc/init.d/celeryd status
sudo /etc/init.d/celerybeat status
You can use the return of such commands in a python module.

You can probably look up supervisor.
It provides a celerybeat conf which logs everything related to beat in /var/log/celery/beat.log.
Another way of going about this is to use Flower. You can set it up for your server (make sure its password protected), it somewhat becomes easier to notice in the GUI the tasks which are being queued and what time they are queued thus verifying if your beat is running fine.

I have recently used a solution similar to what #panchicore suggested, for the same problem.
Problem in my workplace was an important system working with celery beat, and once in a while, either due to RabbitMQ outage, or some connectivity issue between our servers and RabbitMQ server, due to which celery beat just stopped triggering crons anymore, unless restarted.
As we didn't have any tool handy, to monitor keep alive calls sent over HTTP, we have used statsd for the same purpose. There's a counter incremented on statsd server every minute(done by a celery task), and then we setup email & slack channel alerts on the grafana metrics. (no updates for 10 minutes == outage)
I understand it's not purely a programatic approach, but any production level monitoring/alerting isn't complete without a separate monitoring entity.
The programming part is as simple as it can be. A tiny celery task running every minute.
#periodic_task(run_every=timedelta(minutes=1))
def update_keep_alive(self):
logger.info("running keep alive task")
statsd.incr(statsd_tags.CELERY_BEAT_ALIVE)
A problem that I have faced with this approach, is due to STATSD packet losses over UDP. So use TCP connection to STATSD for this purpose, if possible.

You can check scheduler running or not by the following command
python manage.py celery worker --beat

While working on a project recently, I used this:
HEALTHCHECK CMD ["stat celerybeat.pid || exit 1"]
Essentially, the beat process writes a pid file under some location (usually the home location), all you have to do is to get some stats to check if the file is there.
Note: This worked while launching a standalone celery beta process in a Docker container

The goal of liveness for celery beat/scheduler is to check if the celery beat/scheduler is able to send the job to the message broker so that it can be picked up by the respective consumer. [Is it still working or in a hung state]. The celery worker and celery scheduler/beat may or may not be running in the same pod or instance.
To handle such scenarios, we can create a method update_scheduler_liveness with decorator #after_task_publish.connect which will be called every time when the scheduler successfully publishes the message/task to the message broker.
The method update_scheduler_liveness will update the current timestamp to a file every time when the task is published successfully.
In Liveness probe, we need to check the last updated timestamp of the file either using:
stat --printf="%Y" celery_beat_schedule_liveness.stat command
or we can explicitly try to read the file (read mode) and extract the timestamp and compare if the the timestamp is recent or not based on the liveness probe criteria.
In this approach, the more minute liveness criteria you need, the more frequent a job must be triggered from the celery beat. So, for those cases, where the frequency between jobs is pretty huge, a custom/dedicated liveness heartbeat job can be scheduled every 2-5 mins and the consumer can just process it. #after_task_publish.connect decorator provides multiple arguments that can be also used for filtering of liveness specific job that were triggered
If we don't want to go for file based approach, then we can rely on Redis like data-source with instance specific redis key as well which needs to be implemented on the same lines.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js