How to process unfinished tasks in django celery

I am working on a project in which zip/gzip files are uploaded by the user and then unzipped and processed using Celery. The website is based on Django.
Now the problem I am facing is that a few files were uploaded while Celery was not running. Is there any way I can re-run Celery for such unprocessed files? If so, how?
Thanks.

You have to track down those tasks manually and start them from the Django shell. There are several tables that Celery creates to keep track of tasks. It's been a while since I last used Celery, but my guess would be:
djcelery_crontabschedule
djcelery_taskstate
There might be something about this in the Celery documentation too: http://docs.celeryproject.org/en/2.3/getting-started/first-steps-with-celery.html
You can also ask on the Celery mailing list.
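If the uploads are recorded in the database, one way to re-queue them is from the Django shell. This is only a minimal sketch: it assumes a hypothetical Upload model with a processed flag and a hypothetical process_archive Celery task, so substitute whatever names your project actually uses.

# Run inside `python manage.py shell`
from myapp.models import Upload          # hypothetical model
from myapp.tasks import process_archive  # hypothetical Celery task

for upload in Upload.objects.filter(processed=False):
    # Re-queue each file now that a worker is running again.
    process_archive.delay(upload.id)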

Related

How to log django-kronos tasks with logging module?

I have switched from Celery to dramatiq and from Celery Beat to django-kronos, and now I am stuck: I cannot figure out how to make the tasks run by kronos log using the logging module.
Is it even possible, and what is the best practice for logging the progress of django-kronos tasks?
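For what it's worth, a django-kronos task is just a Python function registered with kronos.register, so the standard logging module should work inside it as long as your Django LOGGING setting routes that logger to a handler. A minimal sketch, assuming a cron.py module in one of your installed apps and a hypothetical task name:

# myapp/cron.py (kronos discovers tasks in cron.py modules of installed apps)
import logging
import kronos

logger = logging.getLogger(__name__)

@kronos.register('0 * * * *')  # every hour
def my_scheduled_task():
    logger.info('kronos task started')
    # ... do the actual work here ...
    logger.info('kronos task finished')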

Decouple and Dockerize Django and Celery

I am wondering what is the best way to decouple Celery from Django in order to dockerize the two parts and use a docker swarm service. Typically one starts the celery workers and celery beat using a command that references their Django application:
celery worker -A my_app
celery beat -A my_app
From this I believe Celery picks up its config from the settings file and a celery.py file, which is easy to move to a microservice. What I don't totally understand is how the tasks would leverage the Django ORM. Or is that not really the microservices mantra, and should Celery instead make GET/POST calls to a Django REST Framework API for the data it needs to complete the task?
I use a setup where the code for both the django app and its celery workers is the same (as in a single repository).
When deploying I make sure to have the same code release everywhere, to avoid any surprises with the ORM, etc...
Celery starts with a reference to the django app, so that it has access to the models, etc...
Communication between the workers and the main app happens either through the messaging queue (rabbitmq or redis...) or via the database (as in, the celery worker works directly in the db, since it knows the models, etc...).
I'm not sure if that follows the microservices mantra, but it does work :)
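For reference, the celery.py wiring this relies on is roughly the standard layout from Celery's Django first-steps docs; the same module ships in both the web image and the worker image (names follow the my_app example above and are placeholders):

# my_app/celery.py
import os
from celery import Celery

# Point Celery at the Django settings so tasks can use the ORM.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_app.settings')

app = Celery('my_app')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()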
Celery's .send_task or .signature might be helpful:
https://www.distributedpython.com/2018/06/19/call-celery-task-outside-codebase/
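With send_task the producer only needs the broker URL and the registered task name, not the task code itself, which is what makes it useful when the two services don't share a codebase. A minimal sketch; the broker URL and task name are placeholders:

from celery import Celery

app = Celery(broker='redis://localhost:6379/0')  # placeholder broker URL

# No import of the task function is needed, only its registered name.
result = app.send_task('my_app.tasks.process_upload', args=[42])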

Celery - How to route task to local worker only

I have a Django view where a user can upload a file to process.
I'd like to hand off the processing to a celery task but I need to give the task a path to the file.
The problem is I have 3 servers running the Django app and the same three servers running celery workers.
Is there a way I can tell Celery I only want the task to run on a worker that's on the same server where the file was uploaded?
(Or is there a better way to do this? I don't have any shared location that all three servers can read files from.)
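One common approach is to have each server's worker also consume a queue named after its own host, and route the follow-up task to that queue from the view. This is only a sketch under that assumption; process_file is a hypothetical task, and the workers are assumed to be started on each server with something like: celery -A my_app worker -Q celery,$(hostname)

import socket
from myapp.tasks import process_file  # hypothetical task

def handle_uploaded_file(path):
    # Send the task to the queue that only this server's worker consumes.
    process_file.apply_async(args=[path], queue=socket.gethostname())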

Django Celery Beat admin updating Cron Schedule Periodic task not taking effect

I'm running a site using Django 1.10, RabbitMQ, and Celery 4 on CentOS 7.
My Celery Beat and Celery Worker instances are controlled by supervisor, and I'm using the django-celery-beat database scheduler.
I've scheduled a cron-style task using the cron scheduler in Django admin.
When I start celery beat and worker instances the job fires as expected.
But if I change the schedule time in Django admin, the changes are not picked up unless I restart the celery-beat instance.
Is there something I am missing or do I need to write my own scheduler?
Celery Beat, with 'django_celery_beat.schedulers.DatabaseScheduler', loads the schedule from the database. According to the following doc, https://media.readthedocs.org/pdf/django-celery-beat/latest/django-celery-beat.pdf, this should force Celery Beat to reload:
• django_celery_beat.models.IntervalSchedule: A schedule that runs at a specific interval (e.g. every 5 seconds).
• django_celery_beat.models.CrontabSchedule: A schedule with fields like entries in cron: minute hour day-of-week day_of_month month_of_year.
• django_celery_beat.models.PeriodicTasks: This model is only used as an index to keep track of when the schedule has changed. Whenever you update a PeriodicTask a counter in this table is also incremented, which tells the celery beat service to reload the schedule from the database.
If you update periodic tasks in bulk, you will need to update the counter manually:
from django_celery_beat.models import PeriodicTasks
PeriodicTasks.changed()
From the above I would expect the Celery Beat process to check the table regularly for any changes.
I changed Celery from 4.0 to 3.1.25 and Django to 1.9.11, and installed djcelery 3.1.17. Then I tested again and it works. So maybe it's a bug.
I worked around this by:
Creating a separate worker process that consumes a RabbitMQ queue.
When Django updates the database it posts a message to the queue containing the name of the Celery Beat process (name defined by Supervisor configuration).
The worker process then restarts the named Celery Beat process.
A bit long-winded, but it does the job. It also makes it easier to manage multiple Django apps on the same server that require the same functionality.
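A rough sketch of what that restart worker could look like, assuming pika for RabbitMQ and a supervisor-managed beat program; the queue name is a placeholder, and the message body is assumed to carry the supervisor program name:

import subprocess
import pika

def on_message(channel, method, properties, body):
    program = body.decode()  # supervisor program name for the beat instance
    subprocess.run(['supervisorctl', 'restart', program], check=True)

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='beat-restart')  # placeholder queue name
channel.basic_consume(queue='beat-restart', on_message_callback=on_message, auto_ack=True)
channel.start_consuming()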

Django with celery: scheduled task (ETA) executed multiple times in parallel

I'm developing a web application with Django which uses Celery to process asynchronous tasks, especially for transactional emails.
One of my email tasks is scheduled with the ETA option, but it's executed multiple times in parallel, resulting in a chain of emails, which is very annoying. I can't figure out exactly why.
I checked my Django code twice and I'm sure the task is published only once.
I'm using Redis as the broker and result backend.
My Celery daemon is hosted on Heroku and launched via this command:
python manage.py celeryd -E -B --loglevel=INFO
Thanks for your help.
EDIT: I found a valid solution here, thanks to a guy on the #celery IRC channel: http://loose-bits.com/2010/10/distributed-task-locking-in-celery.html
Have you checked the "Ensuring a task is only executed one at a time" section of the docs?
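The pattern in that cookbook entry (and in the loose-bits article above) is a cache-based lock around the task body. A minimal sketch using Django's cache; the task name, lock key, and timeout are placeholders:

from celery import shared_task
from django.core.cache import cache

LOCK_EXPIRE = 60 * 10  # should comfortably exceed the task's expected runtime

@shared_task
def send_transactional_email(email_id):
    lock_id = 'send-email-lock-%s' % email_id
    # cache.add() is atomic: it only succeeds if the key does not exist yet.
    if cache.add(lock_id, 'true', LOCK_EXPIRE):
        try:
            pass  # ... actually send the email here ...
        finally:
            cache.delete(lock_id)
    # else: another worker already holds the lock for this email, so skip.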