Django with celery: scheduled task (ETA) executed multiple times in parallel - django

I'm developing a web application with Django which uses Celery to process asynchronous tasks, especially for transactional emails.
One on my email task is scheduled with the ETA option but it's executed multiple times in parallel resulting in mail chain, very anoying. I can't figure out exactly why.
I checked twice my Django code and I'm sure that it is publish only one time.
I'm using Redis as a broker/backend result.
My Celery daemon is hosted on Heroku and launched via this command:
python manage.py celeryd -E -B --loglevel=INFO
Thanks for your help.
EDIT: I find a valid solution here thanks to a guy on the #celery IRC channel: http://loose-bits.com/2010/10/distributed-task-locking-in-celery.html

Have you checked the Ensuring a task is only executed one at a time docs?

Related

Django celery and heroku

I have configured celery to be deployed in heroku, all Is working well, in fact in my logs at heroku celery is ready to handle the taks. Unfortunately celery doesn't pick my tasks I feel there is some disconnection, can I get some help?
If celery isn't picking up task that means that nothing is talking to its broker. Make sure that the task producer is talking to the same broker url as the celery worker (the broker url will appear in the first 10-15 lines of the celery logs).

How to log django-kronos tasks with logging module?

I have switched from celery to dramatiq and celery beat to django-kronos and now I am stuck - I am not able to figure out, how to make tasks run by kronos to log using the logging module.
Is it even possible or what is the best practice to log progress of django-kronos tasks?

Tasks created from celery tasks getting created twice

We are using celery 3.1.17 with redis backend in our flask 0.10.1 application. On our server every celery task created from some celery task is getting created twice. For example,
#celery.task(name='send_some_xyz_users_alerts')
def send_some_xyz_users_alerts():
list_of_users = find_some_list_of_users()
for user in list_of_users:
send_user_alert.delay(user)
#celery.task(name='send_user_alert')
def send_user_alert(user):
data = get_data_for_user(user)
send_mail_to_user(data)
If we start send_some_xyz_users_alerts from our application it runs once. I then see 2 send_user_alert tasks running in celery for each user. Both these tasks have different task_ids. We have 2 workers running on server. Some times these duplicate tasks run on same worker. Sometimes on different workers. I have tried lot to find the problem without any luck. Would really appreciate if someone knows why this could happen. Things were running fine for months on these versions of celery and flask and suddenly we are seeing this problem on our servers. Tasks run fine on local env.

Django Celery Beat admin updating Cron Schedule Periodic task not taking effect

I'm running a site using Django 10, RabbitMQ, and Celery 4 on CentOS 7.
My Celery Beat and Celery Worker instances are controlled by supervisor and I'm using the django celery database scheduler.
I've scheduled a cron style task using the cronsheduler in Django-admin.
When I start celery beat and worker instances the job fires as expected.
But if a change the schedule time in Django-admin then the changes are not picked up unless I restart the celery-beat instance.
Is there something I am missing or do I need to write my own scheduler?
Celery Beat, with the 'django_celery_beat.schedulers.DatabaseScheduler' loads the schedule from the database. According to the following doc https://media.readthedocs.org/pdf/django-celery-beat/latest/django-celery-beat.pdf this should force Celery Beat to reload:
A schedule that runs at a specific interval (e.g. every 5 seconds).
•
django_celery_beat.models.CrontabSchedule
A
schedule
with
fields
like
entries
in
cron: minute hour day-of-week day_of_month month_of_year.
django_celery_beat.models.PeriodicTasks
This model is only used as an index to keep track of when the schedule has changed. Whenever you update a PeriodicTask a counter in this table is also incremented, which tells the celery beat
service to reload the schedule from the database.
If you update periodic tasks in bulk, you will need to update the counter manually:
from django_celery_beat.models import PeriodicTasks
PeriodicTasks.changed()
From the above I would expect the Celery Beat process to check the table regularly for any changes.
i have changed the celery from 4.0 to 3.1.25, django to 1.9.11 and installed djcelery 3.1.17. Then test again, It's OK. So, maybe it's a bug.
I have a solution by:
Creating a separate worker process that consumes a RabbitMQ queue.
When Django updates the database it posts a message to the queue containing the name of the Celery Beat process (name defined by Supervisor configuration).
The worker process then restarts the named Celery Beat process.
A bit long winded but does the job. Also makes it easier to manage multiple Django apps on the same server that require the same functionality.

How to process unfinished tasks in django celery

I am working on a project in which zip/gzip files are uploaded by the user and then unzipped and processed using Celery. The website is based on Django.
Now the problem that I am facing is that there are few files that have been uploaded when Celery was not running. Is there anyway that I can re-run celery for such unprocessed files? If so, then how?
Thanks.
You have to track down those tasks manually and start them from Django shell. There are lots of tables that celery creates to keep track of those tasks. It's been a while that I haven't used celery but still my guess would be:
djcelery_crontabschedule
djcelery_taskstate
There might be something about this on celery documentation too: http://docs.celeryproject.org/en/2.3/getting-started/first-steps-with-celery.html
You can also ask your query on celery mailing list.