Celery - How to route task to local worker only - django

I have a Django view where a user can upload a file to process.
I'd like to hand off the processing to a celery task but I need to give the task a path to the file.
The problem is I have 3 servers running the Django app and the same three servers running celery workers.
Is there a way I can tell Celery I only want the task to run on a worker that's on the same server where the file was uploaded?
(Or is there a better way to do this? I don't have any shared location where all three servers can see the files.)
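One common approach, not from the original question, is to give each host its own queue named after its hostname: the upload view enqueues the task on the local host's queue, and each server's worker listens only to that queue (started with something like celery -A my_app worker -Q upload.<hostname>). Below is a minimal sketch; the save_uploaded_file helper and the task/queue names are illustrative.

import socket

from celery import shared_task


@shared_task
def process_upload(path):
    # runs on whichever worker consumes the queue the task was sent to
    ...


def upload_view(request):
    # save_uploaded_file is a hypothetical helper that writes the upload to local disk
    path = save_uploaded_file(request.FILES["file"])
    # route the task to this host's own queue so only the local worker picks it up
    process_upload.apply_async(args=[path], queue="upload." + socket.gethostname())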

Related

Tasks created from celery tasks getting created twice

We are using Celery 3.1.17 with a Redis backend in our Flask 0.10.1 application. On our server, every Celery task created from within another Celery task is getting created twice. For example:
@celery.task(name='send_some_xyz_users_alerts')
def send_some_xyz_users_alerts():
    list_of_users = find_some_list_of_users()
    for user in list_of_users:
        send_user_alert.delay(user)

@celery.task(name='send_user_alert')
def send_user_alert(user):
    data = get_data_for_user(user)
    send_mail_to_user(data)
If we start send_some_xyz_users_alerts from our application it runs once, but I then see two send_user_alert tasks running in Celery for each user, and the two tasks have different task_ids. We have two workers running on the server. Sometimes these duplicate tasks run on the same worker, sometimes on different workers. I have tried a lot of things to find the problem, without any luck, and would really appreciate it if someone knows why this could happen. Things ran fine for months on these versions of Celery and Flask, and suddenly we are seeing this problem on our servers. Tasks run fine in the local environment.

Decouple and Dockerize Django and Celery

I am wondering what the best way is to decouple Celery from Django in order to dockerize the two parts and use a docker swarm service. Typically one starts the celery workers and celery beat using a command that references their Django application:
celery worker -A my_app
celery beat -A my_app
From this I believe Celery picks up config info from the settings file and a celery.py file, which is easy to move to a microservice. What I don't totally understand is how the tasks would leverage the Django ORM. Or is that not really the microservices mantra, and should Celery be designed to make GET/POST calls to a Django REST Framework API for the data it needs to complete the task?
I use a setup where the code for both the django app and its celery workers is the same (as in a single repository).
When deploying I make sure to have the same code release everywhere, to avoid any surprises with the ORM, etc...
Celery starts with a reference to the django app, so that it has access to the models, etc...
Communication between the workers and the main app happens either through the message queue (RabbitMQ or Redis...) or via the database (as in, the celery worker works directly on the db, since it knows the models, etc...).
I'm not sure if that follows the microservices mantra, but it does work :)
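For reference, the wiring for that kind of setup is usually a small celery.py next to the Django settings, following Celery's documented Django integration; the my_app module names below mirror the commands above and would need to match the real project:

import os

from celery import Celery

# point Celery at the Django settings before anything imports the ORM
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "my_app.settings")

app = Celery("my_app")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()  # finds tasks.py in each installed Django app, so tasks can use the ORM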
Celery's .send_task or .signature might be helpful:
https://www.distributedpython.com/2018/06/19/call-celery-task-outside-codebase/
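To illustrate the send_task route (the broker URL and task name below are placeholders): a service that does not import the Django codebase can enqueue a task by its registered name, so only the broker has to be shared between the containers.

from celery import Celery

app = Celery(broker="redis://redis:6379/0")

# the name must match what the worker registered, e.g. the task's dotted path
result = app.send_task("my_app.tasks.process_order", args=[42])
print(result.id)  # reading the return value would additionally require a result backend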

Django Celery Beat admin updating Cron Schedule Periodic task not taking effect

I'm running a site using Django 1.10, RabbitMQ, and Celery 4 on CentOS 7.
My Celery Beat and Celery Worker instances are controlled by supervisor and I'm using the django celery database scheduler.
I've scheduled a cron-style task using the crontab schedule in the Django admin.
When I start celery beat and worker instances the job fires as expected.
But if I change the scheduled time in the Django admin, the change is not picked up unless I restart the celery-beat instance.
Is there something I am missing or do I need to write my own scheduler?
Celery Beat, when run with 'django_celery_beat.schedulers.DatabaseScheduler', loads the schedule from the database. According to the following doc, https://media.readthedocs.org/pdf/django-celery-beat/latest/django-celery-beat.pdf, this should force Celery Beat to reload:
• django_celery_beat.models.IntervalSchedule: A schedule that runs at a specific interval (e.g. every 5 seconds).
• django_celery_beat.models.CrontabSchedule: A schedule with fields like entries in cron: minute hour day-of-week day_of_month month_of_year.
• django_celery_beat.models.PeriodicTasks: This model is only used as an index to keep track of when the schedule has changed. Whenever you update a PeriodicTask a counter in this table is also incremented, which tells the celery beat service to reload the schedule from the database.
If you update periodic tasks in bulk, you will need to update the counter manually:
from django_celery_beat.models import PeriodicTasks
PeriodicTasks.changed()
From the above I would expect the Celery Beat process to check the table regularly for any changes.
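As an illustration of the bulk-update case the quoted docs mention (the task and schedule values below are made up): a queryset .update() skips the model's save() hook, so the change counter has to be bumped by hand, as the docs show.

from django_celery_beat.models import CrontabSchedule, PeriodicTask, PeriodicTasks

# move every "nightly-" task to 03:30
new_schedule, _ = CrontabSchedule.objects.get_or_create(minute="30", hour="3")
PeriodicTask.objects.filter(name__startswith="nightly-").update(crontab=new_schedule)

# bulk updates bypass save(), so tell beat the schedule changed (call quoted from the docs above)
PeriodicTasks.changed()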
I changed Celery from 4.0 to 3.1.25 and Django to 1.9.11, installed djcelery 3.1.17, and tested again. It works now, so maybe it's a bug in Celery 4.
I have a solution:
Create a separate worker process that consumes a RabbitMQ queue.
When Django updates the database, it posts a message to the queue containing the name of the Celery Beat process (the name defined in the Supervisor configuration).
The worker process then restarts the named Celery Beat process.
A bit long-winded, but it does the job. It also makes it easier to manage multiple Django apps on the same server that require the same functionality.
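A rough sketch of the consumer side of that workaround, assuming pika as the RabbitMQ client and Supervisor's supervisorctl for the restart; the queue name is illustrative:

import subprocess

import pika


def on_message(channel, method, properties, body):
    # the message body carries the Supervisor program name of the Celery Beat process
    subprocess.run(["supervisorctl", "restart", body.decode()], check=True)
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="restart-celery-beat", durable=True)
channel.basic_consume(queue="restart-celery-beat", on_message_callback=on_message)
channel.start_consuming()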

Where celery stores task functions?

I have a Django app that runs periodic tasks via celery + kombu + Oracle. I spent some time before noticing that, to change the task code, the Celery worker needs to be restarted, not the Django server (uWSGI).
The question is: where does Celery store that code? Some sort of cache, or what?
A Celery system consists of one or more (usually Python) processes which load your methods/tasks into memory.
It's the same as launching an interactive shell. If you do:
>>> from spam import eggs
eggs will be loaded into memory. If you edit the eggs code, you'll have to restart the shell to see the changes.
Celery runs several worker processes, separate from the django server process.
These processes load the python code into memory and execute it. They continue running until shut down.
If you update the python code on disk the change will not be picked up by the running processes - you will need to restart them.

How to process unfinished tasks in django celery

I am working on a project in which zip/gzip files are uploaded by the user and then unzipped and processed using Celery. The website is based on Django.
Now the problem I am facing is that a few files were uploaded while Celery was not running. Is there any way I can re-run Celery for such unprocessed files? If so, how?
Thanks.
You have to track down those tasks manually and start them from the Django shell. There are lots of tables that celery creates to keep track of those tasks. It's been a while since I used celery, but my guess would be:
djcelery_crontabschedule
djcelery_taskstate
There might be something about this in the celery documentation too: http://docs.celeryproject.org/en/2.3/getting-started/first-steps-with-celery.html
You can also ask on the celery mailing list.
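If the uploads are recorded in a database table, the backlog can be re-queued from python manage.py shell; the Upload model, its processed flag, and the process_archive task below are hypothetical stand-ins for whatever the project actually uses:

from myapp.models import Upload          # hypothetical model tracking uploaded archives
from myapp.tasks import process_archive  # hypothetical Celery task that unzips and processes a file

# re-enqueue processing for every file that never got handled
for upload in Upload.objects.filter(processed=False):
    process_archive.delay(upload.file.path)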