How to get results from celery tasks from another app in another host - django

I'm using Django, Celery and RabbitMQ to process tasks in what I'll call APP1. On another host I have APP2, which needs to get results from tasks processed in APP1.
Both apps/hosts have access to RabbitMQ, and my first approach was simply to try to share a queue between both apps, without success.
What is the best approach to achieve this?

One possible approach is to have the task run on APP1, and when it is done processing the task, post another task to celery. Call this new task ProcessResults. The data for this task will be the result of the original task. The worker for this new task would be located on APP2.

Just use the same result backend that you used in APP1, for example in APP2:
from celery import Celery
from celery.result import AsyncResult
# set the result backend URL that APP1 is using
app = Celery(backend='backend_url')
# the ID of the task that was queued in APP1; bind the lookup to this app explicitly
task = AsyncResult('task_id', app=app)
# get the task result
task.result
You need to store the task ID from APP1 to be able to get its result in APP2. Alternatively, a custom task ID may let you avoid storing it, but to set a custom ID you need to use Task.apply_async():
task.apply_async(args, kwargs, task_id='custom_id')
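One way to use a custom ID without storing it is to derive it from data both apps already know. The following is a minimal sketch of that idea using only the standard library; the namespace, the function name make_task_id and the "build_report" job name are hypothetical and not part of Celery.

```python
import uuid

# Hypothetical namespace shared by APP1 and APP2. Any fixed UUID works,
# as long as both sides use the same one.
TASK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "tasks.example.com")


def make_task_id(job_name, key):
    """Derive a repeatable task ID from a job name and a business key."""
    return str(uuid.uuid5(TASK_NAMESPACE, f"{job_name}:{key}"))


# APP1 would pass this string as task_id=... to apply_async();
# APP2 would pass the same string to AsyncResult().
id_in_app1 = make_task_id("build_report", 42)
id_in_app2 = make_task_id("build_report", 42)
```

Because the ID is a pure function of its inputs, APP2 can recompute it instead of looking it up.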

Related

Can celery beat only trigger a Celery task, or also a normal function (Django)?

I am working on a Django project with Celery and celery beat. My main use case is to use celery beat to set up a periodic background task, instead of using a front-end request to trigger it. I would save the results in a model, then show that model to the user in a front-end view.
My current problem is that, no matter how I change the way I call my task, it always throws an error saying the task is not registered in Celery's task list.
I am trying to trigger a non-Celery function (which in turn calls a Celery task) using the celery beat module.
Below is the pseudo-code.
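tasks.py:
from celery import shared_task

@shared_task
def longrunningtask(a):
    res = APIcall(a)  # APIcall is the asker's placeholder for the real API call
    return res
caller.py:
from .tasks import longrunningtask

def dosomething(input_list):
    res = []
    for ele in input_list:
        # each .delay() call queues a separate Celery task
        res.append(longrunningtask.delay(ele))
    return res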
Periodic task:
schedule, created = CrontabSchedule.objects.get_or_create(hour=1, minute=34)
task = PeriodicTask.objects.create(crontab=schedule, name="XXX_task_", task='app.caller.dosomething')
return HttpResponse("Done")
There is nothing special about the periodic task, but this never works for me. Unless I make dosomething() a Celery task, it fails with an error saying the task is not detected or not registered.
The problem is that I do not want to make the caller function a Celery task, for two reasons:
Inside the for loop I pass a different parameter to the task on each iteration, and I would like each iteration to kick off its own long-running Celery task, so that I get multiple sub-tasks instead of one giant running task.
It is not necessary: longrunningtask is the one that needs to run as a Celery task; its parent does not need to be a Celery task.
Can someone please help me out of this dilemma? It's super frustrating and has been blocking me for a while.
Any suggestion or idea for this use case would also be super helpful!

Celery task calls endpoint. Is it celery or the django server that does the job?

This is a generic question that I seek answer to because of a celery task I saw in my company's codebase from a previous employee.
It's a shared task that calls an endpoint like
@shared_task(time_limit=60*60)
def celery_task_here(some_args):
    data = get_data(user, url, server_name)
    # some other logic to build csv and stuff

def get_data(user, url, server_name):
    client = APIClient()
    client.force_authenticate(user=user)
    response = client.get(some_url, format='json', SERVER_NAME=server_name)
and all the logic resides in that endpoint.
Now, my understanding is that this makes the server do all the work and does not take advantage of Celery, but I do see the Celery log producing queries when I run this locally. I'd like to know who is actually doing the job in this case: Celery or the Django server?
If the task is called via celery_task_here.delay, the task will be pushed to a queue, then the worker process that is responsible for handling the queue will actually execute the task, which is not the "Django server". The worker process could potentially be on the same machine as your Django instance, it depends on your environment.
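If you were to call the task as a normal function (celery_task_here(some_args)), it would be executed synchronously in the calling process, which here is the Django server. Note that celery_task_here.s(...) on its own only builds a signature and does not run anything.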
It depends on how the task is called.
If it is called as a Celery task with apply_async or delay, then it is executed by a Celery worker process.
You can still run it as a normal function, without sending it to Celery, by simply calling it as a function.
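The difference between the two call styles can be illustrated with a toy model. This is not Celery itself: a standard-library queue and a thread named "worker" stand in for the broker and the worker process, and the delay helper here is a hypothetical stand-in for Celery's .delay().

```python
import queue
import threading

task_queue = queue.Queue()   # stands in for the broker
executed_by = {}             # records which thread ran each call


def celery_task_here(label):
    executed_by[label] = threading.current_thread().name


def delay(func, *args):
    """Toy stand-in for .delay(): enqueue the call instead of running it."""
    task_queue.put((func, args))


def worker():
    """Toy stand-in for a worker: run queued calls until told to stop."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        func, args = item
        func(*args)


worker_thread = threading.Thread(target=worker, name="worker")
worker_thread.start()

celery_task_here("direct")          # runs in the calling thread
delay(celery_task_here, "queued")   # runs in the worker thread
task_queue.put(None)                # stop signal for the toy worker
worker_thread.join()
```

In the toy model, as in the answers above, the caller does the work for a direct call and the worker does it for a queued call.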

How to achieve the below objective?

I am using Celery with Django. Redis is my broker. I am serving my Django app via Apache and WSGI. I am running celery in supervisor mode. I am starting a Celery task named run_forever from the wsgi.py file of my Django project. My intention was to start a Celery task when Django starts up and run it forever in the background (I don't know if this is the right way to achieve that. I searched but couldn't find an appropriate implementation. If you have a better idea, kindly share). It is working as expected. Now, due to a certain issue, I have added the maximum-requests=250 parameter to the Apache virtual host. With this setting, the WSGI process restarts after it has handled 250 requests.
So every time it restarts, a new 'run_forever' task is created and runs in the background. Eventually, when the server has received 1000 requests, the WSGI process will have restarted 4 times and I end up with 4 copies of the 'run_forever' task. I only want one copy of the task running at any point in time, so I would like to kill all currently running 'run_forever' tasks every time Django starts.
I have tried
from project.celery import app
from project.tasks import run_forever
app.control.purge()
run_forever.delay()
in wsgi.py to kill all the running tasks before starting 'run_forever', but it didn't work.
I have to agree with Dave Smith here--why do you have a task that runs forever? That said, to the extent that you want to safeguard a task from running twice, there are multiple strategies you can use. The easiest to implement is a database entry (since databases can be transactional and, if you're using Django, presumably you are using at least one database). N.b., in the code snippet below I did not put my model in the right place to be picked up by a migration; I just put it in the same snippet for ease of discussion.
import time

from django.db import models

from myapp.celery import app


class CeleryGuard(models.Model):
    # unique=True lets the database reject a second guard row for the same task
    task_name = models.CharField(max_length=32, unique=True)
    # Celery task IDs are 36-character UUID strings
    task_id = models.CharField(max_length=36)


@app.task(bind=True)
def run_forever(self):
    # get_or_create returns (object, created), in that order
    guard, created = CeleryGuard.objects.get_or_create(
        task_name='run_forever',
        defaults={'task_id': self.request.id},
    )
    if not created:
        # another copy already holds the guard
        return
    try:
        # do whatever you want to here
        while True:
            print('I am doing nothing')
            time.sleep(1440)
    finally:
        # make sure to clean up after you are done
        CeleryGuard.objects.filter(task_name='run_forever').delete()
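The guard only protects against duplicates if the database itself refuses a second row for the same task name. The following is a minimal standard-library sketch of that mechanism, using sqlite3 in place of the Django ORM; the table name and the helper names try_acquire and release are hypothetical.

```python
import sqlite3


def make_guard_table(conn):
    # PRIMARY KEY on task_name makes a second insert for the same name fail
    conn.execute(
        "CREATE TABLE IF NOT EXISTS celery_guard ("
        "task_name TEXT PRIMARY KEY, task_id TEXT NOT NULL)"
    )


def try_acquire(conn, task_name, task_id):
    """Return True if this caller inserted the guard row, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO celery_guard (task_name, task_id) VALUES (?, ?)",
                (task_name, task_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def release(conn, task_name):
    """Remove the guard row so that a later run can acquire it again."""
    with conn:
        conn.execute("DELETE FROM celery_guard WHERE task_name = ?", (task_name,))


conn = sqlite3.connect(":memory:")
make_guard_table(conn)
first = try_acquire(conn, "run_forever", "id-1")    # guard is free
second = try_acquire(conn, "run_forever", "id-2")   # guard is already held
release(conn, "run_forever")
third = try_acquire(conn, "run_forever", "id-3")    # free again after release
```

Note that if the process holding the guard is killed before it can release the row, the row stays behind and has to be cleared by some other means.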

Multiple celery server but same redis broker executing task twice

When I run celery -A proj inspect active_queues, I see two servers listing the queues they listen to, and both point to the same default queue name, celery. Still, a task issued by the Django app gets executed twice, once by each Celery server.
I can see the transport type is also direct, the default one.
On my local machine the task gets executed once, so I am sure that the task is called only once by my Django app.
What could I be missing here?
OK, I looked up the docs. I think you need to set celerybeat-scheduler in your settings.py, which makes sure tasks are scheduled by a single scheduler.
http://celery.readthedocs.org/en/latest/configuration.html#celerybeat-scheduler
In Redis you can choose which database number an application uses; giving each application its own database keeps their data separate.
If you are using Django the configuration is
CELERY_BROKER_VHOST = {number of the database}
If you are not using Django I believe the configuration is CELERY_REDIS_DB or redis_db, depending on your Celery version.
For instance, your first application could use CELERY_BROKER_VHOST = 1,
the second application could use CELERY_BROKER_VHOST = 2,
and your local development could use CELERY_BROKER_VHOST = 99.
http://docs.celeryproject.org/en/latest/userguide/configuration.html#id8
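With Celery versions that take the broker as a single URL, the Redis database number is usually the path component of that URL. A minimal sketch of building one URL per application follows; the helper name redis_broker_url, the host and the port are assumptions for illustration, and the database numbers mirror the ones above.

```python
def redis_broker_url(db, host="localhost", port=6379):
    """Build a Redis broker URL whose path selects the database number."""
    return f"redis://{host}:{port}/{db}"


# One database per application, as suggested in the answer above.
first_app_broker = redis_broker_url(1)
second_app_broker = redis_broker_url(2)
local_dev_broker = redis_broker_url(99)
```

Redis is configured with 16 databases (0 to 15) by default, so a number such as 99 would only work if the server's databases setting has been raised.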

Stopping/Purging Periodic Tasks in Django-Celery

I have managed to get periodic tasks working in django-celery by subclassing PeriodicTask. I tried to create a test task and set it running doing something useless. It works.
Now I can't stop it. I've read the documentation and I cannot find out how to remove the task from the execution queue. I have tried using celeryctl and using the shell, but registry.tasks() is empty, so I can't see how to remove it.
I have seen suggestions that I should "revoke" it, but for this I appear to need a task id, and I can't see how I would find the task id.
Thanks.
A task is a message, and a "periodic task" sends task messages at periodic intervals. Each of the tasks sent will have a unique id assigned to it.
revoke will only cancel a single task message. To get the id for a task you have to keep
track of the id sent, but you can also specify a custom id when you send a task.
I'm not sure if you want to cancel a single task message, or if you want to stop the periodic task from sending more messages, so I'll list answers for both.
There is no built-in way to keep the id of a task sent with periodic tasks,
but you could set the id for each task to the name of the periodic task, that way
the id will refer to any task sent with the periodic task (usually the last one).
You can specify a custom id this way,
either with the @periodic_task decorator:
@periodic_task(options={"task_id": "my_periodic_task"})
def my_periodic_task():
    pass
or with the CELERYBEAT_SCHEDULE setting:
CELERYBEAT_SCHEDULE = {name: {"task": task_name,
                              "options": {"task_id": name}}}
If you want to remove a periodic task you simply remove the @periodic_task from the codebase, or remove the entry from CELERYBEAT_SCHEDULE.
If you are using the Django database scheduler you have to remove the periodic task
from the Django Admin interface.
PS1: revoke doesn't stop a task that has already been started. It only cancels
tasks that haven't been started yet. You can terminate a running task using
revoke(task_id, terminate=True). By default this will send the TERM signal to
the process, if you want to send another signal (e.g. KILL) use
revoke(task_id, terminate=True, signal="KILL").
PS2: revoke is a remote control command so it is only supported by the RabbitMQ
and Redis broker transports.
If you want your task to support cancellation you should do so by storing a cancelled
flag in a database and have the task check that flag when it starts:
from celery.task import Task

class RevokeableTask(Task):
    """Task that can be revoked.

    Example usage:

        @task(base=RevokeableTask)
        def mytask():
            pass
    """

    def __call__(self, *args, **kwargs):
        # revoke_flag_set_in_db_for is a placeholder for your own database lookup
        if revoke_flag_set_in_db_for(self.request.id):
            return
        # return the result so the task's return value is not lost
        return super(RevokeableTask, self).__call__(*args, **kwargs)
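The class above checks the flag once, when the task starts. A long-running task could also re-check it between units of work. This is a minimal sketch of that variation under stated assumptions: an in-memory set stands in for the database table, and process_items and the doubling step are hypothetical placeholders for real work.

```python
# Stand-in for a database table of cancelled task IDs (hypothetical).
cancelled_ids = set()


def revoke_flag_set_in_db_for(task_id):
    """Return True if the given task ID has been flagged as cancelled."""
    return task_id in cancelled_ids


def process_items(task_id, items):
    """Process items one by one, stopping early if the task is flagged."""
    done = []
    for item in items:
        if revoke_flag_set_in_db_for(task_id):
            break  # stop cooperatively; work done so far is kept
        done.append(item * 2)  # placeholder for the real unit of work
    return done


finished = process_items("task-a", [1, 2, 3])   # never flagged
cancelled_ids.add("task-b")
skipped = process_items("task-b", [1, 2, 3])    # flagged before it starts
```

Checking inside the loop trades one extra lookup per unit of work for the ability to stop a task that is already running without terminating the worker process.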
Just in case this may help someone ... We had the same problem at work, and despite some efforts to find some kind of management command to remove the periodic task, we could not. So here are some pointers.
You should probably first double-check which scheduler class you're using.
The default scheduler is celery.beat.PersistentScheduler, which is simply keeping track of the last run times in a local database file (a shelve).
In our case, we were using the djcelery.schedulers.DatabaseScheduler class.
django-celery also ships with a scheduler that stores the schedule in the Django database
Although the documentation does mention a way to remove the periodic tasks:
Using django-celery's scheduler you can add, modify and remove periodic tasks from the Django Admin.
We wanted to perform the removal programmatically, or via a (celery/management) command in a shell.
Since we could not find a command line, we used the django/python shell:
$ python manage.py shell
>>> from djcelery.models import PeriodicTask
>>> pt = PeriodicTask.objects.get(name='the_task_name')
>>> pt.delete()
I hope this helps!