I would like to add a Celery periodic task based on a user API request, from which I can get the time and date. Is there any way to trigger celery beat by an API call?
Solution 1
If all you need is to execute the task at the date/time received from the API request, you don't need to run celery beat for this; all you need is to configure the estimated time of arrival (ETA).
The ETA (estimated time of arrival) lets you set a specific date and time that is the earliest time at which your task will be executed.
tasks.py
from celery import Celery

app = Celery("my_app")

@app.task
def my_task(param):
    print(f"param {param}")
Logs (Producer)
>>> from dateutil.parser import parse
>>> import tasks
>>>
>>> api_request = "2021-08-20T15:00+08:00"
>>> tasks.my_task.apply_async(("some param",), eta=parse(api_request))
<AsyncResult: 82a5710c-095f-49a2-9289-d6b86e53d4da>
Logs (Consumer)
$ celery --app=tasks worker --queues=celery --loglevel=INFO
[2021-08-20 14:58:18,234: INFO/MainProcess] Task tasks.my_task[82a5710c-095f-49a2-9289-d6b86e53d4da] received
[2021-08-20 15:00:00,161: WARNING/ForkPoolWorker-4] param some param
[2021-08-20 15:00:00,161: WARNING/ForkPoolWorker-4]
[2021-08-20 15:00:00,161: INFO/ForkPoolWorker-4] Task tasks.my_task[82a5710c-095f-49a2-9289-d6b86e53d4da] succeeded in 0.0005905449997953838s: None
As you can see, the task executed at exactly 15:00 as requested by the mocked API request.
Solution 2
If you need to execute the task periodically based on the API request, e.g. every minute starting from the indicated time, then you have to run celery beat (note that this is a separate process from the worker). Since this is a dynamic update where you need to add a new task at runtime without restarting Celery, you can't simply add a new schedule entry, because it wouldn't be picked up. You also can't update the celerybeat-schedule file (the file that the celery beat scheduler reads from time to time to execute scheduled tasks), which holds the information about the scheduled tasks, because it is locked while the celery beat scheduler is running.
To solve this, you have to replace the celerybeat-schedule file with a database, so that it can be updated even while the celery beat scheduler is running. This way, if you update the database at runtime and add a new scheduled task, the celery beat scheduler will see it and execute it accordingly, without the need for restarts.
For the solution, since you are not in a Django application (where you could use django-celery-beat), I used celery-sqlalchemy-scheduler. You can see the detailed steps in my other answer posted here: https://stackoverflow.com/a/68858483/11043825
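For readers who are in a Django project, the same dynamic-scheduling idea with django-celery-beat looks roughly like this (a sketch; the task path, schedule values and helper name are illustrative):
import json

from django_celery_beat.models import IntervalSchedule, PeriodicTask

def schedule_task_from_api(param):
    # reuse an existing "every 1 minute" schedule row, or create one
    schedule, _ = IntervalSchedule.objects.get_or_create(
        every=1,
        period=IntervalSchedule.MINUTES,
    )
    # adding this row is enough: the database scheduler picks it up at runtime
    PeriodicTask.objects.create(
        interval=schedule,
        name="my_task-from-api",   # must be unique
        task="tasks.my_task",      # dotted path of the registered task
        args=json.dumps([param]),
    )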
Related
This is a generic question that I'm seeking an answer to because of a Celery task I saw in my company's codebase, written by a previous employee.
It's a shared task that calls an endpoint like this:
@shared_task(time_limit=60 * 60)
def celery_task_here(some_args):
    data = get_data(user, url, server_name)
    # some other logic to build a csv and so on

def get_data(user, url, server_name):
    client = APIClient()
    client.force_authenticate(user=user)
    response = client.get(some_url, format='json', SERVER_NAME=server_name)
    return response
and all the logic resides in that endpoint.
Now, what I understand is that this makes the server do all the work and does not utilize Celery's advantage, but I do see the Celery log producing queries when I run this locally. I'd like to know who is actually doing the job in this case, Celery or the Django server?
If the task is called via celery_task_here.delay, it will be pushed to a queue, and then the worker process responsible for handling that queue will actually execute the task, which is not the "Django server". The worker process could potentially be on the same machine as your Django instance; it depends on your environment.
If you were to call the task as a normal function (or execute the signature returned by celery_task_here.s directly), it would be executed in the Django server process.
It depends on how the task is called.
If it is called as a Celery task with apply_async or delay, it is executed as a Celery task by a Celery worker process.
You can still call it as a normal function, without sending it to Celery, by simply invoking it directly.
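A minimal illustration of the difference (the task body is omitted; names are taken from the question):
# pushed to the broker and executed by a Celery worker process
celery_task_here.delay(some_args)
celery_task_here.apply_async((some_args,))

# plain function call: runs synchronously in the calling (Django) process
celery_task_here(some_args)

# .s() only builds a signature; calling it directly also runs in-process
celery_task_here.s(some_args)()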
Currently I have a Celery batch running with Django like so:
Celery.py:
from __future__ import absolute_import, unicode_literals
import os

import django
from celery import Celery
from celery.schedules import crontab
from dotenv import load_dotenv

load_dotenv(os.path.join(os.path.dirname(os.path.dirname(__file__)), '.env'))
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'base.settings')
django.setup()

app = Celery('base')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    app.control.purge()
    sender.add_periodic_task(30.0, check_loop.s())
    # need to use a recursive task because it has to wait for the loop
    # to finish (the time can't be predicted)
    recursion_function.delay()
    print("setup_periodic_tasks")
@app.task()
def check_loop():
    # .....
    # start = start number from the database
    # end = end number from the database
    # call the APIs in a list from id=start to id=end
    # create objects
    # update the database (start number = end, end number = end + 3)
    # ....
@app.task()
def recursion_function(default_retry_delay=10):
    # .....
    # do some looping
    # ....
    # when finished, call itself again
    recursion_function.apply_async(countdown=30)
My aim is that whenever the Celery file gets edited, all the tasks are restarted and queued tasks that have not yet executed are removed (I'm doing this because recursion_function will run itself again once it finishes its job of checking each record of a table in my database, so I'm not worried about it stopping midway).
The check_loop function calls an API that uses paging to return a list of objects, and I compare each one against the records in a table; if there is a match, I create a new custom record of another model.
My question is: when I purge all messages, will the currently running task be stopped midway, or will it keep running? Because if the check_loop function stops midway through looping over the API list, it will run the loop again and create new duplicate records, which I don't want.
EXAMPLE:
During a running check_loop() task that created objects midway (on the API list from element id=2 to id=5), the server restarts and runs again; now check_loop() runs from the beginning (on the API list from element id=2 to id=5) and creates objects from that list again (which I 100% don't want).
Is this how it runs? I just need a confirmation.
EDIT:
https://docs.celeryproject.org/en/4.4.1/faq.html#how-do-i-purge-all-waiting-tasks
I added app.control.purge() because when I restart, recursion_function gets called again in setup_periodic_tasks while the previous recursion_function queued by recursion_function.apply_async(countdown=30) also executes, so it multiplies itself.
Yes, the worker will continue executing the currently running task unless the worker is also restarted.
Also, the Celery way is to always expect tasks to run in a concurrent environment, with the following considerations:
there are many tasks running concurrently
there are many celery workers executing tasks
same task may run again
multiple instances of the same task may run at the same moment
any task may be terminated any time
Even if you are sure that in your environment there is only one worker, started/stopped manually, and these do not apply, tasks should still be written in such a way that all of this is allowed to happen.
Some useful techniques:
use database transactions
use locking
split long-running tasks into faster ones
if a task has intermediate values to be saved, or they are important (i.e. non-reproducible, like some API calls) and processing them in the next step takes time, consider splitting it into several chained tasks
If you need to run only one instance of a task at a time, use some sort of locking: create/update a lock record in the database or in the cache so that others (the same task) can check and know this task is running, and just return or wait for the previous one to complete (see the sketch below).
For example, recursion_function can also be a periodic task. Being a periodic task makes sure it is run every interval, even if the previous run fails for any reason (and thus fails to queue itself again, as a regular non-periodic task would). With locking you can make sure only one is running at a time.
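A rough sketch of such a lock using the Django cache (assuming a cache backend with an atomic add, e.g. Redis or memcached; names and the lock key are illustrative):
from django.core.cache import cache

LOCK_ID = "recursion-function-lock"
LOCK_EXPIRE = 60 * 10  # seconds; must outlive the longest expected run

@app.task(bind=True)
def recursion_function(self):
    # cache.add only succeeds if the key does not exist yet (atomic)
    if not cache.add(LOCK_ID, self.request.id, LOCK_EXPIRE):
        return  # another instance is already running
    try:
        pass  # ... do the actual work ...
    finally:
        cache.delete(LOCK_ID)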
check_loop():
First, it is recommended to save results in one database transaction, so that either everything or nothing is saved/modified.
You can also save some marker that indicates how many / status of saved objects, so future tasks can just check this marker, not each object.
Or, before creating each element, check whether it already exists in the database (see the get_or_create sketch below).
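For the per-element check, Django's get_or_create keyed on the external API id is usually enough (MyModel, external_id and payload are placeholders for your actual model):
for item in api_page:
    obj, created = MyModel.objects.get_or_create(
        external_id=item["id"],      # unique key coming from the API
        defaults={"payload": item},  # only used when the row is created
    )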
I am not going to write an essay like Oleg's excellent post above. The answer is simply: all running tasks will continue running. purge only affects tasks that are in the queue(s), waiting to be picked up by Celery workers.
I have a code base with several apps, each with a tasks.py, and a total of 100 of these functions:
@periodic_task(run_every=crontab(minute='20'))
def sync_calendar_availability_and_prices(listing_id_list=None, reapply_rules_after_sync=False):
It's in the old format of Celery periodic task definition, but works fine on celery==4.1.
These get executed every so many hours or minutes via beat, and I also call them ad hoc in the codebase using .delay(). I want all the .delay() calls to go into a certain Celery queue, manual_call_queue, and the periodic beat-fired calls for the same functions to go to periodic_beat_fired_queue. Is this an easy 1-2 line config change somewhere at a global level?
I use RabbitMQ, Celery, Django and django-celery-beat.
To send periodic tasks to a specific queue, pass the queue/options arguments:
@periodic_task(run_every=crontab(minute='20'), queue='manual_call_queue', options={'queue': 'periodic_beat_fired_queue'})
def sync_calendar_availability_and_prices(listing_id_list=None, reapply_rules_after_sync=False):
queue='manual_call_queue' is used when the task is invoked with .delay or .apply_async.
options={'queue': 'periodic_beat_fired_queue'} is used when celery beat invokes the task.
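An equivalent way to express the same split, if you prefer keeping the beat routing out of the decorator, is a plain task plus an explicit beat schedule entry (a sketch; the module path 'tasks.' is an assumption):
from celery.schedules import crontab

# default queue, used by ad-hoc .delay() / .apply_async() calls
@app.task(queue='manual_call_queue')
def sync_calendar_availability_and_prices(listing_id_list=None, reapply_rules_after_sync=False):
    ...

# beat-fired runs override the queue via the schedule entry's options
CELERYBEAT_SCHEDULE = {
    'sync-calendar-availability-and-prices': {
        'task': 'tasks.sync_calendar_availability_and_prices',
        'schedule': crontab(minute='20'),
        'options': {'queue': 'periodic_beat_fired_queue'},
    },
}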
I'm having a lot of problems executing certain tasks with celery beat. Some tasks, like the one below, get triggered by beat, but the message is never received by RabbitMQ.
In my Django settings file I have the following periodic task:
CELERYBEAT_SCHEDULE = {
    ...
    'update_locations': {
        'task': 'cron.tasks.update_locations',
        'schedule': crontab(hour='10', minute='0')
    },
    ...
}
At 10:00 UTC, beat executes the task as expected:
[2015-05-13 10:00:00,046: DEBUG/MainProcess] cron.tasks.update_locations sent. id->a1c53d0e-96ca-4673-9d03-972888581176
but this message never arrives at RabbitMQ (I'm using the tracing module in RabbitMQ to track incoming messages). I have several other tasks which seem to run fine, but certain tasks like the one above never run. Running the task manually in Django with cron.tasks.update_locations.delay() works with no problem. Note that my RabbitMQ is on a different server than beat.
Is there anything I can do to ensure the message was actually sent and/or received by rabbitmq? Is there a better or other way to schedule these tasks to ensure they run?
A bit hard to answer from these minimal descriptions.
Why is this in the Django settings file? I would have expected the Celery config settings to have their own config object.
Look at http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.config_from_object
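A minimal sketch of that separation, keeping the Celery settings in their own module (the module name, broker URL and host are assumptions):
# celeryconfig.py
from celery.schedules import crontab

BROKER_URL = 'amqp://user:password@rabbitmq-host:5672//'

CELERYBEAT_SCHEDULE = {
    'update_locations': {
        'task': 'cron.tasks.update_locations',
        'schedule': crontab(hour=10, minute=0),
    },
}

# app module
from celery import Celery

app = Celery('cron')
app.config_from_object('celeryconfig')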
I have managed to get periodic tasks working in django-celery by subclassing PeriodicTask. I tried to create a test task and set it running doing something useless. It works.
Now I can't stop it. I've read the documentation and I cannot find out how to remove the task from the execution queue. I have tried using celeryctl and using the shell, but registry.tasks() is empty, so I can't see how to remove it.
I have seen suggestions that I should "revoke" it, but for this I appear to need a task id, and I can't see how I would find the task id.
Thanks.
A task is a message, and a "periodic task" sends task messages at periodic intervals. Each of the tasks sent will have an unique id assigned to it.
revoke will only cancel a single task message. To get the id for a task you have to keep
track of the id sent, but you can also specify a custom id when you send a task.
I'm not sure if you want to cancel a single task message, or if you want to stop the periodic task from sending more messages, so I'll list answers for both.
There is no built-in way to keep the id of a task sent with periodic tasks,
but you could set the id for each task to the name of the periodic task, that way
the id will refer to any task sent with the periodic task (usually the last one).
You can specify a custom id this way,
either with the @periodic_task decorator:
@periodic_task(options={"task_id": "my_periodic_task"})
def my_periodic_task():
    pass
or with the CELERYBEAT_SCHEDULE setting:
CELERYBEAT_SCHEDULE = {
    name: {
        "task": task_name,
        "options": {"task_id": name},
    },
}
If you want to remove a periodic task you simply remove the #periodic_task from the codebase, or remove the entry from CELERYBEAT_SCHEDULE.
If you are using the Django database scheduler you have to remove the periodic task
from the Django Admin interface.
PS1: revoke doesn't stop a task that has already been started. It only cancels
tasks that haven't been started yet. You can terminate a running task using
revoke(task_id, terminate=True). By default this will send the TERM signal to
the process, if you want to send another signal (e.g. KILL) use
revoke(task_id, terminate=True, signal="KILL").
PS2: revoke is a remote control command so it is only supported by the RabbitMQ
and Redis broker transports.
If you want your task to support cancellation you should do so by storing a cancelled
flag in a database and have the task check that flag when it starts:
from celery.task import Task

class RevokeableTask(Task):
    """Task that can be revoked.

    Example usage:

        @task(base=RevokeableTask)
        def mytask():
            pass
    """

    def __call__(self, *args, **kwargs):
        if revoke_flag_set_in_db_for(self.request.id):
            return
        super(RevokeableTask, self).__call__(*args, **kwargs)
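The revoke_flag_set_in_db_for helper above is left undefined; one possible implementation uses the Django cache as the shared store (the key format and the companion setter are illustrative):
from django.core.cache import cache

def set_revoke_flag_for(task_id):
    cache.set("revoked:%s" % task_id, True)

def revoke_flag_set_in_db_for(task_id):
    return cache.get("revoked:%s" % task_id) is not None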
Just in case this may help someone... We had the same problem at work, and despite some efforts to find some kind of management command to remove the periodic task, we could not. So here are some pointers.
You should probably first double-check which scheduler class you're using.
The default scheduler is celery.beat.PersistentScheduler, which is simply keeping track of the last run times in a local database file (a shelve).
In our case, we were using the djcelery.schedulers.DatabaseScheduler class.
django-celery also ships with a scheduler that stores the schedule in the Django database
Although the documentation does mention a way to remove the periodic tasks:
Using django-celery's scheduler you can add, modify and remove periodic tasks from the Django Admin.
We wanted to perform the removal programmatically, or via a (celery/management) command in a shell.
Since we could not find a command-line option, we used the Django/Python shell:
$ python manage.py shell
>>> from djcelery.models import PeriodicTask
>>> pt = PeriodicTask.objects.get(name='the_task_name')
>>> pt.delete()
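If you only want to pause the task rather than delete it, the PeriodicTask row also has an enabled flag you can toggle instead (same shell session):
>>> pt.enabled = False
>>> pt.save()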
I hope this helps!