Djcelery tasks still scheduled even after removing the task - django

I have scheduled a task using django-celery (djcelery) by defining it like so:
@periodic_task(run_every=timedelta(minutes=1))
def mytask():
    # Do something
I decided to remove this task from the codebase.
However, even after restarting the celery server, this task continues to be scheduled every 1 minute, although it reports an error message since this task no longer exists. Do I have to do something to clear old periodic tasks from the djcelery database, in addition to restarting the server?

You might need to remove your task from the table djcelery_periodictask as well.
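For example, a minimal sketch of clearing the stale schedule entry from a Django shell (the dotted task path is a placeholder for whatever name your removed task had):
# run inside `python manage.py shell`
from djcelery.models import PeriodicTask

# 'myapp.tasks.mytask' is a placeholder for the removed task's dotted path
PeriodicTask.objects.filter(task='myapp.tasks.mytask').delete()
After deleting the row, restart celery beat so the scheduler reloads its entries.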

Celery: what happens to running tasks when using app.control.purge()?

Currently I have a Celery batch job running with Django like so:
Celery.py:
from __future__ import absolute_import, unicode_literals
import os
import celery
from celery import Celery
from celery.schedules import crontab
import django
from dotenv import load_dotenv  # needed for load_dotenv() below

load_dotenv(os.path.join(os.path.dirname(os.path.dirname(__file__)), '.env'))
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'base.settings')
django.setup()

app = Celery('base')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    app.control.purge()
    sender.add_periodic_task(30.0, check_loop.s())
    # recursive because it has to wait for the loop to finish (the time can't be predicted)
    recursion_function.delay()
    print("setup_periodic_tasks")
@app.task()
def check_loop():
    # .....
    # start = database start number
    # end = database end number
    # call the APIs in a list from id=start to id=end
    # create objects
    # update the database (start number = end, end number = end + 3)
    # ....
@app.task()
def recursion_function(default_retry_delay=10):
    # .....
    # do some looping
    # ....
    # when finished, call itself again
    recursion_function.apply_async(countdown=30)
My aim is that whenever the Celery file gets edited, all tasks are restarted and any queued tasks that have not yet executed are removed (I do this because recursion_function will queue itself again once it finishes checking each record of a table in my database, so I'm not worried about it stopping midway).
The check_loop function calls a paginated API that returns a list of objects; I compare each one to the records in a table and, if there is a match, create a new custom record of another model.
My question is: when I purge all messages, will the currently running task be stopped midway, or will it keep running? If the check_loop function stops midway through looping over the API list, it will run the loop again and create new duplicate records, which I don't want.
EXAMPLE:
While check_loop() is running it creates objects partway through (say, for API list elements id=2 to id=5); the server restarts and the task runs again, so check_loop() starts from the beginning (API list elements id=2 to id=5) and creates objects from that list again (which I absolutely don't want).
Is this how it works? I just need a confirmation.
EDIT:
https://docs.celeryproject.org/en/4.4.1/faq.html#how-do-i-purge-all-waiting-tasks
I added app.control.purge() because on restart recursion_function gets called again in setup_periodic_tasks, while the previous recursion_function queued by recursion_function.apply_async(countdown=30) also executes, so the task multiplies itself.
Yes, the worker will continue executing the currently running task unless the worker itself is also restarted.
Also, the Celery way is to always expect tasks to run in a concurrent environment, with the following considerations:
there are many tasks running concurrently
there are many celery workers executing tasks
the same task may run again
multiple instances of the same task may run at the same moment
any task may be terminated at any time
Even if you are sure that in your environment there is only one worker, started/stopped manually, and none of this applies, tasks should still be written so that all of it can happen.
Some useful techniques:
use database transactions
use locking
split long-running tasks into faster ones
if a task has intermediate values that need to be saved, or they are important (i.e. non-reproducible, like some API calls) and their processing in the next step takes time, consider splitting it into several chained tasks
If you need to run only one instance of a task at a time, use some sort of locking: create/update a lock record in the database or in the cache, so other instances of the same task can check that it is already running and simply return, or wait for the previous one to complete.
E.g. recursion_function can also be a periodic task. Being a periodic task ensures it runs every interval, even if a previous run fails for any reason (and thus fails to queue itself again, as a regular non-periodic task would). With locking you can make sure only one instance runs at a time.
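A minimal sketch of such a cache-based lock using Django's cache framework (the lock key, timeout, and task body are illustrative assumptions, not from the original post):
from django.core.cache import cache
from celery import shared_task

LOCK_TTL = 60  # seconds; should comfortably exceed the expected task runtime

@shared_task
def recursion_function():
    # cache.add() only sets the key if it does not already exist,
    # and is atomic on backends such as Redis or memcached
    if not cache.add('recursion_function_lock', 'running', LOCK_TTL):
        return  # another instance is already running
    try:
        pass  # do the actual work here
    finally:
        cache.delete('recursion_function_lock')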
check_loop():
First, it is recommended to save results in a single database transaction, so that either everything or nothing is saved/modified in the database.
You can also save a marker that records how many objects were saved (or their status), so future tasks can just check this marker rather than every object.
Or check each element before creating it to see whether it already exists in the database.
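A rough sketch combining these ideas, assuming a hypothetical Record model with a unique external_id field (every name here is a placeholder, not from the original code):
from django.db import transaction
from celery import shared_task
from myapp.models import Record  # hypothetical model with a unique external_id field

@shared_task
def check_loop():
    items = []  # placeholder for the result of the paginated API call
    with transaction.atomic():  # save all objects or none of them
        for item in items:
            # get_or_create skips elements that were already saved,
            # so a re-run does not create duplicate records
            Record.objects.get_or_create(
                external_id=item['id'],
                defaults={'payload': item},
            )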
I am not going to write an essay like Oleg's excellent post above. The answer is simply: all running tasks will continue running. purge is all about the tasks that are in the queue(s), waiting to be picked up by Celery workers.

on heroku, celery beat database scheduler doesn’t run periodic tasks

I have an issue where django_celery_beat's DatabaseScheduler doesn't run periodic tasks. Or I should say, celery beat doesn't find any tasks when the scheduler is DatabaseScheduler. If I use the standard scheduler, the tasks are executed regularly.
I set up celery on Heroku using one dyno for the worker and one for beat (and one for web, obviously).
I know that beat and the worker are connected to redis and to postgres for task results.
Every periodic task I run from django admin by selecting a task and "run selected task" gets executed.
However, I have spent about two days trying to figure out why beat/worker never picks up a task I scheduled to execute every 10 seconds, or via a cron (even restarting beat and the worker doesn't change it).
I'm kind of desperate, and my next move would be to give redbeat a try.
Any help on how to troubleshoot this particular problem would be greatly appreciated. I suspect the problem is in the is_due method. I am using UTC (in celery and django), and all crons are UTC based. All I see in the beat log is "writing entries.." every now and then.
I've tried changing the celery version from 4.3 to 4.4 and django-celery-beat from 1.4.0 to 1.5.0 to 1.6.0.
Any help would be greatly appreciated.
In case it helps someone who is having, or will have, a similar problem to ours: to recreate this issue, it is possible to create a task as simple as:
@app.task(bind=True)
def test(self, arg):
    print(kwargs.get("notification_id"))
then, in Django admin, edit the task and put something in the extra args field. Or, vice versa, the task could be
@app.task(bind=True)
def test(self, **kwargs):
    print(notification_id)
And try to pass a positional argument. While locally this breaks, on Heroku's beat and worker dynos it somehow slips by unnoticed, and django_celery_beat stops processing any task whatsoever from then on. The scheduler is completely broken by a "wrong" task.

Do I need to use celery.result.forget when using a database backend?

I've come across the following warning:
Backends use resources to store and transmit results. To ensure that resources are released, you must eventually call get() or forget() on EVERY AsyncResult instance returned after calling a task.
I am currently using the django-db backend and I am wondering about the consequences of not heeding this warning. What resources will not be "released" if I don't forget an AsyncResult? I'm not worried about cleaning up task results from my database. My primary concern is with the availability of workers being affected.
I've actually never seen that warning. As long as you're running celery beat, you'll be fine. Celery has a default periodic task that it sets up for you scheduled to run at 4:00 AM. That task deletes any expired results in your database if you are using a db-based backend like postgres or mysql.
Celery has a setting for this, result_expires. The documentation explains it all:
result_expires
Default: Expire after 1 day.
Time (in seconds, or a timedelta object) for when after stored task tombstones will be deleted.
But as @2ps mentioned, celery beat must be running for database backends, as documented:
A built-in periodic task will delete the results after this time (celery.backend_cleanup), assuming that celery beat is enabled. The task runs daily at 4am.
For other types of backends, e.g. AMQP, running beat does not seem to be necessary, as documented:
Note
For the moment this only works with the AMQP, database, cache, Couchbase, and Redis backends.
When using the database backend, celery beat must be running for the results to be expired.
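A minimal illustration of setting this from Django settings, assuming settings are loaded via app.config_from_object('django.conf:settings', namespace='CELERY') as is typical for Django projects (the one-hour value is only an example):
# settings.py
from datetime import timedelta

# Results older than this are removed by the daily celery.backend_cleanup task;
# for database backends such as django-db, celery beat must be running.
CELERY_RESULT_EXPIRES = timedelta(hours=1)  # or an int number of seconds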

How to record all tasks information with Django and Celery?

In my Django project I'm using Celery with a RabbitMQ broker for asynchronous tasks. How can I record information about all of my tasks (e.g. creation time (when the task appears in the queue), the time a worker picks the task up, execution time, status, ...) to monitor how Celery is doing?
I know there are solutions like Flower, but that seems too much for what I need; django-celery-results looks like what I want, but it's missing some information I need, like the task creation time.
Thanks!
It seems like you often find the answer yourself after asking on SO. I settled on using Celery signals to do all the recording I want and store the results in a database table.
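A rough sketch of that signal-based approach, assuming a hypothetical TaskRecord model (the model and its fields are illustrative, not from the original answer):
from django.utils import timezone
from celery.signals import before_task_publish, task_prerun, task_postrun
from myapp.models import TaskRecord  # hypothetical model

@before_task_publish.connect
def record_queued(sender=None, headers=None, **kwargs):
    # runs in the process that sends the task, i.e. when it enters the queue
    TaskRecord.objects.create(task_id=headers['id'], name=sender,
                              queued_at=timezone.now())

@task_prerun.connect
def record_started(task_id=None, task=None, **kwargs):
    TaskRecord.objects.filter(task_id=task_id).update(started_at=timezone.now())

@task_postrun.connect
def record_finished(task_id=None, task=None, state=None, **kwargs):
    TaskRecord.objects.filter(task_id=task_id).update(
        finished_at=timezone.now(), status=state)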

How can I get rid of legacy tasks still in the Celery / RabbitMQ queue?

I am running Django + Celery + RabbitMQ. After modifying some task names I started getting "unregistered task" KeyErrors, even after removing tasks with this key from the Periodic tasks table in Django Celery Beat and restarting the Celery worker. They persist even after running with the --purge option.
How can I get rid of them?
To flush out the last of these tasks, you can re-implement them with their old method headers, but no logic.
For example, if you removed the method original and are now getting the error
[ERROR/MainProcess] Received unregistered task of type u'myapp.tasks.original'
Just recreate the original method as follows:
tasks.py
@shared_task
def original():
    # keep legacy task header so that it is flushed out of the queue
    # FIXME: this will be removed in the next release
    pass
Once you have run this version in each environment, any remaining tasks will be processed (and do nothing). Ensure that you have removed them from your Periodic tasks table, and that they are no longer being invoked. You can then remove the method before your next deployment, and the issue should not recur.
This is still a workaround, and it would be preferable to be able to review and delete the tasks individually.
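For the Periodic tasks table itself, a quick way to double-check from a Django shell (assuming django-celery-beat's PeriodicTask model; the task path is a placeholder):
from django_celery_beat.models import PeriodicTask

# should be an empty queryset once the legacy entries are gone
PeriodicTask.objects.filter(task='myapp.tasks.original')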