Checking the next run time for scheduled periodic tasks in Celery (with Django) - django

*Using celery 3.1.25 because django-celery-beat 1.0.1 has an issue with scheduling periodic tasks.
Recently I encountered an issue with celerybeat whereby periodic tasks with an interval of a day or longer appear to be 'forgotten' by the scheduler. If I change the interval to every 5 seconds the task executes normally (every 5 seconds) and the last_run_at attribute gets updated. This means celerybeat is responding to the scheduler to a certain degree, but if I reset the last_run_at i.e. PeriodicTask.objects.update(last_run_at=None), none of the tasks with an interval of every day run anymore.
Celerybeat crashed at one point and that may have corrupted something so I created a new virtualenv and database to see if the problem persists. I'd like to know if there is a way to retrieve the next run time so that I don't have to wait a day to know whether or not my periodic task has been executed.
I have also tried using inspect <active/scheduled/reserved> but all returned empty. Is this normal for periodic tasks using djcelery's database scheduler?
Here's the function that schedules the tasks:
def schedule_data_collection(request, project):
if (request.method == 'POST'):
interval = request.POST.get('interval')
target_project = Project.objects.get(url_path=project)
interval_schedule = dict(every=json.loads(interval), period='days')
schedule, created = IntervalSchedule.objects.get_or_create(
every=interval_schedule['every'],
period=interval_schedule['period'],
)
task_name = '{} data collection'.format(target_project.name)
try:
task = PeriodicTask.objects.get(name=task_name)
except PeriodicTask.DoesNotExist:
task = PeriodicTask.objects.create(
interval=schedule,
name=task_name,
task='myapp.tasks.collect_tool_data',
args=json.dumps([target_project.url_path])
)
else:
if task.interval != schedule:
task.interval = schedule
if task.enabled is False:
task.enabled = True
task.save()
return HttpResponse(task.interval)
else:
return HttpResponseForbidden()

You can see your scheduler by going into shell and looking at app.conf.CELERYBEAT_SCEDULE.
celery -A myApp shell
print(app.conf.CELERYBEAT_SCHEDULE)
This should show you all your Periodic Tasks.

Related

Celery - Django Schedule task from inside a scheduled task

I have a Task (let's call it MainTask) that is scheduled using apply_async method, this task has some validations that can trigger another task (SecondaryTask) to be scheduled with an eta.
Every time the MainTask tries to schedule the SecondaryTask using apply_async method, the SecondaryTask runs inmediatly, overriding the eta parameter.
How can I schedule a different task from a "Main Task" and to be executed later, using eta?
Here is an example of the code:
views.py
def function():
main_task.apply_async(eta=some_day)
tasks.py
#app.task(bind=True, name="main_task", autoretry_for=(Exception,), default_retry_delay=10, max_retries=3, queue='mail')
def main_task(self):
...
if something:
...
another_task.apply_async(eta=tomorrow)
#app.task(bind=True, name="another_task", autoretry_for=(Exception,), default_retry_delay=10, max_retries=3, queue='mail')
def another_task(self):
do_something()
I'm using Celery 4.4.6 btw
What celery version are you using?
I'm with 4.4.7 and using countdown which is kind of the same (maybe there's a bug with eta?):
countdown is a shortcut to set ETA by seconds into the future.
assuming that you are doing something similar to this:
tomorrow = datetime.utcnow() + timedelta(days=1)
you can always get the seconds and use with countdown:
seconds = timedelta(days=1).total_seconds()
another_task.apply_async(countdown=seconds)
EDIT:
can you try to apply_async on signature to see if that makes a difference?
another_task.si().apply_async(eta=tomorrow)

getting celery group results

So this is what i want to do. I have a scheduled task that runs every X minutes. In the task I create a group of tasks that i want them to run parallel to each other. After they all finish i want to log if the group has finished successfully or not. This is my code:
#shared_task(base=HandlersImplTenantTask, acks_late=True)
def my_scheduled_task():
try:
needed_ids = MyModel.objects.filter(some_field=False)\
.filter(some_other_field=True)\
.values_list("id", flat=True) \
.order_by("id")
if needed_ids:
tasks = [my_single_task.s(needed_id=id) for id in needed_ids]
job = group(tasks)
result = job.apply_async()
returned_values = result.get()
if result.ready():
if result.successful():
logger.info("SUCCESSFULLY FINISHED ALL THE SUBTASKS")
else:
returned_values = result.get()
logger.info("UNSUCCESSFULLY FINISHED ALL THE SUBTASKS WITH THE RESULTS %s" % returned_values)
else:
logger.info("no needed ids found")
except:
logger.exception("got an unexpected exception while running task")
This is my_single_task code:
#shared_task(base=HandlersImplTenantTask)
def my_single_task(needed_id):
logger.info("starting task for tenant: [%s]. got id [%s]", connection.tenant, needed_id)
return
This is how i run my celery:
manage.py celery worker -c 2 --broker=[my rabbitmq brocker url]
when i get to the line result.get() it hangs. i see a single log entry of the single tasks with the first id but i don't see the others. when i kill my celery process and restart it - it reruns the scheduled task and i see the second log entry with the second id (from the first time the task ran). any ideas on how to fix this?
EDIT - so to try and overcome this - I created a different queue called 'new_queue'. I started a different celery worker to listen to the new queue. I want to make the other worker take the tasks and work on them. I think this could solve the problem of the deadlock.
I have changed my code to look like this:
job = group(tasks)
job_result = job.apply_async(queue='new_queue')
results = job_result.get()
but I still get a deadlock and if i remove the results = job_result.get() line, i can see that the tasks are worked on by the main worker and nothing was published to the new_queue queue. Any thoughts?
This is my celery configurations:
tenant_celery_app.conf.update(CELERY_RESULT_BACKEND='djcelery.backends.database.DatabaseBackend'
CELERY_RESULT_DB_TABLENAMES = {
'task': 'tenantapp_taskmeta',
'group': 'tenantapp_groupmeta',
}
This is how i run the workers:
celery worker -c 1 -Q new_queue --broker=[amqp_brocker_url]/[vhost]
celery worker -c 1 --broker=[amqp_brocker_url]/[vhost]
So the solution i was looking for was indeed of the sort of creating a new queue and starting a new worker that processes the new queue. The only issue that i had was to send the group tasks to the new queue. This is the code that worked for me.
tasks = [my_single_task.s(needed_id=id).set(queue='new_queue') for id in needed_ids]
job = group(tasks)
job_result = job.apply_async()
results = job_result.get() # this will block until the tasks finish but it wont deadlock
And these are my celery workers
celery worker -c 1 -Q new_queue --broker=[amqp_brocker_url]/[vhost]
celery worker -c 1 --broker=[amqp_brocker_url]/[vhost]
you seem to be deadlocking your queue. Think about it. If you have a task that waits on other tasks, and the queue fills up then the first task will hang forever.
You need to refactor your code to avoid calling result.get() inside a task (you probably already have warnings in your logs about this)
I would recommend this:
#shared_task(base=HandlersImplTenantTask, acks_late=True)
def my_scheduled_task():
needed_ids = MyModel.objects.filter(some_field=False)\
.filter(some_other_field=True)\
.values_list("id", flat=True) \
.order_by("id")
if needed_ids:
tasks = [my_single_task.s(needed_id=id) for id in needed_ids]
job = group(tasks)
result = job.apply_async()
That's all you need.
Use logging to track if tasks fail.
If code elsewhere in your application needs to track whether the jobs fail or not then you can use celery's inspect api.

Running celery task when celery beat starts

How do I schedule a task to run when I start celery beat then again in 1 hours and so.
Currently I have schedule in settings.py:
CELERYBEAT_SCHEDULE = {
'update_database': {
'task': 'myapp.tasks.update_database',
'schedule': timedelta(seconds=60),
},
}
I saw a post from 1 year here on stackoverflow asking the same question:
How to run celery schedule instantly?
However this does not work for me, because my celery worker get 3-4 requests for the same task, when I run django server
I'm starting my worker and beat like this:
celery -A dashboard_web worker -B --loglevel=INFO --concurrency=10
Crontab schedule
You could try to use a crontab schedule instead which will run every hour and start 1 min after initialization of the scheduler. Warning: you might want to do it a couple of minutes later in case it takes longer to start, otherwise you might need to wait the full hour.
from celery.schedules import crontab
from datetime import datetime
CELERYBEAT_SCHEDULE = {
'update_database': {
'task': 'myapp.tasks.update_database',
'schedule': crontab(minute=(datetime.now().minute + 1) % 60),
},
}
Reference: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#crontab-schedules
Ready method of MyAppConfig
In order to ensure that your task is run right away, you could use the same method as before to create the periodic task without adding 1 to the minute. Then, you call your task in the ready method of MyAppConfig which is called whenever your app is ready.
#myapp/apps.py
class MyAppConfig(AppConfig):
name = "myapp"
def ready(self):
from .tasks import update_database
update_database.delay()
Please note that you could also create the periodic task directly in the ready method if you were to use django_celery_beat.
Edit: Didn't see that the second method was already covered in the link you mentioned. I'll leave it here in case it is useful for someone else arriving here.
Try setting the configuration parameter CELERY_ALWAYS_EAGER = True
Something like this
app.conf.CELERY_ALWAYS_EAGER = True

How to flush tasks with countdown timers from celery queue

My Celery queue has hundreds of tasks with countdowns that will make them trigger over the next few hours. Is there a way to have these tasks run immediately such that the queue is effectively flushed?
I'm currently planning an upgrade to our server and I want to make sure that there are no background tasks running while the upgrade completes. If I have to wait for these countdowns, that's OK, but I'd rather force the tasks to run instead.
Another option could be to pause processing of the queue until the upgrade is complete, but flushing seems like a better option.
EDIT: I've figured out how to find a list of tasks that are scheduled:
from celery.task.control import inspect
i = inspect()
tasks = i.scheduled()
Now I just need to sort out how to force their execution.
OK, I'm fairly certain I've sorted out roughly how to do this. I'm making this answer a wiki and putting down my notes, in case anybody wants to tune up the general process here.
The general idea is this:
Stop adding new items to the queue.
Determine any tasks that are queued.
Revoke all those tasks using result.revoke().
Re-start those tasks using some saved state.
Note that this doesn't support adding an eta to the items once you re-queue them, as that's probably implementation-specific.
So, to figure out what tasks are queued, you do:
from celery.task.control import inspect
i = inspect()
scheduled_tasks = i.scheduled()
Which returns a dict, like so:
{u'w1.courtlistener.com': [{u'eta': 1414435210.198864,
u'priority': 6,
u'request': {u'acknowledged': False,
u'args': u'(2745724,)',
u'delivery_info': {u'exchange': u'celery',
u'priority': None,
u'routing_key': u'celery'},
u'hostname': u'w1.courtlistener.com',
u'id': u'99bc8650-3be1-4d24-81d6-a882d77a8b25',
u'kwargs': u'{}',
u'name': u'citations.tasks.update_document_by_id',
u'time_start': None,
u'worker_pid': None}}]}
The next step is to revoke all those tasks, with something like:
from celery.task.control import revoke
with open('revoked_tasks.csv', 'w') as f:
for worker, tasks in scheduled_tasks.iteritems():
print "Now processing worker: %s" % worker
for task in tasks:
print "Now revoking task: %s. %s with args: %s and kwargs: %s" % \
(task['request']['id'], task['request']['name'], task['request']['args'], task['request']['kwargs'])
f.write('%s|%s|%s|%s|%s\n' % (worker, task['request']['name'], task['request']['id'], task['request']['args'], task['request']['kwargs']))
revoke(task['request']['id'], terminate=True)
Then, finally, re-run the tasks as you would normally, loading them from your CSV file:
with open('revoked_tasks', 'r') as f:
for line in f:
worker, command, id, args, kwargs = line.split("|")
# Impost task here, something like...
package, module = command.rsplit('.', 1)
mod = __import__(package, globals(), locals(), [module])
# Run the commands, something like...
mod.__get_attribute__(module).delay(args*, kwargs**)

celery - Tasks that need to run in priority

In my website users can UPDATE they profile (manual) every time he want, or automatic once a day.
This task is being distributed with celery now.
But i have a "problem" :
Every day, in automatic update, a job put ALL users (+-6k users) on queue:
from celery import group
from tasks import *
import datetime
from lastActivityDate.models import UserActivity
today = datetime.datetime.today()
one_day = datetime.timedelta(days=5)
today -= one_day
print datetime.datetime.today()
user_list = UserActivity.objects.filter(last_activity_date__gte=today)
g = group(update_user_profile.s(i.user.auth.username) for i in user_list)
print datetime.datetime.today()
print g(user_list.count()).get()
If someone try to do the manual update, they will enter on te queue and last forever to be executed.
Is there a way to set this manual task to run in a piority way?
Or make a dedicated for each separated queue: manual and automatic?
Celery does not support task priority. (v3.0)
http://docs.celeryproject.org/en/master/faq.html#does-celery-support-task-priorities
You may solve this problem by routing tasks.
http://docs.celeryproject.org/en/latest/userguide/routing.html
Prepare default and priority_high Queue.
from kombu import Queue
CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
Queue('default'),
Queue('priority_high'),
)
Run two daemon.
user#x:/$ celery worker -Q priority_high
user#y:/$ celery worker -Q default,priority_high
And route task.
your_task.apply_async(args=['...'], queue='priority_high')
If you use RabbitMQ transport then configure your queues the following way:
settings.py
from kombu import Queue
...
CELERY_TASK_QUEUES = (
Queue('default', routing_key='task_default.#', max_priority=10),
...)
Then run your tasks:
my_low_prio_task.apply_async(args=(...), priority=1)
my_high_prio_task.apply_async(args=(...), priority=10)
Presently this code works for kombu==4.6.11, celery==4.4.6.