celery - Tasks that need to run in priority - django

On my website, users can update their profile manually whenever they want, or it gets updated automatically once a day.
This task is currently distributed with Celery.
But I have a "problem":
Every day, the automatic update puts ALL users (roughly 6k users) on the queue:
from celery import group
from tasks import *
import datetime
from lastActivityDate.models import UserActivity
today = datetime.datetime.today()
one_day = datetime.timedelta(days=5)
today -= one_day
print datetime.datetime.today()
user_list = UserActivity.objects.filter(last_activity_date__gte=today)
g = group(update_user_profile.s(i.user.auth.username) for i in user_list)
print datetime.datetime.today()
print g(user_list.count()).get()
If someone tries to do a manual update, their task goes into the same queue and takes forever to be executed.
Is there a way to make this manual task run with higher priority?
Or to create a dedicated queue for each case: manual and automatic?

Celery does not support task priority. (v3.0)
http://docs.celeryproject.org/en/master/faq.html#does-celery-support-task-priorities
You may solve this problem by routing tasks.
http://docs.celeryproject.org/en/latest/userguide/routing.html
Prepare a default and a priority_high queue.
from kombu import Queue

CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
    Queue('default'),
    Queue('priority_high'),
)
Run two worker daemons.
user#x:/$ celery worker -Q priority_high
user#y:/$ celery worker -Q default,priority_high
And route the task:
your_task.apply_async(args=['...'], queue='priority_high')
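For the scenario in the question, a rough sketch of how the two paths could be split (assuming Celery 3.1+ where Signature.set() is available; update_user_profile and user_list come from the question's code):
# Automatic daily batch: send every signature to the default queue
from celery import group
g = group(
    update_user_profile.s(i.user.auth.username).set(queue='default')
    for i in user_list
)
g.apply_async()

# Manual update triggered by a single user: jump to the priority_high queue,
# which the dedicated worker drains without waiting behind the batch
update_user_profile.apply_async(args=[username], queue='priority_high')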

If you use RabbitMQ transport then configure your queues the following way:
settings.py
from kombu import Queue
...
CELERY_TASK_QUEUES = (
    Queue('default', routing_key='task_default.#', max_priority=10),
    ...
)
Then run your tasks:
my_low_prio_task.apply_async(args=(...), priority=1)
my_high_prio_task.apply_async(args=(...), priority=10)
Presently this code works for kombu==4.6.11, celery==4.4.6.
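One caveat, added here as a hedged note rather than part of the original answer: broker-level priorities only reorder messages that are still waiting in RabbitMQ, so a worker that prefetches many messages may already hold low-priority ones when a high-priority task arrives. If you want ordering to be closer to strict, a common sketch (assuming Celery 4 settings loaded with the CELERY_ namespace, matching CELERY_TASK_QUEUES above) is:
# settings.py
CELERY_TASK_ACKS_LATE = True             # acknowledge only after the task finishes
CELERY_WORKER_PREFETCH_MULTIPLIER = 1    # fetch one message per worker process at a time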

Related

celery + redis + flask: how to get the number of jobs waiting to be executed by celery

I use docker to run celery + redis + flask, and I want to know how many tasks are waiting to be executed by celery. I tried to find the information in redis with the command
keys *
and I get result:
127.0.0.1:6379> keys *
1) "unacked_mutex"
2) "_kombu.binding.celeryev"
3) "unacked_index"
4) "_kombu.binding.celery.pidbox"
5) "_kombu.binding.celery"
6) "unacked"
None of these items seems to contain the celery queue information. How can I read the celery queue size?
This is the celery code:
from celery import Celery
import time

app = Celery('tasks', broker='redis://redis:6379')

@app.task
def sleeptest():
    time.sleep(100)
This is how I submit the celery job:
import tasks
import time
tasks.sleeptest.delay()
time.sleep(1)
tasks.sleeptest.delay()
time.sleep(1)
tasks.sleeptest.delay()
time.sleep(1)
tasks.sleeptest.delay()
time.sleep(1)
tasks.sleeptest.delay()
time.sleep(1)
When I post 100 tasks, the celery queue shows up in redis. But when I post only 5 tasks, the celery queue does not show up, even though I set concurrency to 1 and 4 tasks are actually waiting.
The information is buried deep in the Celery documentation, in the section on monitoring Redis queues. If you see nothing, that means the tasks are either finished, running, or already reserved by the worker (look for details about worker_prefetch_multiplier).
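A minimal sketch of what that documentation section describes, assuming the default queue name 'celery' and the Redis hostname 'redis' from the question (redis-py is assumed for the Python variant):
# From a shell with access to the redis container:
redis-cli -h redis -p 6379 llen celery

# Or from Python:
import redis
r = redis.Redis(host='redis', port=6379, db=0)
print(r.llen('celery'))  # messages still waiting in the default queue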

Checking the next run time for scheduled periodic tasks in Celery (with Django)

*Using celery 3.1.25 because django-celery-beat 1.0.1 has an issue with scheduling periodic tasks.
Recently I encountered an issue with celerybeat whereby periodic tasks with an interval of a day or longer appear to be 'forgotten' by the scheduler. If I change the interval to every 5 seconds the task executes normally (every 5 seconds) and the last_run_at attribute gets updated. This means celerybeat is responding to the scheduler to a certain degree, but if I reset the last_run_at i.e. PeriodicTask.objects.update(last_run_at=None), none of the tasks with an interval of every day run anymore.
Celerybeat crashed at one point and that may have corrupted something so I created a new virtualenv and database to see if the problem persists. I'd like to know if there is a way to retrieve the next run time so that I don't have to wait a day to know whether or not my periodic task has been executed.
I have also tried using inspect <active/scheduled/reserved> but all returned empty. Is this normal for periodic tasks using djcelery's database scheduler?
Here's the function that schedules the tasks:
def schedule_data_collection(request, project):
    if (request.method == 'POST'):
        interval = request.POST.get('interval')
        target_project = Project.objects.get(url_path=project)
        interval_schedule = dict(every=json.loads(interval), period='days')
        schedule, created = IntervalSchedule.objects.get_or_create(
            every=interval_schedule['every'],
            period=interval_schedule['period'],
        )
        task_name = '{} data collection'.format(target_project.name)
        try:
            task = PeriodicTask.objects.get(name=task_name)
        except PeriodicTask.DoesNotExist:
            task = PeriodicTask.objects.create(
                interval=schedule,
                name=task_name,
                task='myapp.tasks.collect_tool_data',
                args=json.dumps([target_project.url_path])
            )
        else:
            if task.interval != schedule:
                task.interval = schedule
            if task.enabled is False:
                task.enabled = True
            task.save()
        return HttpResponse(task.interval)
    else:
        return HttpResponseForbidden()
You can see your scheduler by going into the shell and looking at app.conf.CELERYBEAT_SCHEDULE.
celery -A myApp shell
print(app.conf.CELERYBEAT_SCHEDULE)
This should show you all your periodic tasks.
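Since the question uses djcelery's database scheduler, here is a hedged sketch of estimating the next run directly from a PeriodicTask row (the task name is hypothetical; it assumes the model's schedule property and celery's remaining_estimate work as in celery 3.1):
from django.utils import timezone
from djcelery.models import PeriodicTask

task = PeriodicTask.objects.get(name='myproject data collection')  # hypothetical name
last_run = task.last_run_at or timezone.now()      # fall back if the task has never run
print(task.schedule.remaining_estimate(last_run))  # timedelta until the next expected run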

Update database fields hourly with Python/Django

Suppose I have 1000 user_ids in a table, and every hour I want to fetch info from the Google API and update 3 fields in that table. What would the impact be, and how can it be done efficiently?
I've seen this variant:
m = Module.objects.get(user_id=1).update(field_one=100, field_two=200, field_three=300)
And this one:
m = Module.objects.get(user_id=1)
m.field_one = 100
m.field_two = 200
m.field_three = 300
m.save()
Also, how can it be set up so that it runs every hour and grabs that information? I've never done something like this.
Use Redis and Celery to set up an asynchronous task queue that runs every hour. Look here https://realpython.com/blog/python/asynchronous-tasks-with-django-and-celery/ for more info on how to set up an async task queue system for Django.
Here is the code for tasks.py:
from celery.task import periodic_task
from celery.schedules import crontab

@periodic_task(run_every=crontab(minute=0, hour='*/1'))
def get_data_from_google_api():
    data_from_google = ping_google_api()  # ping google api to get data
    return Module.objects.filter(user_id=1).update(
        field_one=data_from_google['field_one'],
        field_two=data_from_google['field_two'],
        field_three=data_from_google['field_three'],
    )
Look here for more info :
https://www.caktusgroup.com/blog/2014/06/23/scheduling-tasks-celery/
How to run a Django celery task every 6am and 6pm daily?
For this purpose you need to run background queries with periodic tasks.
Here are the most popular Django task-queue libraries.
For example, if you decide to use Celery, you can write a simple periodic task:
from celery.schedules import crontab
from celery.task import periodic_task

@periodic_task(
    name='UPDATE_USER',
    run_every=crontab(minute=0, hour='6,18'))
def update_user():
    # get some value from the api
    value = ...  # placeholder for the value fetched from your API
    Module.objects.filter(user_id=1).update(
        field_one=value, field_two=value, field_three=value)
All the Django-related settings are described in the Celery docs. The same schedule can also be expressed through the CELERYBEAT_SCHEDULE setting, as in the sketch below.
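A hedged sketch of that settings-based variant (the dotted task path myapp.tasks.update_user is an assumption about where the task above lives):
# settings.py
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'update-user-twice-daily': {
        'task': 'myapp.tasks.update_user',          # assumed dotted path to the task above
        'schedule': crontab(minute=0, hour='6,18'),
    },
}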

Running celery task when celery beat starts

How do I schedule a task to run when I start celery beat, and then again every hour after that?
Currently I have this schedule in settings.py:
CELERYBEAT_SCHEDULE = {
    'update_database': {
        'task': 'myapp.tasks.update_database',
        'schedule': timedelta(seconds=60),
    },
}
I saw a post from a year ago here on Stack Overflow asking the same question:
How to run celery schedule instantly?
However this does not work for me, because my celery worker gets 3-4 requests for the same task when I run the django server.
I'm starting my worker and beat like this:
celery -A dashboard_web worker -B --loglevel=INFO --concurrency=10
Crontab schedule
You could try using a crontab schedule instead, which will run every hour, starting 1 minute after the scheduler is initialized. Warning: you might want to set it a couple of minutes later in case startup takes longer, otherwise you might have to wait the full hour.
from celery.schedules import crontab
from datetime import datetime

CELERYBEAT_SCHEDULE = {
    'update_database': {
        'task': 'myapp.tasks.update_database',
        'schedule': crontab(minute=(datetime.now().minute + 1) % 60),
    },
}
Reference: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#crontab-schedules
Ready method of MyAppConfig
In order to ensure that your task runs right away, you could use the same method as before to create the periodic task, without adding 1 to the minute. Then you call your task in the ready method of MyAppConfig, which is called whenever your app is ready.
# myapp/apps.py
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        from .tasks import update_database
        update_database.delay()
Please note that you could also create the periodic task directly in the ready method if you were to use django_celery_beat.
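A minimal sketch of that django_celery_beat variant (the hourly interval and the get_or_create guard against duplicate rows are assumptions, not part of the original answer):
# myapp/apps.py, assuming django_celery_beat is installed
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        # Note: touching the database in ready() can fail before migrations have run.
        from django_celery_beat.models import IntervalSchedule, PeriodicTask
        schedule, _ = IntervalSchedule.objects.get_or_create(
            every=1, period=IntervalSchedule.HOURS)
        PeriodicTask.objects.get_or_create(
            name='update_database hourly',
            defaults={'interval': schedule, 'task': 'myapp.tasks.update_database'},
        )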
Edit: Didn't see that the second method was already covered in the link you mentioned. I'll leave it here in case it is useful for someone else arriving here.
Try setting the configuration parameter CELERY_ALWAYS_EAGER = True
Something like this
app.conf.CELERY_ALWAYS_EAGER = True

Celery PeriodicTask won't expire

I'm trying to set up a periodic task that should expire after some time. I'm using Django 1.5.1, celery 3.0.19 and django-celery 3.0.17 (everything from pip).
This is an excerpt of the code that creates the task:
from django.utils import timezone
from datetime import timedelta, datetime
from djcelery.models import PeriodicTask, IntervalSchedule

interval = IntervalSchedule.objects.get(pk=1)  # Added through fixture - 3 sec interval
expiration = timezone.now() + timedelta(seconds=10)

task = PeriodicTask(name='fill_%d' % profile.id,
                    task='fill_album',
                    args=[instance.id],
                    interval=interval,
                    expires=expiration)
task.save()
And I'm running celery with ./manage.py celeryd -B
The task is being created just fine, and beat is running it every 3 seconds, but after 10 seconds it doesn't expire. At first I thought it was some timezone issue between django and celery, so I let it running for 3 hours (my difference to UTC) but it still wouldn't expire.
During my tests I've actually managed to make it expire once (and the logger kept repeating it was expired, every 3 seconds) but I haven't been able to reproduce it since.
Can anyone shed some light on what I'm doing wrong?
Thanks!
I'm having the same problem, and I think celery beat is not honoring expires. If you set a breakpoint in your task, take a look at the current_task.request object and see if expires has a value (or just print current_task.request from within the task; see the sketch below).
For me, if I run the task manually, current_task.request.expires has a value, but if celery beat schedules it, it is None.
I'm using celery 3.1.11
I filed a bug: https://github.com/celery/celery/issues/2283
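A minimal sketch of that check, assuming the fill_album task is registered under the name used in the question (the print is just illustrative):
from celery import current_task, shared_task

@shared_task(name='fill_album')
def fill_album(album_id):
    # If beat scheduled this run, the symptom described above is that expires comes back as None.
    print(current_task.request.expires)
    ...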
You can try using last_run_at, as follows:
task = PeriodicTask(name='fill_%d' % profile.id,
                    task='fill_album',
                    args=[instance.id],
                    interval=interval,
                    expires=expiration,
                    last_run_at=expiration)
task.save()