Restarting Celery and Celery beat schedule relationship in Django

Will restarting Celery cause all the periodic tasks (celery beat schedules) to be reset and start from the time Celery is restarted, or does it retain the schedule?
For example, assume I have a periodic task that runs at 12 p.m. every day. If I restart Celery at 3 p.m., will the periodic task be reset to run at 3 p.m. every day?

How do you set your task?
There are several ways to define a task schedule.
Example: run the tasks.add task every 30 seconds:
app.conf.beat_schedule = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': 30.0,
        'args': (16, 16)
    },
}
app.conf.timezone = 'UTC'
This task runs every 30 seconds, counted from the moment beat starts.
Another example:
from celery.schedules import crontab

app.conf.beat_schedule = {
    # Executes every Monday morning at 7:30 a.m.
    'add-every-monday-morning': {
        'task': 'tasks.add',
        'schedule': crontab(hour=7, minute=30, day_of_week=1),
        'args': (16, 16),
    },
}
This task runs at 7:30 a.m. every Monday, regardless of when beat was started.
You may check more schedule examples in the Celery documentation.
So the answer depends on your schedule type: crontab entries are anchored to wall-clock time and are unaffected by restarts, while interval entries are computed from the last run time that beat persists in its state file (celerybeat-schedule by default), so they too are retained across a restart unless that file is removed.
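To make the example in the question concrete: a daily-at-noon task is best expressed as a crontab entry, which fires at the stated wall-clock time no matter when Celery is restarted. A minimal sketch, using a hypothetical tasks.daily_report task:

from celery.schedules import crontab

app.conf.beat_schedule = {
    # Fires at 12:00 noon every day; restarting Celery at 3 p.m.
    # does not shift this, because crontab is wall-clock based.
    'run-daily-at-noon': {
        'task': 'tasks.daily_report',  # hypothetical task name
        'schedule': crontab(hour=12, minute=0),
    },
}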

Related

TimeDeltaSensor delaying from schedule interval

I have a job which runs at 13:30. Its first task takes almost 1 hour to complete, after which we need to wait 15 minutes. So I am using TimeDeltaSensor like below.
waitfor15min = TimeDeltaSensor(
    task_id='waitfor15min',
    delta=timedelta(minutes=15),
    dag=dag)
However, the logs show a target of schedule_interval + 15 min, like below:
[2020-11-05 20:36:27,013] {time_delta_sensor.py:45} INFO - Checking if the time (2020-11-05T13:45:00+00:00) has come
[2020-11-05 20:36:27,013] {base_sensor_operator.py:79} INFO - Success criteria met. Exiting.
[2020-11-05 20:36:30,655] {logging_mixin.py:95} INFO - [2020-11-05 20:36:30,655] {jobs.py:2612} INFO - Task exited with return code 0
How can I create a delay between the tasks in the job?
TimeDeltaSensor waits relative to the run's scheduled time, not to when the upstream task finished: it pokes until following_schedule(execution_date) + delta has passed, which is why your log shows 13:45 (13:30 + 15 min). You could instead use PythonOperator and write a function that simply waits 15 minutes. Here is an example of what a wait task could look like:
def my_sleeping_function(random_base, **kwargs):
    """This is a function that will run within the DAG execution"""
    time.sleep(random_base)

# Generate 5 sleeping tasks, sleeping from 0.0 to 0.4 seconds respectively
for i in range(5):
    task = PythonOperator(
        task_id='sleep_for_' + str(i),
        python_callable=my_sleeping_function,
        op_kwargs={'random_base': float(i) / 10},
        provide_context=True,
        dag=dag,
    )
    run_this >> task
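Applied to the 15-minute case in the question, a minimal sketch (the task id wait_15_minutes is made up, and dag is assumed to be the same DAG object as above):

import time

from airflow.operators.python_operator import PythonOperator

def wait_15_minutes(**kwargs):
    # Block for a fixed 15 minutes after the upstream task finishes
    time.sleep(15 * 60)

wait_task = PythonOperator(
    task_id='wait_15_minutes',  # hypothetical task id
    python_callable=wait_15_minutes,
    dag=dag,  # assumes the dag object from the surrounding file
)

Wired between the long-running task and its successor (first_task >> wait_task >> next_task), this waits relative to when the upstream task actually finished, which is what the question asks for.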

Use Celery periodic task output in Django views, set up with django_celery_beat and caching with Redis

I am trying to use Celery to run a rather expensive algorithm on one of my models.
Currently, in my home/tasks.py I have:
# Note: with bind=True the task function must accept self
@shared_task(bind=True)
def get_hot_posts(self):
    return Post.objects.get_hot()

@shared_task(bind=True)
def get_top_posts(self):
    pass
And inside my Post model manager I have:
def get_hot(self):
    qs = (
        self.get_queryset()
        .select_related("author")
    )
    qs_list = list(qs)
    sorted_post = sorted(qs_list, key=lambda p: p.hot(), reverse=True)
    return sorted_post
Which returns a list object of the hot posts.
I have used django_celery_beat to set up periodic tasks, which I have configured in my settings.py:
CELERY_BEAT_SCHEDULE = {
    'update-hot-posts': {
        'task': 'get_hot_posts',
        'schedule': 3600.0
    },
    'update-top-posts': {
        'task': 'get_top_posts',
        'schedule': 86400
    }
}
I do not know if I can call functions on my models inside Celery tasks, but my intention is to recompute these post lists periodically and then simply use the results in one of my views. How can I achieve this? I am not able to find out how to get the output of a task and use it in my views in order to render it in my template.
Thanks in advance!
EDIT
I am now caching the results:
settings.py:
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "IGNORE_EXCEPTIONS": True,
        }
    }
}
CACHE_TTL = getattr(settings, 'CACHE_TTL', DEFAULT_TIMEOUT)
from django.core.cache import cache

@shared_task(bind=True)
def get_hot_posts(self):
    hot_posts = Post.objects.get_hot()
    cache.set("hot_posts", hot_posts, timeout=CACHE_TTL)
However, when accessing the objects in my view, cache.get returns None; it seems my tasks are not running.
@login_required
def hot_posts(request):
    posts = cache.get("hot_posts")
    context = {'posts': posts, 'hot_active': '-active'}
    return render(request, 'home/homepage/home.html', context)
How can I check whether my tasks are running properly, and whether the queryset result is actually being cached?
EDIT: Configuration in settings.py:
BROKER_URL = 'redis://localhost:6379'
BROKER_TRANSPORT = 'redis'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_BEAT_SCHEDULE = {
    'update-hot-posts': {
        'task': 'get_hot_posts',
        'schedule': 3600.0
    },
    'update-top-posts': {
        'task': 'get_top_posts',
        'schedule': 86400.0
    },
    'tester': {
        'task': 'tester',
        'schedule': 60.0
    }
}
I do not see any results when I go to my view, and cache.get returns None. I think my tasks are not running, but I cannot find the reason.
This is what happens when I run my worker:
celery -A register worker -E --loglevel=info
-------------- celery@apples-MacBook-Pro-2.local v4.4.6 (cliffs)
--- ***** -----
-- ******* ---- Darwin-16.7.0-x86_64-i386-64bit 2020-07-06 01:46:36
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: register:0x10f3da050
- ** ---------- .> transport: redis://localhost:6379//
- ** ---------- .> results: redis://localhost:6379/
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: ON
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. home.tasks.get_hot_posts
. home.tasks.get_top_posts
. home.tasks.tester
[2020-07-06 01:46:38,449: INFO/MainProcess] Connected to redis://localhost:6379//
[2020-07-06 01:46:38,500: INFO/MainProcess] mingle: searching for neighbors
[2020-07-06 01:46:39,592: INFO/MainProcess] mingle: all alone
[2020-07-06 01:46:39,650: INFO/MainProcess] celery@apples-MacBook-Pro-2.local ready.
Also for starting up beat I use:
celery -A register beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler
My suggestion is that you alter your model and make it taggable. Perhaps this: https://django-taggit.readthedocs.io/
Once you've done that you can modify your celery job that calculates hot posts. Once the new hot posts are calculated you can remove all the "hot" tags from all existing posts and then tag the newly-hot posts with the "hot" tag.
Then your view code can simply filter for posts with the hot tag.
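A rough sketch of that approach, assuming django-taggit is installed, Post has a tags = TaggableManager() field, and an arbitrary cutoff of 50 posts counts as "hot":

from celery import shared_task
from home.models import Post  # assumed app path

@shared_task
def update_hot_posts():
    hot = Post.objects.get_hot()[:50]  # assumed cutoff
    # Drop the "hot" tag from every post that currently has it
    for post in Post.objects.filter(tags__name='hot'):
        post.tags.remove('hot')
    # Tag the freshly computed hot posts
    for post in hot:
        post.tags.add('hot')

The view then reduces to an ordinary query, e.g. Post.objects.filter(tags__name='hot'), with no cache involved.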
EDIT
If you want to be sure that your code is actually executing, there are extensions that you can use to do so. For example, the django-celery-results backend will store whatever data your @shared_task returns (usually JSON, if that's your message encoding) in the database, along with a timestamp and maybe even the input args. So then you can see if/that your tasks are running as desired.
https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html#django-celery-results-using-the-django-orm-cache-as-a-result-backend
You might also consider django-celery-beat so that you have a nice visual way to see job schedules via the Django admin:
https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html#django-celery-beat-database-backed-periodic-tasks-with-admin-interface
EDIT 2
If you're going to use the database scheduler (highly recommended!) then you'll need to log in to the admin and add your tasks on the schedule that you want:
https://pinoylearnpython.com/wp-content/uploads/2019/04/Django-Celery-Beat-on-Admin-Site-Pinoy-Learn-Python-1024x718.jpg
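If you'd rather create the entries in code than by hand in the admin, django_celery_beat exposes them as ordinary Django models. A sketch mirroring the hourly update-hot-posts entry above:

from django_celery_beat.models import IntervalSchedule, PeriodicTask

# A one-hour interval, reused if it already exists
schedule, _ = IntervalSchedule.objects.get_or_create(
    every=1,
    period=IntervalSchedule.HOURS,
)

# Register the periodic task under the fully qualified task name
PeriodicTask.objects.get_or_create(
    name='update-hot-posts',
    task='home.tasks.get_hot_posts',
    interval=schedule,
)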
EDIT 3
In your settings.py
CELERY_BEAT_SCHEDULE = {
    'update-hot-posts': {
        'task': 'get_hot_posts',
        'schedule': 3600.0
    },
    'update-top-posts': {
        'task': 'get_top_posts',
        'schedule': 86400.0
    },
    'tester': {
        'task': 'tester',
        'schedule': 60.0
    }
}
The third task there is called tester, which is supposed to run every 60s. I don't see that defined anywhere in your tasks. Because you have attempted to schedule a task which isn't defined anywhere as a @shared_task, celery gets confused and gives you the error messages about tester.
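One more thing worth checking against the worker banner above: the worker registers the tasks as home.tasks.get_hot_posts, while CELERY_BEAT_SCHEDULE refers to the short name get_hot_posts, so beat may be sending messages the worker does not recognize. A sketch of pinning the registered name so the two match (alternatively, use the fully qualified name in the schedule):

from celery import shared_task
from django.core.cache import cache
from home.models import Post  # assumed app path

# Explicit name matching the 'task' key in CELERY_BEAT_SCHEDULE,
# instead of the default 'home.tasks.get_hot_posts'
@shared_task(name='get_hot_posts')
def get_hot_posts():
    hot_posts = Post.objects.get_hot()
    cache.set("hot_posts", hot_posts, timeout=CACHE_TTL)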

Celery beat runs every minute instead of every 15 minutes

I'm setting up my Celery beat schedule as:
CELERYBEAT_SCHEDULE = {
    'update_some_info': {
        'task': 'myapp.somepath.update_some_info',
        'schedule': crontab(minute='*/15'),
    },
}
When I check what's actually stored in the crontab, it is indeed <crontab: */15 * * * * (m/h/d/dM/MY)>,
but my celery log indicates that the task is running every minute:
INFO 2020-01-06 13:21:00,004 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
INFO 2020-01-06 13:22:00,003 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
INFO 2020-01-06 13:23:00,004 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
INFO 2020-01-06 13:24:28,255 beat 29534 139876219189056 Scheduler: Sending due task update_some_info (myapp.somepath.update_some_info)
Why isn't celery beat picking up my schedule?

Celery schedule

I have a schedule problem in Celery.
The task works; however, I want it to run once on Mondays, but it runs every minute.
My schedule config:
CELERY_BEAT_SCHEDULE = {
    'kek': {
        'task': 'kek',
        'schedule': crontab(day_of_week=1),
    }
}
Welcome to SO, @Sturm.
Just define hour and minute:
# Executes every Monday morning at 8:30 a.m.
crontab(hour=8, minute=30, day_of_week=1)  # Monday is 1
This is happening because any crontab field you leave unspecified defaults to *, so crontab(day_of_week=1) means every minute of every hour on Monday.
For further information, just check the documentation for crontab.
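Putting the fix into the original config, a sketch:

from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    'kek': {
        'task': 'kek',
        # Fires exactly once a week: Monday at 8:30 a.m.
        'schedule': crontab(hour=8, minute=30, day_of_week=1),
    }
}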

How to avoid duplication of task execution in Celery? And how to assign a worker to a default queue

My task executes more than once within two minutes, although it is scheduled to run only once on the celery_beat queue.
I tried restarting Celery via supervisorctl, with supervisorctl stop all followed by supervisorctl start all.
app.autodiscover_tasks(settings.INSTALLED_APPS, related_name='tasks')
app.conf.task_default_queue = 'default'
app.conf.task_routes = {'cloudtemplates.tasks.get_metrics': {'queue': 'metrics'}}
app.conf.beat_schedule = {
    'load-softlyer-machine-images': {
        'task': 'load_ibm_machine_images',
        'schedule': crontab(0, 0, day_of_month='13'),
        'args': (),
        'options': {'queue': 'celery_beat'},
    }
}
I expected the scheduled task to run only once, on the 13th of every month.