I'd like to create a periodic task for Celery using django-celery's admin interface. I have a task set up that runs fine when called manually or from a script; it just doesn't work through celerybeat. According to the debug logs, the task is set to enabled = False on first retrieval, and I wonder why.
When I add the periodic task and pass [1, False] as positional arguments, the task is automatically disabled and I don't see any further output. When it is added without arguments, the task is executed but immediately raises an exception because I didn't supply the needed arguments (which makes sense).
Does anyone see what's the problem here?
Thanks in advance.
This is the output after supplying arguments:
[DEBUG/Beat] SELECT "djcelery_periodictask"."id", [...]
FROM "djcelery_periodictask"
WHERE "djcelery_periodictask"."enabled" = true ; args=(True,)
[DEBUG/Beat] SELECT "djcelery_intervalschedule"."id", [...]
FROM "djcelery_intervalschedule"
WHERE "djcelery_intervalschedule"."id" = 3 ; args=(3,)
[DEBUG/Beat] SELECT (1) AS "a"
FROM "djcelery_periodictask"
WHERE "djcelery_periodictask"."id" = 3 LIMIT 1; args=(3,)
[DEBUG/Beat] UPDATE "djcelery_periodictask"
SET "name" = E'<taskname>', "task" = E'<task.module.path>',
"interval_id" = 3, "crontab_id" = NULL,
"args" = E'[1, False,]', "kwargs" = E'{}', "queue" = NULL,
"exchange" = NULL, "routing_key" = NULL,
"expires" = NULL, "enabled" = false,
"last_run_at" = E'2011-05-25 00:45:23.242387', "total_run_count" = 9,
"date_changed" = E'2011-05-25 09:28:06.201148'
WHERE "djcelery_periodictask"."id" = 3;
args=(
u'<periodic-task-name>', u'<task.module.path>',
3, u'[1, False,]', u'{}',
False, u'2011-05-25 00:45:23.242387', 9,
u'2011-05-25 09:28:06.201148', 3
)
[DEBUG/Beat] Current schedule:
<ModelEntry: celery.backend_cleanup celery.backend_cleanup(*[], **{}) {<crontab: 0 4 * (m/h/d)>}
[DEBUG/Beat] Celerybeat: Waking up in 5.00 seconds.
EDIT:
It works with the following setting. I still have no idea why it doesn't work with django-celery.
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
"example": {
"task": "<task.module.path>",
"schedule": crontab(),
"args": (1, False)
},
}
I had the same issue. Make sure the arguments are JSON formatted. For example, try setting the positional args to [1, false] -- lowercase 'false' -- I just tested it on a django-celery instance (version 2.2.4) and it worked.
For the keyword args, use something like {"name": "aldarund"}
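A quick way to sanity-check the values before pasting them into the admin (plain Python, nothing django-celery specific):
import json
# JSON booleans are lowercase, so the Python-style [1, False] is not valid JSON,
# while [1, false] parses fine.
json.loads('[1, false]')             # -> [1, False]
json.loads('{"name": "aldarund"}')   # -> {'name': 'aldarund'}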
I got the same problem too.
Going by the field description on djcelery's PeriodicTask model ("JSON encoded positional arguments"), same as Evan's answer, I tried using Python's json library to encode the values before saving.
And this worked for me:
import json
from djcelery.models import PeriodicTask

o = PeriodicTask()  # name, task and an interval/crontab schedule must also be set before saving
o.kwargs = json.dumps({'myargs': 'hello'})
o.save()
celery version 3.0.11
CELERYBEAT_SCHEDULE = {
"example": {
"task": "<task.module.path>",
"schedule": crontab(),
"enable": False
},
}
I tried it and it worked. I'm running celery beat v5.1.2.
Related
I'm trying to create an EMR cluster with an auto-termination idle timeout setting using an Airflow DAG.
It doesn't accept the AutoTerminationPolicy parameter and fails parameter validation with the following error:
raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in input: "AutoTerminationPolicy", must be one of: Name, LogUri, LogEncryptionKmsKeyId, AdditionalInfo, AmiVersion, ReleaseLabel, Instances, Steps, BootstrapActions, SupportedProducts, NewSupportedProducts, Applications, Configurations, VisibleToAllUsers, JobFlowRole, ServiceRole, Tags, SecurityConfiguration, AutoScalingRole, ScaleDownBehavior, CustomAmiId, EbsRootVolumeSize, RepoUpgradeOnBoot, KerberosAttributes, StepConcurrencyLevel, ManagedScalingPolicy, PlacementGroupConfigs
JOB_FLOW_OVERRIDES = {
'Name': 'X',
'ReleaseLabel': "{{ReleaseLabel}}",
"Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}, {"Name": "Hive"}],
"AutoTerminationPolicy": {
"IdleTimeout": 60 * 10
},
'LogUri': LogUri,
'Instances': {
'Ec2SubnetId': "{{Ec2SubnetId}}",
'EmrManagedMasterSecurityGroup': "{{EmrManagedMasterSecurityGroup}}",
'ServiceAccessSecurityGroup': "{{ServiceAccessSecurityGroup}}",
'EmrManagedSlaveSecurityGroup': "{{EmrManagedSlaveSecurityGroup}}",
'InstanceGroups': [
I have a django Q cluster running with this configuration:
Q_CLUSTER = {
'name': 'pretty_name',
'workers': 1,
'recycle': 500,
'timeout': 500,
'queue_limit': 5,
'cpu_affinity': 1,
'label': 'Django Q',
'save_limit': 0,
'ack_failures': True,
'max_attempts': 1,
'attempt_count': 1,
'redis': {
'host': CHANNEL_REDIS_HOST,
'port': CHANNEL_REDIS_PORT,
'db': 5,
}
}
On this cluster I have a scheduled task that is supposed to run every 15 minutes.
Sometimes it works fine and this is what I can see on my worker logs:
[Q] INFO Enqueued 1
[Q] INFO Process-1 created a task from schedule [2]
[Q] INFO Process-1:1 processing [oranges-georgia-snake-social]
[ My Personal Custom Task Log]
[Q] INFO Processed [oranges-georgia-snake-social]
But other times the task does not start, this is what I get on my log:
[Q] INFO Enqueued 1
[Q] INFO Process-1 created a task from schedule [2]
And then nothing for the next 15 minutes.
Any idea where this might come from?
It turned out this was my prod environment, and my dev environment was using the same Redis db. Even though no task existed in my dev environment, sharing the db was the cause of the issue.
The solution was to use a different Redis db for my dev and prod environments!
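A minimal sketch of what that fix looks like in settings (the environment variable name and db numbers are only illustrative):
import os

# Give each environment its own Redis database so dev and prod clusters
# never read each other's queues.
REDIS_DB = int(os.environ.get('Q_CLUSTER_REDIS_DB', '5'))  # e.g. 5 = prod, 6 = dev

Q_CLUSTER = {
    'name': 'pretty_name',
    'workers': 1,
    'redis': {
        'host': CHANNEL_REDIS_HOST,
        'port': CHANNEL_REDIS_PORT,
        'db': REDIS_DB,
    },
}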
Will restarting celery cause all the periodic tasks(celery beat schedules) to get reset and start from the time celery is restarted or does it retain the schedule?
For example assume I have a periodic task that gets executed at 12 pm everyday. Now I restart celery at 3 pm. Will the periodic task be reset to run at 3 pm everyday?
How do you set your task?
There are many ways to set a task schedule.
Example: Run the tasks.add task every 30 seconds.
app.conf.beat_schedule = {
'add-every-30-seconds': {
'task': 'tasks.add',
'schedule': 30.0,
'args': (16, 16)
},
}
app.conf.timezone = 'UTC'
This task runs every 30 seconds after beat starts.
Another example:
from celery.schedules import crontab
app.conf.beat_schedule = {
    # Executes every day at 7:30 a.m. (add day_of_week=1 to restrict it to Mondays)
'add-every-monday-morning': {
'task': 'tasks.add',
'schedule': crontab(hour=7, minute=30),
'args': (16, 16),
},
}
This task runs at 7:30 every day.
You can check more schedule examples in the Celery documentation.
So the answer depends on how you define your schedule.
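To tie this back to the question: a crontab schedule derives the next run from the clock time in the expression, not from when beat was started, so a daily 12 pm task keeps firing at 12 pm after a restart. A sketch reusing the task from the examples above:
from celery.schedules import crontab

app.conf.beat_schedule = {
    'add-every-day-at-noon': {
        'task': 'tasks.add',
        'schedule': crontab(hour=12, minute=0),  # fires at 12:00 regardless of when beat starts
        'args': (16, 16),
    },
}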
I am trying to use Celery to run a rather expensive algorithm on one of my models.
Currently in my home.tasks.py I have:
from celery import shared_task

from .models import Post

@shared_task(bind=True)
def get_hot_posts(self):
    return Post.objects.get_hot()

@shared_task(bind=True)
def get_top_posts(self):
    pass
Which inside my Post object model manager I have:
def get_hot(self):
qs = (
self.get_queryset()
.select_related("author")
)
qs_list = list(qs)
sorted_post = sorted(qs_list, key=lambda p: p.hot(), reverse=True)
return sorted_post
Which returns a list object of the hot posts.
I have used django_celery_beat in order to set a periodic task. Which I have configured in my settings.py
CELERY_BEAT_SCHEDULE = {
'update-hot-posts': {
'task':'get_hot_posts',
'schedule': 3600.0
},
'update-top-posts': {
'task':'get_top_posts',
'schedule': 86400
}
}
I do not know if I can run functions on my models from Celery tasks, but my intention is to compute the top posts every hour and then simply use the result in one of my views. How can I achieve this? I am not able to find out how to get the output of that task and use it in my views in order to render it in my template.
Thanks in advance!
EDIT
I am now caching the results:
settings.py:
CACHES = {
"default": {
"BACKEND": "django_redis.cache.RedisCache",
"LOCATION": "redis://127.0.0.1:6379/1",
"OPTIONS": {
"CLIENT_CLASS": "django_redis.client.DefaultClient",
"IGNORE_EXCEPTIONS": True,
}
}
}
CACHE_TTL = getattr(settings, 'CACHE_TTL', DEFAULT_TIMEOUT)
from django.core.cache import cache

@shared_task(bind=True)
def get_hot_posts(self):
    hot_posts = Post.objects.get_hot()
    cache.set("hot_posts", hot_posts, timeout=CACHE_TTL)
However, when accessing the objects in my view it returns None; it seems my tasks are not working.
@login_required
def hot_posts(request):
posts = cache.get("hot_posts")
context = { 'posts':posts, 'hot_active':'-active'}
return render(request, 'home/homepage/home.html', context)
How can I check whether my tasks are running properly or not, and whether the task is actually working and caching the queryset result?
EDIT: Configuration in settings.py:
BROKER_URL = 'redis://localhost:6379'
BROKER_TRANSPORT = 'redis'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_BEAT_SCHEDULE = {
'update-hot-posts': {
'task':'get_hot_posts',
'schedule': 3600.0
},
'update-top-posts': {
'task':'get_top_posts',
'schedule': 86400.0
},
'tester': {
'task':'tester',
'schedule': 60.0
}
}
I do not see any results when I go to my view and cache.get returns None. I think my tasks are not running, but I cannot find the reason.
This is what happens when I run my worker:
celery -A register worker -E --loglevel=info
-------------- celery#apples-MacBook-Pro-2.local v4.4.6 (cliffs)
--- ***** -----
-- ******* ---- Darwin-16.7.0-x86_64-i386-64bit 2020-07-06 01:46:36
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: register:0x10f3da050
- ** ---------- .> transport: redis://localhost:6379//
- ** ---------- .> results: redis://localhost:6379/
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: ON
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. home.tasks.get_hot_posts
. home.tasks.get_top_posts
. home.tasks.tester
[2020-07-06 01:46:38,449: INFO/MainProcess] Connected to redis://localhost:6379//
[2020-07-06 01:46:38,500: INFO/MainProcess] mingle: searching for neighbors
[2020-07-06 01:46:39,592: INFO/MainProcess] mingle: all alone
[2020-07-06 01:46:39,650: INFO/MainProcess] celery#apples-MacBook-Pro-2.local ready.
Also for starting up beat I use:
celery -A register beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler
My suggestion is that you alter your model and make it taggable. Perhaps this: https://django-taggit.readthedocs.io/
Once you've done that you can modify your celery job that calculates hot posts. Once the new hot posts are calculated you can remove all the "hot" tags from all existing posts and then tag the newly-hot posts with the "hot" tag.
Then your view code can simply filter for posts with the hot tag.
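A rough sketch of that flow, assuming a taggit TaggableManager named tags on Post and reusing the existing get_hot() manager method (the task name and slice size are arbitrary):
from celery import shared_task

from home.models import Post

@shared_task
def update_hot_posts():
    # Clear the "hot" tag everywhere, then tag the freshly computed hot posts.
    for post in Post.objects.filter(tags__name="hot"):
        post.tags.remove("hot")
    for post in Post.objects.get_hot()[:20]:
        post.tags.add("hot")
The view then becomes a plain Post.objects.filter(tags__name="hot") query.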
EDIT
If you want to be sure that your code is actually executing, there are extensions that you can use to do so. For example, the django-celery-results backend will store whatever data your @shared_task returns (usually JSON, if that's your message encoding) in the database along with a timestamp and maybe even the input args. So then you can see if/that your tasks are running as desired.
https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html#django-celery-results-using-the-django-orm-cache-as-a-result-backend
You might also consider django-celery-beat to ensure that you have a nice visual way to see job schedules via the Django admin:
https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html#django-celery-beat-database-backed-periodic-tasks-with-admin-interface
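For completeness, wiring those two extensions up in settings.py looks roughly like this (package names as in their docs; run the migrations afterwards):
INSTALLED_APPS += [
    'django_celery_results',  # stores task results in the Django DB
    'django_celery_beat',     # database-backed schedules, editable in the admin
]

# Have Celery write task results to the Django database.
CELERY_RESULT_BACKEND = 'django-db'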
EDIT 2
If you're going to use the database scheduler (highly recommended!) then you'll need to login to the admin and add your tasks on the schedule that you want.
https://pinoylearnpython.com/wp-content/uploads/2019/04/Django-Celery-Beat-on-Admin-Site-Pinoy-Learn-Python-1024x718.jpg
EDIT 3
In your settings.py
CELERY_BEAT_SCHEDULE = {
'update-hot-posts': {
'task':'get_hot_posts',
'schedule': 3600.0
},
'update-top-posts': {
'task':'get_top_posts',
'schedule': 86400.0
},
'tester': {
'task':'tester',
'schedule': 60.0
}
}
The third task there is called tester, which is supposed to run every 60s. I don't see that task defined anywhere in your tasks. Because you have attempted to schedule a task which isn't defined anywhere as a @shared_task, celery is getting confused and giving you the error messages about tester.
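If you do want a tester entry, a minimal task like this (the body is just an example) would give the schedule something registered to run:
from celery import shared_task

@shared_task(name='tester')
def tester():
    # Trivial task so the 'tester' schedule entry resolves to a registered task.
    return 'tester ran'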
I started migrating my code to boto 3 and one nice addition I noticed are the waiters.
I want to create a snapshot from a DB instance and I want to check for its availability before I resume with my code.
My approach is the following:
# Notice: Step : Check snapshot availability [1st account - Oregon]
print "--- Check snapshot availability [1st account - Oregon] ---"
new_snap = client1.describe_db_snapshots(DBSnapshotIdentifier=new_snapshot_name)['DBSnapshots'][0]
# print pprint.pprint(new_snap) #debug
waiter = client1.get_waiter('db_snapshot_completed')
print "Manual snapshot is -pending-"
sleep(60)
waiter.wait(
DBSnapshotIdentifier = new_snapshot_name,
IncludeShared = True,
IncludePublic = False
)
print "OK. Manual snapshot is -available-"
But the documentation says that it polls the status every 15 seconds, up to 40 times. That is 10 minutes, yet a rather big DB will need more than that.
How could I use the waiter to account for that?
Waiters have the configuration parameters 'delay' and 'max_attempts', like this:
waiter = rds_client.get_waiter('db_instance_available')
print( "waiter delay: " + str(waiter.config.delay) )
waiter.py on github
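If you'd rather not poke at waiter.config directly, wait() also accepts a WaiterConfig argument; a sketch with illustrative numbers:
import boto3

rds_client = boto3.client('rds')
waiter = rds_client.get_waiter('db_snapshot_completed')

# Poll every 60 seconds, up to 120 times (about 2 hours) -- adjust to taste.
waiter.wait(
    DBSnapshotIdentifier='your-snapshot-id',
    WaiterConfig={
        'Delay': 60,
        'MaxAttempts': 120,
    },
)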
You could do it without the waiter if you like.
From the documentation for that waiter:
Polls RDS.Client.describe_db_snapshots() every 15 seconds until a successful state is reached. An error is returned after 40 failed checks.
Basically that means it does the following:
RDS = boto3.client('rds')
RDS.describe_db_snapshots()
You can just run that but filter to your snapshot id; here is the syntax: http://boto3.readthedocs.io/en/latest/reference/services/rds.html#RDS.Client.describe_db_snapshots
response = client.describe_db_snapshots(
DBInstanceIdentifier='string',
DBSnapshotIdentifier='string',
SnapshotType='string',
Filters=[
{
'Name': 'string',
'Values': [
'string',
]
},
],
MaxRecords=123,
Marker='string',
IncludeShared=True|False,
IncludePublic=True|False
)
This will end up looking something like this:
snapshot_description = RDS.describe_db_snapshots(DBSnapshotIdentifier='YOURIDHERE')
Then you can just loop until that returns a snapshot which is available. So here is a very rough idea:
import boto3
import time
RDS = boto3.client('rds')
RDS.describe_db_snapshots()
snapshot_description = RDS.describe_db_snapshots(DBSnapshotIdentifier='YOURIDHERE')
while snapshot_description['DBSnapshots'][0]['Status'] != 'available' :
print("still waiting")
time.sleep(15)
snapshot_description = RDS.describe_db_snapshots(DBSnapshotIdentifier='YOURIDHERE')
I think the other answer alluded to this solution but here it is expressly.
[snip]
...
# Create your waiter
waiter_db_snapshot = client1.get_waiter('db_snapshot_completed')
# Increase the max number of tries as appropriate
waiter_db_snapshot.config.max_attempts = 120
# Add a 60 second delay between attempts
waiter_db_snapshot.config.delay = 60
print "Manual snapshot is -pending-"
....
[snip]