I'm using django-celery-beat in a Django app (it stores the schedule in the database instead of a local file). I've configured my schedule via the celery_beat dictionary in the config object that Celery is initialized with through app.config_from_object(...).
I recently renamed/removed a few tasks and restarted the app. The new tasks showed up, but the tasks removed from the celery_beat dictionary didn't get removed from the database.
Is this the expected workflow, requiring manual removal of tasks from the database? Is there a workaround to automatically reconcile the schedule at Django startup?
I tried a variant of PeriodicTask.objects.all().delete() in celery/__init__.py:
def _clean_schedule():
    from django.conf import settings
    from django.db import transaction
    from django_celery_beat.models import PeriodicTask
    from django_celery_beat.models import PeriodicTasks

    with transaction.atomic():
        PeriodicTask.objects.\
            exclude(task__startswith='celery.').\
            exclude(name__in=settings.CELERY_CONFIG.celery_beat.keys()).\
            delete()
        PeriodicTasks.update_changed()

_clean_schedule()
but that fails because Django isn't fully set up yet when that module is imported:
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
You also can't use Django's AppConfig.ready(), because the Django documentation warns against interacting with the database in ready().
Looking at how django-celery-beat actually installs the schedules, I thought maybe I could hook into that process.
It doesn't happen when Django starts; it happens when beat starts. Beat calls setup_schedule() on the scheduler class passed on the beat command line.
Therefore, we can just override the scheduler with
--scheduler=myproject.lib.scheduler:DatabaseSchedulerWithCleanup
to do cleanup:
import logging

from django_celery_beat.models import PeriodicTask
from django_celery_beat.models import PeriodicTasks
from django_celery_beat.schedulers import DatabaseScheduler
from django.db import transaction


class DatabaseSchedulerWithCleanup(DatabaseScheduler):

    def setup_schedule(self):
        schedule = self.app.conf.beat_schedule
        with transaction.atomic():
            num, info = PeriodicTask.objects.\
                exclude(task__startswith='celery.').\
                exclude(name__in=schedule.keys()).\
                delete()
            logging.info("Removed %d obsolete periodic tasks.", num)
            if num > 0:
                PeriodicTasks.update_changed()
        super(DatabaseSchedulerWithCleanup, self).setup_schedule()
Note: you only want this if you manage tasks exclusively with beat_schedule. If you add tasks via the Django admin or programmatically, they will also be deleted.
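You can check the selection rule in the filter above without a database. The following is a pure-Python stand-in, not django-celery-beat code. It assumes hypothetical `(name, task)` pairs in place of `PeriodicTask` rows and a plain dict in place of `beat_schedule`. It also makes the caveat concrete: a task created in the admin is selected for deletion just like a renamed one.

```python
def obsolete_task_names(db_tasks, beat_schedule):
    """Return the names the cleanup filter would delete.

    db_tasks: iterable of (name, task_path) pairs, a hypothetical stand-in
        for PeriodicTask rows.
    beat_schedule: dict keyed by entry name, like app.conf.beat_schedule.
    """
    keep = set(beat_schedule)
    return sorted(
        name
        for name, task_path in db_tasks
        # Mirrors exclude(task__startswith='celery.') and exclude(name__in=...).
        if not task_path.startswith("celery.") and name not in keep
    )


# Hypothetical rows: one built-in, one still scheduled, one renamed away,
# and one that was created by hand in the Django admin.
rows = [
    ("celery.backend_cleanup", "celery.backend_cleanup"),
    ("send-report", "myapp.tasks.send_report"),
    ("old-report", "myapp.tasks.old_report"),
    ("added-in-admin", "myapp.tasks.adhoc"),
]
schedule = {"send-report": {"task": "myapp.tasks.send_report", "schedule": 3600.0}}
print(obsolete_task_names(rows, schedule))  # ['added-in-admin', 'old-report']
```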
Related
I use the django-apscheduler package to run cron (scraping) jobs. The package stores past jobs with some information/properties (e.g. local runtime, duration, etc.) in the database for display in the admin backend.
When I want to access this information programmatically in views.py (e.g. to show the last runtime of a job in the context/template), how would I do that?
In views.py, use
from django_apscheduler.models import DjangoJobExecution
to access the data of executed jobs,
or
from django_apscheduler.models import DjangoJob
to access the scheduled jobs.
I have a set of functionalities that use Django management commands to run a bunch of cron jobs that update the models. However, I also need these to execute as all-or-none transactions. Does Django provide a way to define transactions?
If you're trying to wrap a chunk of code in a transaction you can use transaction.atomic as a decorator or context manager, e.g.,
from django.db import transaction

@transaction.atomic
def management_command(args):
    # This code executes inside a transaction.
    do_stuff()
or
def management_command(args):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
See https://docs.djangoproject.com/en/2.2/topics/db/transactions/#controlling-transactions-explicitly for more details.
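To show what "all-or-none" means in practice, here is the same idea using the standard library's sqlite3 module instead of Django. This is sqlite3's own context-manager behaviour, not Django's API. A connection used as a context manager commits when the block finishes and rolls back if the block raises, which is roughly what transaction.atomic gives you. The `item` table and the `apply_updates` helper are hypothetical.

```python
import sqlite3


def apply_updates(conn, rows, fail_at=None):
    """Insert rows as one unit; raise at index fail_at to simulate a crash."""
    # sqlite3.Connection as a context manager: commit on success,
    # roll back if the block raises (the exception still propagates).
    with conn:
        for i, (name, value) in enumerate(rows):
            if i == fail_at:
                raise RuntimeError("simulated failure mid-batch")
            conn.execute("INSERT INTO item (name, value) VALUES (?, ?)", (name, value))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, value INTEGER)")

try:
    apply_updates(conn, [("a", 1), ("b", 2)], fail_at=1)
except RuntimeError:
    pass
print(conn.execute("SELECT COUNT(*) FROM item").fetchone()[0])  # 0: "a" was rolled back

apply_updates(conn, [("a", 1), ("b", 2)])
print(conn.execute("SELECT COUNT(*) FROM item").fetchone()[0])  # 2
```

In Django, the equivalent is to wrap the command's work in transaction.atomic(). An exception that escapes the block rolls back everything written inside it.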
I store rows in a database table, and the table has a field named expire_time. Once the current time passes a row's expire_time, I want that row to be deleted.
To do that, I could query the table every time, iterate over every row, and delete the ones that have expired.
But if I don't query, I can not realize the requirement.
So, is there a method to do that?
I use Python with Django, and the database is MariaDB.
You can write a custom management command to do this for you. Save this in myapp/management/commands/delete_expired.py for example:
from django.core.management.base import BaseCommand
from django.utils import timezone

from myapp.models import MyModel


class Command(BaseCommand):
    help = 'Deletes expired rows'

    def handle(self, *args, **options):
        now = timezone.now()
        MyModel.objects.filter(expire_time__lt=now).delete()
Then call that command from a cron job or a task queue. To run it from the command line:
python manage.py delete_expired
I am not sure what you mean by:
I can not realize the requirement.
But I think you might want to consider:
a custom manage.py command, scheduled with cron using your virtualenv's Python interpreter
add django-cron to routinely check for expired data and delete it
try Celery as another solution to cron, but it could be too complicated for your case
add an event to MariaDB and schedule it to run periodically
The drawback of the custom manage.py command and the MariaDB event is that if you migrate to another server, you have to remember to recreate the cron job/event that cleans the database periodically.
I don't know a database-level approach to do that (maybe you want to add the mariadb tag if you are looking for a database-specific solution).
At the application level, one approach comes to mind: use Celery and, whenever you store a row, schedule a task to delete it. The Celery task should check that expire_time has actually passed before deleting (can that field be modified or updated?).
You can also (in addition or as an alternative) have a Celery beat job that periodically fetches the row with the smallest expire_time. If it should be removed, remove it and call the job again; otherwise, wait for the next beat.
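The beat-job loop in the last paragraph can be sketched in plain Python. This assumes a heapq-ordered list of `(expire_time, row_id)` tuples as a hypothetical stand-in for ordering the table by expire_time. In the Django version, each iteration would instead query for the row with the smallest expire_time.

```python
import heapq
from datetime import datetime, timedelta, timezone


def purge_expired(heap, now):
    """Remove every entry whose expire_time is earlier than now.

    heap: list of (expire_time, row_id) tuples kept ordered by heapq, so
    heap[0] is always the entry with the smallest expire_time.
    Returns the removed ids, earliest expiry first.
    """
    removed = []
    # Look at the soonest-expiring entry; if it has expired, remove it and
    # check the next one. Stop at the first entry that is still valid and
    # leave the rest for the next beat.
    while heap and heap[0][0] < now:
        _, row_id = heapq.heappop(heap)
        removed.append(row_id)
    return removed


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
heap = []
for minutes, row_id in [(-30, "a"), (10, "b"), (-5, "c")]:
    heapq.heappush(heap, (now + timedelta(minutes=minutes), row_id))

print(purge_expired(heap, now))  # ['a', 'c']
print(len(heap))  # 1 ('b' expires in 10 minutes)
```

Here "call itself again" is written as a loop rather than recursion. The strict `<` comparison matches the `expire_time__lt` filter used in the management command answer.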
I have a Django 1.5.1 webapp using Celery 3.0.23 with RabbitMQ 3.1.5 and sqlite3.
I can submit jobs using a simple result = status.tasks.mymethod.delay(parameter), and all tasks execute correctly:
[2013-09-30 17:04:11,369: INFO/MainProcess] Got task from broker: status.tasks.prova[a22bf0b9-0d5b-4ce5-967a-750f679f40be]
[2013-09-30 17:04:11,566: INFO/MainProcess] Task status.tasks.mymethod[a22bf0b9-0d5b-4ce5-967a-750f679f40be] succeeded in 0.194540023804s: u'Done'
I want to display the latest 10 submitted jobs and their status on a page. Is there a way in Django to get such objects? I see a couple of tables in the database (celery_taskmeta and celery_taskmeta_2ff6b945) and tried a few ways of accessing the objects, but Django always displays an AttributeError page.
What is the correct way to access Celery results from Django?
Doing
cel = celery.status.tasks.get(None)
cel = status.tasks.all()
does not work, resulting in the aforementioned AttributeError. (status is the name of my app)
EDIT: I am sure tasks are saved, as this small tutorial says:
By default django-celery stores this state in the Django database. You may consider choosing an alternate result backend or disabling states altogether (see Result Backends).
Following the links, I only find references on how to set up the DB connection, not on how to retrieve the results.
Try this:
from djcelery.models import TaskMeta
TaskMeta.objects.all()
I am trying to combine a Django custom logger and a Celery task to capture certain application log messages and dump them into DynamoDB asynchronously. I have created a Django Celery task that takes a log message and transfers it to DynamoDB, and I tried to call this Celery task from my custom logger.
However, my custom logger module fails when it does this import:
from celery.task import task, Task, PeriodicTask, periodic_task
My server crashes with the below error:
ValueError: Unable to configure handler 'custom_handler': Cannot resolve 'myApp.analytics.tasks.LogHandler': cannot import name cache
I know that the Django logging docs warn against circular imports if the file defining the custom handler imports settings.py, but I have made sure that's not the case. It still gives me the same error as a circular import would.
Am I doing something wrong or is there any other way to achieve asynchronous data transfer to DynamoDB using Django custom logger and DjCelery?
Thanks for any help.
I found the solution.
The problem was "If your settings.py specifies a custom handler class and the file defining that class also imports settings.py a circular import will occur."
To resolve this, do the import inside the method body instead of at the top of the file defining the class.
Here's my custom LogHandler:
import logging

# Do not import settings here, as this would lead to circular import.
# This custom log handler parses the message and inserts the entry to the DynamoDB tables.
class LogHandler(logging.Handler):
    def __init__(self):
        logging.Handler.__init__(self)
        self.report_logger = logging.getLogger('reporting')
        self.report_logger.setLevel(logging.INFO)

    def emit(self, record):
        # Submit the task to "reporting" queue to be picked up and processed by the worker lazily.
        # myApp.analytics.tasks imports celery.task
        from myApp.analytics import tasks
        tasks.push_row_to_dynamodb.apply_async(args=[record])
        return
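Here is a standard-library-only sketch of the same deferred-import pattern. `json` stands in for a module, such as the Celery tasks module, that must not be imported while logging is being configured. The `sent` list is a hypothetical sink in place of the Celery task. Unlike the handler above, it formats the record into a string before handing it off and sends any failure to handleError(), which the logging docs say emit() should call when it hits an exception.

```python
import logging


class DeferredImportHandler(logging.Handler):
    """Collects formatted records; its dependency is imported only in emit()."""

    def __init__(self):
        super().__init__()
        # Hypothetical sink standing in for tasks.push_row_to_dynamodb.
        self.sent = []

    def emit(self, record):
        try:
            # Deferred import: resolved when a record is logged, not when the
            # logging configuration instantiates this handler.
            import json

            payload = {"level": record.levelname, "message": self.format(record)}
            self.sent.append(json.dumps(payload))
        except Exception:
            # Standard logging convention: never let emit() raise.
            self.handleError(record)


logger = logging.getLogger("reporting_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = DeferredImportHandler()
logger.addHandler(handler)

logger.info("row %d", 1)
print(handler.sent)  # ['{"level": "INFO", "message": "row 1"}']
```

Passing a string rather than the LogRecord itself matters if the Celery task uses the JSON serializer, the default since Celery 4.0, because a LogRecord is not JSON-serializable.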
Hope it helps someone.