I have a set of features that use Django's management/commands modules to run a bunch of cron jobs that update the model. However, I also need these to execute as all-or-none transactions. Does Django provide a way to define transactions?
If you're trying to wrap a chunk of code in a transaction you can use transaction.atomic as a decorator or context manager, e.g.,
from django.db import transaction

@transaction.atomic
def management_command(args):
    # This code executes inside a transaction.
    do_stuff()
or
def management_command(args):
    # This code executes in autocommit mode (Django's default).
    do_stuff()
    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
See https://docs.djangoproject.com/en/2.2/topics/db/transactions/#controlling-transactions-explicitly for more details.
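To see the all-or-none behaviour in isolation, here is a minimal sketch that uses the standard library's sqlite3 module as a stand-in for Django's transaction.atomic. The account table and the transfer helper are made up for illustration; the point is only that a failure partway through the block undoes the earlier write.

```python
import sqlite3

def transfer(conn, fail=False):
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises (similar in spirit to atomic()).
    with conn:
        conn.execute("UPDATE account SET balance = balance - 10 WHERE name = 'a'")
        if fail:
            raise RuntimeError("simulated failure mid-update")
        conn.execute("UPDATE account SET balance = balance + 10 WHERE name = 'b'")

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    transfer(conn, fail=True)
except RuntimeError:
    pass
after_failure = balances(conn)  # the first UPDATE was rolled back

transfer(conn)
after_success = balances(conn)  # both UPDATEs were committed together
```

In a management command the same shape applies: anything that raises inside the atomic block leaves the database as it was before the block started.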
I'm using django-celery-beat in a Django app (this stores the schedule in the database instead of a local file). I've configured my schedule via the celery_beat dictionary that Celery is initialized with through app.config_from_object(...)
I recently renamed/removed a few tasks and restarted the app. The new tasks showed up, but the tasks removed from the celery_beat dictionary didn't get removed from the database.
Is this the expected workflow -- requiring manual removal of tasks from the database? Is there a workaround to automatically reconcile the schedule at Django startup?
I tried a PeriodicTask.objects.all().delete() in celery/__init__.py
def _clean_schedule():
    from django.db import transaction
    from django_celery_beat.models import PeriodicTask
    from django_celery_beat.models import PeriodicTasks
    with transaction.atomic():
        PeriodicTask.objects.\
            exclude(task__startswith='celery.').\
            exclude(name__in=settings.CELERY_CONFIG.celery_beat.keys()).\
            delete()
        PeriodicTasks.update_changed()

_clean_schedule()
but that is not allowed because Django isn't properly started up yet:
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
You also can't use Django's AppConfig.ready() because making queries / db connections in ready() is not supported.
Looking at how django-celery-beat actually works to install the schedules, I thought maybe I could hook into that process.
It doesn't happen when Django starts -- it happens when beat starts. It calls setup_schedule() against the class passed on the beat command line.
Therefore, we can just override the scheduler with
--scheduler=myproject.lib.scheduler:DatabaseSchedulerWithCleanup
to do cleanup:
import logging

from django_celery_beat.models import PeriodicTask
from django_celery_beat.models import PeriodicTasks
from django_celery_beat.schedulers import DatabaseScheduler
from django.db import transaction

class DatabaseSchedulerWithCleanup(DatabaseScheduler):

    def setup_schedule(self):
        schedule = self.app.conf.beat_schedule
        with transaction.atomic():
            num, info = PeriodicTask.objects.\
                exclude(task__startswith='celery.').\
                exclude(name__in=schedule.keys()).\
                delete()
            logging.info("Removed %d obsolete periodic tasks.", num)
            if num > 0:
                PeriodicTasks.update_changed()
        super(DatabaseSchedulerWithCleanup, self).setup_schedule()
Note, you only want this if you are exclusively managing tasks with beat_schedule. If you add tasks via Django admin or programmatically, they will also be deleted.
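To make that warning concrete, here is a small sketch of the selection rule the cleanup above applies, written against plain dictionaries rather than the real models. The function name and the sample task names are hypothetical; db_tasks stands in for the PeriodicTask table as a mapping of name to task path.

```python
def obsolete_task_names(db_tasks, schedule):
    """Return the stored task names that the cleanup would delete.

    db_tasks: mapping of PeriodicTask.name -> PeriodicTask.task
              (hypothetical stand-in for the table contents).
    schedule: the beat_schedule dict, keyed by task name.
    """
    return sorted(
        name
        for name, task in db_tasks.items()
        # Keep Celery's own tasks and anything still in the schedule.
        if not task.startswith("celery.") and name not in schedule
    )

db_tasks = {
    "celery.backend_cleanup": "celery.backend_cleanup",
    "nightly-report": "reports.tasks.nightly",
    "old-sync": "sync.tasks.run",
}
schedule = {"nightly-report": {"task": "reports.tasks.nightly", "schedule": 3600.0}}

to_delete = obsolete_task_names(db_tasks, schedule)
```

Here only "old-sync" is selected. A task created through the admin would look exactly like "old-sync" to this rule, which is why it would be deleted too.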
Suppose I store a row in a database table, and the table has a field named expire_time. Once the current time passes expire_time, I want the row to be deleted.
One way to do that is to query the table every time, go through every row, and delete the ones that have expired.
But if I don't run that query, the requirement is not met.
So, is there a way to do this automatically?
I use Python and Django; the database is MariaDB.
You can write a custom management command to do this for you. Save this in myapp/management/commands/delete_expired.py for example:
from django.core.management.base import BaseCommand
from django.utils import timezone
from myapp.models import MyModel

class Command(BaseCommand):
    help = 'Deletes expired rows'

    def handle(self, *args, **options):
        now = timezone.now()
        MyModel.objects.filter(expire_time__lt=now).delete()
Then either call that command from a cron task or a queue. To do it on the command line you can call:
python manage.py delete_expired
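The command issues a single delete with a strict "less than now" condition, so no row-by-row traversal is needed. A rough sketch of the same idea outside Django, using the standard library's sqlite3 module and integer timestamps (the item table and delete_expired helper are hypothetical):

```python
import sqlite3

def delete_expired(conn, now):
    """Delete rows whose expire_time is strictly before `now`; return the count."""
    with conn:
        cur = conn.execute("DELETE FROM item WHERE expire_time < ?", (now,))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, expire_time INTEGER)")
conn.executemany("INSERT INTO item (expire_time) VALUES (?)", [(100,), (200,), (300,)])
conn.commit()

removed = delete_expired(conn, now=200)
remaining = [
    row[0] for row in conn.execute("SELECT expire_time FROM item ORDER BY expire_time")
]
```

Note that a row expiring exactly at `now` survives until the next run, matching the `expire_time__lt` lookup in the command.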
I am not sure what you mean by:
I can not realize the requirement.
But I think you might want to consider:
writing a custom manage.py command and running it from cron with your virtualenv's Python
adding django-cron to routinely check for expired data and delete it
trying Celery as an alternative to cron, though it could be too complicated for your case
adding an event to MariaDB and scheduling it to run periodically
The drawback of the custom manage.py command and the MariaDB event is that if you migrate servers, you must remember to add the cron job or event again to clean the database periodically.
I don't know a database-level approach to do that (maybe you want to add the mariadb tag if you are looking for a database-specific solution).
At the application level, one approach comes to mind. You could use Celery and, whenever you store a row, schedule a task to delete it. The Celery task should check that expire_time has actually passed (can that field be modified or updated?).
You can also (in addition or as an alternative) have a Celery beat job that periodically fetches the row with the smallest expire_time. If it should be removed, remove it and run the job again. Otherwise, wait for the next beat.
I'm on Django 1.8 (using pytest) and I have the following configuration:
A default and a readonly database managed by a MasterSlaveRouter that directs DB calls to one connection or the other depending on whether they're read or write operations.
In my development environment, both entries in the settings.DATABASES dictionary have the same setup (they just use a different connection, but the database is the same).
In my test environment, however, there's only a default database.
I have a post_save signal fired whenever a model Foo is saved.
I have an atomic operation (decorated with @transaction.atomic) that modifies a Foo instance and calls .save() on it twice. Since no custom using parameter is passed to the decorator, the transaction is only active on the default database.
The post_save callback creates a Bar record with a OneToOneField pointing to Foo, but only after checking whether a Bar record with this foo_id already exists (in order to avoid IntegrityError). This check is done by performing this query:
already_exists = Bar.objects.filter(foo=instance).exists()
This is fine the first time the post_save callback is called: a Bar record is created and everything works. The second time, however, even though a Bar instance was just created during the previous Foo save, the filter is a read operation and so runs on the readonly connection. As a result, already_exists is False and a new record is created. This eventually raises an IntegrityError because, when the create runs on the default connection, a record with that foo_id already exists.
I tried copying the DATABASES dictionary from dev_settings to test_settings, but this broke many tests. I then read about the override_settings decorator and thought it would be perfect for my situation. To my surprise, however, it didn't work. It seems that at some point during application startup, the DATABASES dictionary (the one from test_settings with only default) is cached, and even though I change settings.DATABASES, the new value is never read again.
How can I properly override the database configuration for one specific test?
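For readers unfamiliar with the setup, the routing behaviour described above can be sketched without Django, since a router is just a class exposing db_for_read and db_for_write. This is not the asker's actual router: the thread-local flag and the pin_reads_to_default helper are hypothetical, shown only as one possible way to keep a read-after-write on the default connection.

```python
import threading
from contextlib import contextmanager

_state = threading.local()

@contextmanager
def pin_reads_to_default():
    """Hypothetical helper: force reads onto 'default' inside the block."""
    previous = getattr(_state, "pinned", False)
    _state.pinned = True
    try:
        yield
    finally:
        _state.pinned = previous

class MasterSlaveRouter:
    """Duck-typed router sketch; Django looks these methods up by name."""

    def db_for_read(self, model, **hints):
        # Reads normally go to the replica, which cannot see rows written
        # in a still-open transaction on 'default'.
        return "default" if getattr(_state, "pinned", False) else "readonly"

    def db_for_write(self, model, **hints):
        return "default"

router = MasterSlaveRouter()
unpinned = router.db_for_read(object)
with pin_reads_to_default():
    pinned = router.db_for_read(object)
restored = router.db_for_read(object)
```

With the unpinned router, the exists() check and the subsequent create land on different connections, which is the mismatch that produces the IntegrityError.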
Hmm... well, if you are using only pytest, I think you'll need to clean up your databases after the tests.
Now, to override Django settings, you can use:
from django.test import override_settings

@override_settings(DATABASE_CONFIG=<new_config>)
def test_foo():
    pass
You should try pytest-django:
import pytest

# Either mark the whole module...
pytestmark = pytest.mark.django_db

# ...or mark individual tests.
@pytest.mark.django_db
def test_foo():
    pass
When you run your tests, you can pass --create-db to force py.test to create a new database, or --reuse-db to reuse the existing one:
$ py.test --create-db
$ py.test --reuse-db
Check out the official docs.
I have a django view with this function to get the data for a template:
def get_context_data(self, **kwargs):
    context = super(MyView, self).get_context_data(**kwargs)
    context['extra_data'] = a_long_running_function()
    return context
The extra_data is displayed in a table. As the above function indicates, the page takes a long time to load due to calculation of extra_data.
So how can I show the page straight away, and then update the table when extra_data is computed?
I understand how I can use Celery to make a_long_running_function execute asynchronously, but I don't know how to then make the page (which is now loaded, but missing data for the table) get that data and update automatically.
If you plan on going ahead with Celery, you will need 2 views:
1. viewA, which loads the main page (without the extra_data; maybe a spinning GIF animation in its place in the HTML, to convey to the user that there is still data to be loaded). This view will also start the Celery task (but will not wait for it to complete). It would look similar to:
from django.shortcuts import render

def viewA(request):
    # Start the task and return immediately with its id.
    task = a_long_running_function.delay()
    return render(request, 'viewA.html', {'task_id': task.id})
2. viewB, which will be accessed via AJAX after the user's browser loads viewA (its purpose is to provide the extra_data that was not loaded by viewA). It would look similar to:
from django.http import HttpResponse
from django.shortcuts import render

def viewB(request, task_id):
    extra_data = a_long_running_function.AsyncResult(task_id)
    if extra_data.ready():
        return render(request, 'viewB.html', {'extra_data': extra_data.get()})
    # Not ready yet: return an empty body so the client keeps polling.
    return HttpResponse('')
Once the user's browser finishes loading viewA, you will need a bit of javascript to start running AJAX requests every X seconds/minutes to viewB to attempt to retrieve the celery task result (based on the celery task id that is available). Once the AJAX request successfully retrieves the task result from viewB, it can make it visible to the user.
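In the browser this polling would be written in JavaScript, but the control flow is simple enough to sketch in Python. The poll_until_ready function and its parameters are hypothetical; fetch stands in for the AJAX request to viewB, which returns an empty body until the task result is ready.

```python
import time

def poll_until_ready(fetch, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until it returns a non-empty body, or give up."""
    for attempt in range(max_attempts):
        body = fetch()
        if body:
            return body
        # Wait before the next request, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    return None

# Fake viewB: empty twice, then the rendered table.
responses = iter(["", "", "<table>...</table>"])
waits = []
result = poll_until_ready(lambda: next(responses), interval=0.5, sleep=waits.append)
```

Capping the number of attempts keeps a failed or lost task from causing the page to poll forever.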
Anybody interested in asynchronously updating a template using AJAX can use django-async-include (GitHub repository).
This project makes it easy to change a static block inclusion into an asynchronous one. That's perfect for including a computationally heavy template block.
Disclaimer: I'm the developer of this project.
I am trying to combine a Django custom logger with a Celery task to capture certain application log messages and write them to DynamoDB asynchronously. I have created a Celery task that takes a log message and transfers it to DynamoDB, and I tried to call this task from my custom logger.
However, Django custom logger does not allow me to import:
from celery.task import task, Task, PeriodicTask, periodic_task
My server crashes with the below error:
ValueError: Unable to configure handler 'custom_handler': Cannot resolve 'myApp.analytics.tasks.LogHandler': cannot import name cache
I know that the Django logging docs warn against circular imports if the custom logger file imports settings.py, but I have made sure that's not the case. It is still giving me the same error as a circular import would.
Am I doing something wrong or is there any other way to achieve asynchronous data transfer to DynamoDB using Django custom logger and DjCelery?
Thanks for any help.
I found the solution.
The problem was "If your settings.py specifies a custom handler class and the file defining that class also imports settings.py a circular import will occur."
To resolve this, we need to do the import inside the method body instead of at the top of the file defining the class.
Here's my custom LogHandler:
import logging

# Do not import settings here, as this would lead to a circular import.

# This custom log handler parses the message and inserts the entry into the DynamoDB tables.
class LogHandler(logging.Handler):

    def __init__(self):
        logging.Handler.__init__(self)
        self.report_logger = logging.getLogger('reporting')
        self.report_logger.setLevel(logging.INFO)

    def emit(self, record):
        # Submit the task to the "reporting" queue to be picked up and processed by the worker lazily.
        # myApp.analytics.tasks imports celery.task
        from myApp.analytics import tasks
        tasks.push_row_to_dynamodb.apply_async(args=[record])
        return
Hope it helps someone.
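The deferred-import pattern from LogHandler above can be tried with the standard library alone. In this sketch, json stands in for the real `from myApp.analytics import tasks`, and the sent list stands in for the task queue; both are assumptions for illustration. It also reduces the record to plain types first, since a LogRecord object will not serialize if the Celery task serializer is JSON (the default in recent Celery versions).

```python
import logging

def record_to_payload(record):
    """Reduce a LogRecord to plain types that a JSON serializer can handle."""
    return {
        "logger": record.name,
        "level": record.levelname,
        "message": record.getMessage(),
        "created": record.created,
    }

class DeferredImportHandler(logging.Handler):
    def emit(self, record):
        # Import at call time, so importing this module never pulls in
        # anything that imports settings.
        import json  # stand-in for the real tasks module
        sent.append(json.dumps(record_to_payload(record)))

sent = []  # stand-in for the task queue
logger = logging.getLogger("reporting.demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(DeferredImportHandler())
logger.info("user %s logged in", "alice")
```

With the real handler, the equivalent change would be to pass something like record_to_payload(record) to apply_async instead of the record itself.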