I store rows in a database table (instance), and the table has a field named expire_time. Once the current time passes expire_time, I want the row to be deleted.
One way to do that is to query the table every time, traverse every row, and delete the ones that have expired.
But if nothing queries the table, the expired rows are never removed, so that does not meet the requirement.
Is there a way to do this?
I use Python and Django, and the database is MariaDB.
You can write a custom management command to do this for you. Save this in myapp/management/commands/delete_expired.py for example:
from django.core.management.base import BaseCommand
from django.utils import timezone
from myapp.models import MyModel
class Command(BaseCommand):
    help = 'Deletes expired rows'
    def handle(self, *args, **options):
        now = timezone.now()
        MyModel.objects.filter(expire_time__lt=now).delete()
Then either call that command from a cron task or a queue. To do it on the command line you can call:
python manage.py delete_expired
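The filter-and-delete above boils down to a single DELETE statement with a WHERE clause on expire_time. As a rough sketch of that idea outside Django, the following uses the standard-library sqlite3 module as a stand-in for MariaDB; the table name instance, the text storage format for expire_time, and the helper names are assumptions made only for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def delete_expired(conn, now):
    """Delete rows whose expire_time is earlier than `now`; return the count."""
    # ISO-8601 strings in the same UTC format compare correctly as text.
    cur = conn.execute(
        "DELETE FROM instance WHERE expire_time < ?", (now.isoformat(),)
    )
    conn.commit()
    return cur.rowcount


def demo():
    # In-memory database with a hypothetical "instance" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instance (id INTEGER PRIMARY KEY, expire_time TEXT)")
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    rows = [
        (1, (now - timedelta(hours=1)).isoformat()),  # already expired
        (2, (now + timedelta(hours=1)).isoformat()),  # still valid
    ]
    conn.executemany("INSERT INTO instance VALUES (?, ?)", rows)
    deleted = delete_expired(conn, now)
    remaining = [r[0] for r in conn.execute("SELECT id FROM instance")]
    return deleted, remaining
```

Calling demo() deletes only the row whose expire_time lies in the past. Nothing is traversed row by row; the database does the comparison.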
I am not sure what you mean by:
I can not realize the requirement.
But I think you might want to consider:
a custom manage.py command, run from cron with your virtualenv's Python
django-cron, to routinely check for expired data and delete it
Celery as an alternative to cron, although it could be too complicated for your case
a MariaDB event, scheduled to run periodically
The drawback of the custom manage.py command and the MariaDB event is that if you migrate servers, you have to remember to recreate the cron job or event that cleans the database periodically.
I don't know a database-level approach to do that (maybe you want to add the mariadb tag if you are looking for a database-specific solution).
At the application level, one approach comes to mind. You can use Celery and, whenever you store a row, schedule a task to delete it at its expire_time. The Celery task should check that the row has actually expired before deleting it (can that field be modified or updated?).
You can also (in addition or as an alternative) have a Celery beat job that periodically fetches the row with the smallest expire_time. If that row has expired, the job removes it and calls itself again; otherwise, it waits for the next beat.
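The "smallest expire_time first" loop described above can be sketched without Celery. This is only an illustration of the control flow, assuming an in-memory heap stands in for a query ordered by expire_time; sweep_expired and demo are hypothetical names, not part of Celery or Django.

```python
import heapq
from datetime import datetime, timedelta


def sweep_expired(heap, now):
    """Pop and return ids whose expire_time has passed; stop at the first live one."""
    removed = []
    # heap[0] is always the entry with the smallest expire_time.
    while heap and heap[0][0] < now:
        _expire_time, row_id = heapq.heappop(heap)
        removed.append(row_id)
    return removed


def demo():
    now = datetime(2024, 1, 1, 12, 0)
    heap = []
    heapq.heappush(heap, (now + timedelta(minutes=5), "c"))   # still valid
    heapq.heappush(heap, (now - timedelta(minutes=10), "a"))  # expired
    heapq.heappush(heap, (now - timedelta(minutes=1), "b"))   # expired
    removed = sweep_expired(heap, now)
    remaining = [row_id for _, row_id in heap]
    return removed, remaining
```

Each beat would run one such sweep: it keeps removing while the earliest entry is expired and returns as soon as it reaches a row that is still valid.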
Related
I use django-celery-beat to create the task. Everything is registered and connected, and the usual Django tasks using beat_schedule work. I add a new task, and I don't register it in beat_schedule:
@app.task
def say_hi():
    print("hello test")
In the admin, I add a periodic task. The task is visible under Task (registered), so I select it, choose an interval of every minute, and save. After saving, Task (registered) is reset to empty and its value appears in Task (custom).
The task itself never runs: nothing is printed in the console, yet Last Run Datetime is updated in the admin. What could be wrong?
I'll be running a script on a server that automatically creates model instances in a database. The idea is to use an infinite loop (e.g. while True:) that keeps creating instances until I somehow stop it.
I want to use Django to check from my website how big my database is, and to stop or restart the script from there.
What could be a good approach here?
I was thinking about Celery, but it is not clear to me how I would stop it, and it looks like overkill. Any suggestions?
A simple solution is to have a class that saves to the db the name of the script and whether it should keep running:
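from django.db import models
class ScriptTracker(models.Model):
    # max_length is required for CharField; 100 is an arbitrary choice
    name = models.CharField(max_length=100)
    keep_running = models.BooleanField(default=True)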
Then your script would just check the db every loop to see if it should stop:
def my_script():
    while True:
        if not ScriptTracker.objects.get(name="my_script").keep_running:
            # stop running
            return
        # creating an instance in the db
        MyObject.objects.create(name="helloworld")
Create the ScriptTracker object
ScriptTracker.objects.create(name="my_script", keep_running=True)
Start your script; this is simple if the script is built as a management command:
python manage.py my_script
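The stop-flag pattern above can be tried without a database. The sketch below is a simplified stand-in, assuming a small in-memory FlagStore class plays the role of the ScriptTracker table; FlagStore, run_until_stopped and max_iterations are hypothetical names introduced only for this illustration.

```python
class FlagStore:
    """In-memory stand-in for the ScriptTracker table (hypothetical helper)."""

    def __init__(self):
        self._flags = {}

    def set(self, name, keep_running):
        self._flags[name] = keep_running

    def keep_running(self, name):
        # A missing entry is treated as "stop", so an unknown script never runs.
        return self._flags.get(name, False)


def run_until_stopped(store, name, do_work, max_iterations=None):
    """Call do_work() repeatedly, re-reading the flag before every iteration."""
    iterations = 0
    while store.keep_running(name):
        # Optional safety cap so a forgotten flag cannot loop forever.
        if max_iterations is not None and iterations >= max_iterations:
            break
        do_work()
        iterations += 1
    return iterations
```

The key design point is that the flag is re-read on every pass, so flipping keep_running from the website takes effect after at most one more unit of work.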
I'm using django-celery-beat in a Django app (this stores the schedule in the database instead of a local file). I've configured my schedule via a celery_beat dictionary in the config object that Celery is initialized with via app.config_from_object(...)
I recently renamed/removed a few tasks and restarted the app. The new tasks showed up, but the tasks removed from the celery_beat dictionary didn't get removed from the database.
Is this expected workflow -- requiring manual removal of tasks from the database? Is there a workaround to automatically reconcile the schedule at Django startup?
I tried a PeriodicTask.objects.all().delete() in celery/__init__.py
def _clean_schedule():
    from django.db import transaction
    from django_celery_beat.models import PeriodicTask
    from django_celery_beat.models import PeriodicTasks
    with transaction.atomic():
        PeriodicTask.objects.\
            exclude(task__startswith='celery.').\
            exclude(name__in=settings.CELERY_CONFIG.celery_beat.keys()).\
            delete()
        PeriodicTasks.update_changed()
_clean_schedule()
but that is not allowed because Django isn't properly started up yet:
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
You also can't use Django's AppConfig.ready() because making queries / db connections in ready() is not supported.
Looking at how django-celery-beat actually installs the schedules, I thought I could maybe hook into that process.
It doesn't happen when Django starts -- it happens when beat starts. It calls setup_schedule() against the class passed on the beat command line.
Therefore, we can just override the scheduler with
--scheduler=myproject.lib.scheduler:DatabaseSchedulerWithCleanup
to do cleanup:
import logging
from django_celery_beat.models import PeriodicTask
from django_celery_beat.models import PeriodicTasks
from django_celery_beat.schedulers import DatabaseScheduler
from django.db import transaction
class DatabaseSchedulerWithCleanup(DatabaseScheduler):
    def setup_schedule(self):
        schedule = self.app.conf.beat_schedule
        with transaction.atomic():
            num, info = PeriodicTask.objects.\
                exclude(task__startswith='celery.').\
                exclude(name__in=schedule.keys()).\
                delete()
            logging.info("Removed %d obsolete periodic tasks.", num)
            if num > 0:
                PeriodicTasks.update_changed()
        super(DatabaseSchedulerWithCleanup, self).setup_schedule()
Note that you only want this if you manage tasks exclusively with beat_schedule. If you add tasks via the Django admin or programmatically, they will also be deleted.
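The selection rule used by the cleanup (skip Celery's own entries, drop everything else that is missing from the schedule) can be checked in isolation. This is a plain-Python sketch of that rule only, assuming the stored tasks are given as (name, task path) pairs; obsolete_task_names is a hypothetical helper, not part of django-celery-beat.

```python
def obsolete_task_names(stored_tasks, schedule):
    """Return names of stored periodic tasks that the schedule no longer defines.

    stored_tasks: iterable of (name, task_path) pairs, as they might be read
    from the PeriodicTask table.
    schedule: dict shaped like beat_schedule, keyed by entry name.
    """
    return sorted(
        name
        for name, task_path in stored_tasks
        # Entries whose task path starts with "celery." are left alone.
        if not task_path.startswith("celery.") and name not in schedule
    )


def demo():
    stored = [
        ("celery.backend_cleanup", "celery.backend_cleanup"),
        ("send-report", "myproject.tasks.send_report"),
        ("old-sync", "myproject.tasks.old_sync"),
        ("added-in-admin", "myproject.tasks.manual"),
    ]
    schedule = {"send-report": {"task": "myproject.tasks.send_report"}}
    return obsolete_task_names(stored, schedule)
```

In demo(), the entry named added-in-admin is flagged together with old-sync, which illustrates the caveat above: anything not listed in the schedule is treated as obsolete.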
Within my app I have a function which I want to run every hour to collect data and populate a database (I have an RDS database linked to my Elastic Beanstalk app). This is the function I want to run (a static method defined in my Data model):
@staticmethod
def get_data():
    page = requests.get(....)
    # BeautifulSoup expects markup text, not the Response object itself
    soup = BeautifulSoup(page.text, 'lxml')
    .....
    site_data = Data.objects.create(...)
    site_data.save()
>>> Data.get_data()
# populates database on my local machine
From reading it seems I want to use either Celery or a cron job. I am unfamiliar with either of these and it seems quite complicated using them with AWS. This post here seems most relevant but I am unsure how I would apply the suggestion to my example. Would I need to create a management command as mentioned and what would this look like with my example?
As this is new to me, it would help a lot if someone could point me down the right path.
How to create a management command is covered in detail in the docs.
The following provides a management command called foobar.
project_root/app_name/management/commands/foobar.py
from django.core.management.base import BaseCommand, CommandError
from yourapp.models import Data
class Command(BaseCommand):
    help = 'Dump data'
    def handle(self, *args, **options):
        Data.get_data()
Please read the linked docs; for example, a few __init__.py files need to be present for Django to discover the command properly.
When your project is deployed to Elastic Beanstalk, it should be connected to the proper database, and the data gets stored there.
To configure the cron job, follow the instructions from your linked question. There is also AWS Elastic Beanstalk, running a cronjob, which covers the topic in more detail.
The line in the crontab file should look like this:
0 * * * * /path/to/your/environment/bin/python /path/to/your/project_root/manage.py name_of_your_management_command > /path/to/your/cron.log 2>&1
As I've never used Elastic Beanstalk, the paths are placeholders, but their names explain which path belongs where. A few details regarding the cron line:
0 * * * * runs the command when the minute is 0, in every hour (*), on every day of the month (*), in every month (*), and on every day of the week (*)
The next part is the command that should run:
/path/to/your/environment/bin/python uses the Python from your project's environment
/path/to/your/project_root/manage.py invokes your project's manage.py
name_of_your_management_command is the management command to run (foobar in this example)
> /path/to/your/cron.log 2>&1 writes all output from the script, both STDOUT and STDERR, to the file /path/to/your/cron.log
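To see how the pieces of the crontab line fit together, here is a small sketch that assembles such a line from its parts. It is only an illustration: hourly_cron_line is a hypothetical helper, and the paths passed to it are placeholders just like the ones above.

```python
import shlex


def hourly_cron_line(python_path, manage_py, command, log_path):
    """Build a crontab entry that runs a management command at minute 0 of every hour."""
    # Field order: minute hour day-of-month month day-of-week
    schedule = "0 * * * *"
    # Quote each part so paths containing spaces survive the shell.
    cmd = " ".join(shlex.quote(part) for part in (python_path, manage_py, command))
    # "> log 2>&1" sends both STDOUT and STDERR to the log file.
    return f"{schedule} {cmd} > {shlex.quote(log_path)} 2>&1"
```

For example, hourly_cron_line("/path/to/env/bin/python", "/path/to/project/manage.py", "foobar", "/path/to/cron.log") produces a line of the same shape as the crontab entry shown earlier.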
Every now and then, you have the need to rename a model in Django (or, in one recent case I encountered, split one model into two, with new/different names). (Yes, proper planning helps to avoid this situation).
After renaming the corresponding tables in the db and fixing the affected code, one problem remains: any permissions granted to Users or Groups to operate on those models still reference the old model names. Is there any automated or semi-automated way to fix this, or is it just a matter of manual db surgery? (In development you can drop the auth_permission table and run syncdb to recreate it, but production isn't so simple.)
Here's a snippet that fills in missing contenttypes and permissions. I wonder if it could be extended to at least do some of the donkey work for cleaning up auth_permission.
If you happened to have used a South schema migration to rename the table, the following line in the forward migration would have done this automatically:
db.send_create_signal('appname', ['modelname'])
I got about half-way through a long answer that detailed the plan of attack I would take in this situation, but as I was writing I realized there probably isn't any way around having to do a maintenance downtime in this situation.
You can minimize the downtime by having a prepared loaddata script, of course, although care needs to be taken to make sure the auth_permission primary keys are in sync.
Also, the short answer: there is no automated way to do this that I'm aware of.
I recently had this issue and wrote a function to solve it. You'll typically have a discrepancy in both the ContentType and Permission tables if you rename a model/table. Django has built-in helper functions to resolve the issue, and you can use them as follows:
from django.contrib.auth.management import create_permissions
from django.contrib.contenttypes.management import update_all_contenttypes
from django.db.models import get_apps
def update_all_content_types_and_permissions():
    for app in get_apps():
        create_permissions(app, None, 2)
    update_all_contenttypes()
I changed verbose names in my application, and in Django 2.2.7 this is the only way I found to fix permissions:
from django.core.management.base import BaseCommand, CommandError
from django.contrib.auth.models import Permission
class Command(BaseCommand):
    help = 'Fixes permissions names'
    def handle(self, *args, **options):
        for p in Permission.objects.filter(content_type__app_label="your_app_label_here"):
            p.name = "Can %s %s"%(p.codename.split('_')[0], p.content_type.model_class()._meta.verbose_name)
            p.save()
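The name-building expression in that command can be tested on its own, without touching the database. The sketch below isolates it as a plain function; permission_name is a hypothetical helper, and it assumes codenames follow the default "action_modelname" pattern.

```python
def permission_name(codename, verbose_name):
    """Build a display name such as "Can change invoice" from a codename."""
    # Default codenames look like "<action>_<modelname>", e.g. "change_invoice".
    action = codename.split("_", 1)[0]
    return f"Can {action} {verbose_name}"
```

For a codename of change_invoice and a new verbose name of customer invoice, this yields "Can change customer invoice". Custom permissions whose codenames do not start with an action word would need separate handling.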