Keeping a computed model field up-to-date via QuerySets - django

Preface:
Let's assume we are working on a DB that stores issues of magazines.
Those issues usually do not have a 'name' per se; instead a whole bunch of attributes (release year, serial number, release months, etc.) will contribute to the name the user may later on identify the issue with.
Depending on the attributes available per issue, this name will be calculated based on a pattern.
For example: an issue from the year 2017 with number 01 will get the name: 2017-01. An issue from the years 2000 and 2001, and the months Jan and Feb will get the name 2000/01-Jan/Feb.
The attributes can be changed at any time.
It is expected that the user can also do queries based on this name - so simply displaying the computed value (through __str__) is not enough.
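As a framework-free illustration of that naming pattern, something like the following could compose the name. The function name, signature, and the precedence of months over serial numbers are assumptions for illustration; in the question's own code this logic lives in the model classmethod _get_name().

```python
def get_name(years, numbers=None, months=None):
    """Compose an issue name from its attributes.

    Hypothetical sketch of the pattern described above: a single year with
    number 01 becomes "2017-01"; two years plus months become
    "2000/01-Jan/Feb".
    """
    # First year in full, later years shortened to their last two digits.
    year_part = "/".join([str(years[0])] + [str(y)[-2:] for y in years[1:]])
    if months:  # assumption: months take precedence over serial numbers
        return "{}-{}".format(year_part, "/".join(months))
    if numbers:
        return "{}-{}".format(year_part, "/".join(numbers))
    return year_part

print(get_name([2017], numbers=["01"]))               # -> 2017-01
print(get_name([2000, 2001], months=["Jan", "Feb"]))  # -> 2000/01-Jan/Feb
```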
What I have done so far:
For a long time, I actually calculated the name every time __str__ was called on the issue's instance. It was the quick and dirty (and slow) way.
Querying for the name was very slow and rather complicated and unreliable as it required 'reverse-engineering' the __str__ method and guessing what the user was trying to search for.
Then I tried a hybrid approach, using a _name model field that is updated if a _changed_flag (set, for example, through signals) is present and the instance is instantiated or saved. This still didn't leave me with an up-to-date name in the database table unless I first instantiated every instance that needed updating. Again, slow. And care had to be taken not to end up in infinite recursion upon calling refresh_from_db (which recreates the current instance in the background).
TL;DR
Right now, I am using a custom QuerySet as a manager for a model with a computed field:
class ComputedNameModel(BaseModel):
    _name = models.CharField(max_length=200, editable=False, default=gettext_lazy("No data."))
    _changed_flag = models.BooleanField(editable=False, default=False)
    name_composing_fields = []

    objects = CNQuerySet.as_manager()

    # ... some more methods ...

    def __str__(self):
        return self._name

    class Meta(BaseModel.Meta):
        abstract = True
QuerySet:
class CNQuerySet(MIZQuerySet):
    def bulk_create(self, objs, batch_size=None):
        # Set the _changed_flag on the objects to be created
        for obj in objs:
            obj._changed_flag = True
        return super().bulk_create(objs, batch_size)

    def filter(self, *args, **kwargs):
        if any(k.startswith('_name') for k in kwargs):
            self._update_names()
        return super().filter(*args, **kwargs)

    def update(self, **kwargs):
        # It is safe to assume that a name update will be required after this update.
        # If _changed_flag is not already part of the update, add it with the value True.
        if '_changed_flag' not in kwargs:
            kwargs['_changed_flag'] = True
        return super().update(**kwargs)
    update.alters_data = True

    def values(self, *fields, **expressions):
        if '_name' in fields:
            self._update_names()
        return super().values(*fields, **expressions)

    def values_list(self, *fields, **kwargs):
        if '_name' in fields:
            self._update_names()
        return super().values_list(*fields, **kwargs)

    def _update_names(self):
        if self.filter(_changed_flag=True).exists():
            with transaction.atomic():
                for pk, val_dict in self.filter(_changed_flag=True).values_dict(*self.model.name_composing_fields).items():
                    new_name = self.model._get_name(**val_dict)
                    self.filter(pk=pk).update(_name=new_name, _changed_flag=False)
    _update_names.alters_data = True
As you can see, the boilerplate is real. And I have only cherry-picked the QuerySet methods that I know I use for now.
Through signals (for relations) or QuerySet methods, a record's _changed_flag is set when anything about it changes. The records are then updated the next time the _name field is requested in any way.
It's blazingly fast, as it does not require a model instance (only the model's classmethod _get_name()) and works entirely off querysets and in-memory data.
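The lazy flag-then-recompute flow can be illustrated without Django at all. In this framework-free sketch, plain dicts stand in for table rows and a lambda stands in for _get_name(); as in _update_names() above, only flagged rows are recomputed:

```python
# Plain dicts stand in for database rows; _changed_flag marks rows whose
# cached _name is stale and must be recomputed on the next request.
def update_names(rows, get_name):
    for row in rows:
        if row["_changed_flag"]:
            row["_name"] = get_name(row)   # recompute only stale names
            row["_changed_flag"] = False

rows = [
    {"year": 2017, "number": "01", "_name": "", "_changed_flag": True},
    {"year": 2018, "number": "02", "_name": "old", "_changed_flag": False},
]
update_names(rows, lambda r: "{}-{}".format(r["year"], r["number"]))
print(rows[0]["_name"])  # -> 2017-01 (the unflagged row keeps "old")
```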
Question:
Where to put the call to _update_names() such that the names are updated when required without overriding every single queryset method?
I have tried putting it in:
_clone: bad things happened. To not end up in recursion hell, you would have to keep track of which clone is trying to update and which are there to simply fetch data for the update. Clones are also created upon initializing your app which has a good (tables are always up-to-date) and a bad side (updating without yet having a need for it, costing time). Not all queryset methods create clones and generally, putting your update check in _clone feels too deep.
__repr__: Keeps the shell output up-to-date, but not much more. The default implementation takes a slice of the queryset, disabling the ability to filter, so the updating has to be done before __repr__.
_fetch_all: Like _clone: it runs an update when you may not need it and requires keeping an internal 'you-are-allowed-to-try-an-update' check.

Related

How can I check if this instance initing from database or it is new object

I have django model of a Task. In my website Tasks can be repeated and not repeated. So I override init method in this way:
def __init__(self, *args, **kwargs):
    """Init a new Task.

    :param period: timedelta object. How often the task should be repeated
    :param repeated: if true, the task will be repeated
    :param end_date: datetime until which the task will be repeated, otherwise it will be repeated for 10 years
    """
    if not self.from_database():  # I don't know how to do this
        if kwargs.get("repeated"):
            period = kwargs["period"]
            end_date = kwargs["end_date"] or kwargs["date_time"] + TEN_YEARS
            base_task = self
            date = base_task.date_time + period
            with transaction.atomic():
                while date <= end_date:
                    base_task.clone_task(date)
                    date += period
    try:
        del kwargs["repeated"]
        del kwargs["period"]
        del kwargs["end_date"]
    except KeyError:
        pass
    super().__init__(*args, **kwargs)
But I have a problem: how can I check whether this instance is being initialized from the database or is a new object?
First, I am not sure I would have designed your task creation this way. The task object should scope itself and avoid having to deal with other tasks. A task creator handling the option of creating repeated tasks might be more appropriate. This is just an opinion without knowing your context, but it would avoid your issue without having to do anything special.
This being said, as per the Django documentation, you should avoid overriding the __init__() method. It proposes two other approaches; one of them involves overriding the Manager.
https://docs.djangoproject.com/en/3.2/ref/models/instances/
It seems you wish to encapsulate some logic inside your object; from an OOP perspective this is debatable but understandable. However, in the Django way of doing things, you should encapsulate this kind of logic in a specialized manager.
I use this way of doing things to maintain fields such as created_at, updated_at, deleted_at and is_deleted in many of my entities and it works fine.
I would not recommend creating objects from within another object's creation. If you do want to do so, make sure your clone_task method does not include the repeated argument; otherwise each repeated task will create many repeated tasks and you will end up with many duplicates.
This being said, in a model, the following code will return True if the object is newly created and False if it is coming from the db:
self._state.adding
Django documentation about _state

How to get a model's last access date in Django?

I'm building a Django application, and in it I would like to track whenever a particular model was last accessed.
I'm opting for this in order to build a user activity history.
I know Django provides auto_now and auto_now_add, but these do not do what I want them to do. The latter tracks when a model was created, and the former tracks when it was last modified, which is different from when it was last accessed, mind you.
I've tried adding another datetime field to my model's specification:
accessed_on = models.DateTimeField()
Then I try to update the model's access manually by calling the following after each access:
model.accessed_on = datetime.utcnow()
model.save()
But it still won't work.
I've gone through the django documentation for an answer, but couldn't find one.
Help would be much appreciated.
What about creating a model with a field that contains the last save date, and saving the object every time it is translated from the DB representation to the Python representation?
class YourModel(models.Model):
    date_accessed = models.DateTimeField(auto_now=True)

    @classmethod
    def from_db(cls, db, field_names, values):
        obj = super().from_db(db, field_names, values)
        obj.save()
        return obj

Showing a total of items created in a Django model

My Django project contains a task manager with Projects and Tasks, I have generic list page showing a list of all projects with information on their total tasks:
class IndexView(generic.ListView):
    template_name = 'projects/index.html'
    context_object_name = 'project_list'

    def get_queryset(self):
        """Return 10 projects."""
        return Project.objects.order_by('is_complete')[:10]
I would like to display on my list page the total number of added projects and tasks, but I'm unsure how I should go about this. All my current work has been around listing the number of tasks that are included in each project, but now I want a total - should I add this as a new View? For example, I tried adding this to the view above:
def total_projects(self):
    return Project.objects.count()
Then I call {{ project_list.total_projects }} in my template, but it doesn't return anything.
Is Views the correct place to do this?
All my current work has been around listing the number of tasks that are included in each project, but now I want a total - should I add this as a new View?
It depends. If you just want to show the total number of projects and tasks with the first 10 completed projects in the database (which is what your get_queryset method does, be careful), I would go and do all of it in the same view (it would be useless to make a new view only to show some numbers, and that isn't the purpose of ListView IMO).
On the other hand, you're calling the view's instance method (total_projects) on project_list. That method doesn't exist there, and when an attribute/method doesn't exist on an object referenced in a template, you simply get nothing. Based on the previous paragraph, I would set it in the view's context using get_context_data:
def get_context_data(self, **kwargs):
    data = super().get_context_data(**kwargs)
    data["total_projects"] = Project.objects.count()
    # I'm assuming you only need the whole number of tasks with no separation
    data["total_tasks"] = Task.objects.count()
    return data
Finally, you can replace your get_queryset method with a queryset class attribute (if you want it to be cleaner and you can handle the filtering with no additional code):
class IndexView(generic.ListView):
    queryset = Project.objects.order_by('is_complete')[:10]
I believe it's more common to put function definitions in the Model class (Project, from the looks of it), and add the @property decorator above the function.
class Project(models.Model):
    ''' definitions and stuff '''

    @property
    def total_projects(self):  # etc...
As for your specific case, you could forego the function altogether and just use {{ project_list.count }} or {{ project_list|length }} in your template.
A note about count vs length from the docs:
A count() call performs a SELECT COUNT(*) behind the scenes, so you should always use count() rather than loading all of the records into Python objects and calling len() on the result (unless you need to load the objects into memory anyway, in which case len() will be faster).
Note that if you want the number of items in a QuerySet and are also retrieving model instances from it (for example, by iterating over it), it's probably more efficient to use len(queryset), which won't cause an extra database query like count() would.
So use the correct one for your usage.
Also, according to this answer and the below comment from @djangomachine, length may not always return the same number of records as count. If accuracy is important, it may be better to use count regardless of the above case.

Django - What is the best way to filter out softly deleted rows?

When my program performs a soft deletion, the softly deleted rows would be marked as inactive or deleted (e.g. person.deleted=True). The question is, what is the best way to make sure that every retrieval of data from this table would only return the active records without having to add the deleted=False argument to the filter method (which is not only repetitive, but also prone to errors).
You can try creating custom object manager for your model. This may be enough or not, depending on your requirements and further project implementation.
class PersonManager(models.Manager):
    def all(self, *args, **kwargs):
        return super(PersonManager, self).filter(deleted=False)

    def deleted(self, *args, **kwargs):
        return super(PersonManager, self).filter(deleted=True)
    # ...

class Person(models.Model):
    # ...
    objects = PersonManager()
Update: Another convenient way to do that is with django-livefield

Override Default Save Method And Create Duplicate

I am looking to create a duplicate instance each time a user tries to update an instance. The existing record is untouched and the full update is saved to the new instance.
Some foreign keys and reverse foreign keys must also be duplicated. The Django documentation
talks about duplicating objects, but does not address reverse foreign keys.
Firstly, is there an accepted way of approaching this problem?
Secondly, I am unsure whether it's best to overwrite the form save method or the model save method? I would want it to apply to everything, regardless of the form, so I assume it should be applied at the model level?
A simplified version of the models are outlined below.
class Invoice(models.Model):
    number = models.CharField(max_length=15)

class Line(models.Model):
    invoice = models.ForeignKey(Invoice)
    price = models.DecimalField(max_digits=15, decimal_places=4)
Here's my shot at it. If you need it to duplicate every time you make any changes, then override the model save method. Note that this will not have any effect when executing .update() on a queryset.
class Invoice(models.Model):
    number = models.CharField(max_length=15)

    def save(self, *args, **kwargs):
        if not self.pk:
            # If we don't have a pk set yet, it is the first time we are saving. Nothing to duplicate.
            super(Invoice, self).save(*args, **kwargs)
        else:
            # Save the line items before we duplicate.
            lines = list(self.line_set.all())
            self.pk = None
            super(Invoice, self).save(*args, **kwargs)
            for line in lines:
                line.pk = None
                line.invoice = self
                line.save()
This will create a duplicate Invoice every time you call .save() on an existing record. It will also create duplicates for every Line tied to that Invoice. You may need to do something similar every time you update a Line as well.
This of course is not very generic. This is specific to these 2 models. If you need something more generic, you could loop over every field, determine what kind of field it is, make needed duplicates, etc.
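The generic "copy the parent, then copy each child and re-point its foreign key" idea can be illustrated without Django. In this framework-free sketch, plain dicts stand in for model rows and the lines list plays the reverse-foreign-key role; all names here are illustrative, not Django API:

```python
import copy

def duplicate_with_children(parent, children, fk_field):
    """Copy a parent row plus its reverse-FK children, re-pointing each
    child's foreign key at the new parent. Setting pk to None mimics
    'a fresh primary key will be assigned on save'."""
    new_parent = copy.copy(parent)
    new_parent["pk"] = None
    new_children = []
    for child in children:
        new_child = copy.copy(child)
        new_child["pk"] = None
        new_child[fk_field] = new_parent   # re-point at the duplicate
        new_children.append(new_child)
    return new_parent, new_children

invoice = {"pk": 1, "number": "INV-001"}
lines = [{"pk": 10, "invoice": invoice, "price": "9.99"}]
new_invoice, new_lines = duplicate_with_children(invoice, lines, "invoice")
print(new_invoice["number"], new_lines[0]["invoice"] is new_invoice)  # INV-001 True
```

The originals are left untouched, which matches the requirement that the existing record stays as it was while the duplicate receives the changes.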