How to reliably run custom code when related (ForeignKey) objects change - django

In a simple ForeignKey relation, I want to run specific code when one of the related objects is modified.
Here is some schematic code:
class Car(models.Model):
    pass

class Wheel(models.Model):
    car = models.ForeignKey('Car',
                            on_delete=models.CASCADE,
                            related_name='wheels',
                            null=True)

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        my_custom_code()
According to Django's documentation, the reverse related manager's add() method performs the database write using update() instead of save() by default (the same applies to remove(), etc.).
So in the code below, my_custom_code isn't called when using add():
car = Car.objects.create()
wheel = Wheel.objects.create()  # OK: my_custom_code is called here
car.wheels.add(wheel)  # not called here: Django uses the "update mechanism", which does not go through save()
We need to pass bulk=False to the add method in order to force it to use save():
car.wheels.add(wheel, bulk=False)  # OK: my_custom_code is called here
This is a problem for me, as it is important that my_custom_code is called whenever any of the related objects is modified. If someone forgets to pass bulk=False, it will generate inconsistent data.
Is there a signal for this case (like the m2m_changed signal)?
Is there a way to force bulk=False for all methods of a ForeignKey relation?
Thanks for your help.
Why do I need this? My underlying (X) problem is to store, in the parent record, a computed result that depends on the related elements (and I need this value to always be up to date).
My initial idea was to run the computation every time a related model is modified, by overriding save().
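One possible workaround (not something Django provides out of the box) is to route bulk updates through a custom QuerySet whose update() runs the hook, and to make that manager the model's base manager, since the reverse manager's bulk add() performs its UPDATE through _base_manager. A minimal sketch, reusing Car, Wheel and my_custom_code from above:

from django.db import models

class WheelQuerySet(models.QuerySet):
    def update(self, **kwargs):
        # add()/remove() with bulk=True issue a single UPDATE through the
        # model's base manager, so hooking update() here catches them.
        rows = super().update(**kwargs)
        my_custom_code()
        return rows

class Wheel(models.Model):
    car = models.ForeignKey('Car', on_delete=models.CASCADE,
                            related_name='wheels', null=True)

    objects = WheelQuerySet.as_manager()

    class Meta:
        # Route Django's internal bulk operations through 'objects' too.
        base_manager_name = 'objects'

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        my_custom_code()

Note that this fires once per queryset update rather than once per row, and it also fires for unrelated .update() calls on Wheel, so whether it fits depends on what my_custom_code does.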

Related

Keeping a computed model field up-to-date via QuerySets

Preface:
Let's assume we are working on a DB that stores issues of magazines.
Those issues usually do not have a 'name' per se; instead, a whole bunch of attributes (release year, serial number, release months, etc.) contribute to the name by which the user may later identify the issue.
Depending on the attributes available per issue, this name will be calculated based on a pattern.
For example: an issue from the year 2017 with number 01 will get the name: 2017-01. An issue from the years 2000 and 2001, and the months Jan and Feb will get the name 2000/01-Jan/Feb.
The attributes can be changed at any time.
It is expected that the user can also do queries based on this name - so simply displaying the computed value (through __str__) is not enough.
What I have done so far:
For a long time, I actually calculated the name every time __str__ was called on the issue's instance. It was the quick and dirty (and slow) way.
Querying for the name was very slow, rather complicated, and unreliable, as it required 'reverse-engineering' the __str__ method and guessing what the user was trying to search for.
Then I tried a hybrid approach, using a _name model field that is updated if a _changed_flag (set, for example, through signals) is present and the instance is instantiated or saved. This still didn't leave me with an up-to-date name in the database table unless I instantiated every instance that needed updating first. Again, slow. And care had to be taken to not end up in infinite recursion upon calling refresh_from_db (which recreates the current instance in the background).
TL;DR
Right now, I am using a custom QuerySet as a manager for a model with a computed field:
class ComputedNameModel(BaseModel):

    _name = models.CharField(
        max_length=200, editable=False, default=gettext_lazy("No data."))
    _changed_flag = models.BooleanField(editable=False, default=False)
    name_composing_fields = []

    objects = CNQuerySet.as_manager()

    # ... some more methods ...

    def __str__(self):
        return self._name

    class Meta(BaseModel.Meta):
        abstract = True
QuerySet:
class CNQuerySet(MIZQuerySet):

    def bulk_create(self, objs, batch_size=None):
        # Set the _changed_flag on the objects to be created
        for obj in objs:
            obj._changed_flag = True
        return super().bulk_create(objs, batch_size)

    def filter(self, *args, **kwargs):
        if any(k.startswith('_name') for k in kwargs):
            self._update_names()
        return super().filter(*args, **kwargs)

    def update(self, **kwargs):
        # It is safe to assume that a name update will be required after
        # this update; if _changed_flag is not already part of the update,
        # add it with the value True.
        if '_changed_flag' not in kwargs:
            kwargs['_changed_flag'] = True
        return super().update(**kwargs)
    update.alters_data = True

    def values(self, *fields, **expressions):
        if '_name' in fields:
            self._update_names()
        return super().values(*fields, **expressions)

    def values_list(self, *fields, **kwargs):
        if '_name' in fields:
            self._update_names()
        return super().values_list(*fields, **kwargs)

    def _update_names(self):
        if self.filter(_changed_flag=True).exists():
            with transaction.atomic():
                for pk, val_dict in self.filter(_changed_flag=True).values_dict(
                        *self.model.name_composing_fields).items():
                    new_name = self.model._get_name(**val_dict)
                    self.filter(pk=pk).update(_name=new_name, _changed_flag=False)
    _update_names.alters_data = True
As you can see, the boilerplate is real. And I have only cherry-picked the QuerySet methods that I know I use for now.
Through signals (for relations) or QuerySet methods, a record's _changed_flag is set when anything about it changes. The records are then updated the next time the _name field is requested in any way.
It's blazingly fast, as it does not require a model instance (only the model's classmethod _get_name()) and works entirely off querysets and in-memory data.
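For illustration, such a receiver might look roughly like the following sketch; ReleaseMonth and Issue are hypothetical names standing in for one of the name-composing related models and a concrete ComputedNameModel subclass:

from django.db.models.signals import post_save
from django.dispatch import receiver

# Hypothetical names: ReleaseMonth stands in for one of the related models
# that contribute to the name; Issue is a concrete ComputedNameModel
# subclass to which ReleaseMonth has a ForeignKey named 'issue'.
@receiver(post_save, sender=ReleaseMonth)
def flag_issue_changed(sender, instance, **kwargs):
    # A plain queryset update avoids instantiating (and re-saving) the
    # issue; _name is then recomputed lazily on the next request for it.
    Issue.objects.filter(pk=instance.issue_id).update(_changed_flag=True)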
Question:
Where to put the call to _update_names() such that the names are updated when required without overriding every single queryset method?
I have tried putting it in:
_clone: bad things happened. To avoid ending up in recursion hell, you would have to keep track of which clone is trying to update and which clones are there simply to fetch data for the update. Clones are also created upon initializing your app, which has a good side (tables are always up-to-date) and a bad side (updating without yet having a need for it, costing time). Not all queryset methods create clones, and generally, putting your update check in _clone feels too deep.
__repr__: keeps the shell output up-to-date, but not much more. The default implementation takes a slice of the queryset, disabling the ability to filter, so the updating has to be done before __repr__ is called.
_fetch_all: like _clone, it runs an update when you may not need it and requires keeping an internal 'you-are-allowed-to-try-an-update' check (see the sketch below).
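For reference, the internal check mentioned above could be sketched like this; the class-level guard flag is my assumption (and is not thread-safe, so real code would want thread-local state):

class CNQuerySet(MIZQuerySet):
    # Class-level guard shared by the clones that _update_names() itself
    # creates, so those clones do not recursively trigger another update.
    _updating_names = False

    def _fetch_all(self):
        if not CNQuerySet._updating_names:
            CNQuerySet._updating_names = True
            try:
                self._update_names()
            finally:
                CNQuerySet._updating_names = False
        super()._fetch_all()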

Concise way of getting or creating an object with given field values

Suppose I have:
from django.db import models
class MyContentClass(models.Model):
    content = models.TextField()
    another_field = models.TextField()

x = MyContentClass(content="Hello, world!", another_field="More Info")
Is there a more concise way to perform the following logic?
existing = MyContentClass.objects.filter(
    content=x.content, another_field=x.another_field)
if existing:
    x = existing[0]
else:
    x.save()
# x now points to an object which is saved to the DB,
# either one we've just saved there or one that already existed
# with the same field values we're interested in.
Specifically:
Is there a way to query for both (all) fields without specifying each one separately?
Is there a better idiom for either getting the old object or saving the new one? Something like get_or_create, but which accepts an object as a parameter?
Assume the code which does the saving is separate from the code which generates the initial MyContentClass instance which we need to compare to. This is typical of a case where you have a function which returns a model object without also saving it.
You could convert x to a dictionary with:
x_data = x.__dict__
Then that could be passed into the model manager's get_or_create method:
MyContentClass.objects.get_or_create(**x_data)
The problem with this is that there are a few fields that will cause this to error out (e.g. the unique ID, or the _state Django model-state field). However, if you pop() those out of the dictionary beforehand, then you'd probably be good to go :)
def remove_unneeded_fields(x_data):
    unneeded_fields = [
        '_state',
        'id',
        # Whatever other fields you don't want the new obj to have,
        # e.g. any field marked as 'unique'
    ]
    for field in unneeded_fields:
        del x_data[field]
    return x_data

cleaned_dict = remove_unneeded_fields(x_data)
MyContentClass.objects.get_or_create(**cleaned_dict)
EDIT
To avoid issues associated with having to maintain a whitelist/blacklist of fields, you could do something like this:
def remove_unneeded_fields(x_data, MyObjModel):
    cleaned_data = {}
    for field in MyObjModel._meta.fields:
        if not field.unique:
            # Use attname rather than name: for ForeignKeys, __dict__
            # stores e.g. 'other_id', not 'other'.
            cleaned_data[field.attname] = x_data[field.attname]
    return cleaned_data
There would probably have to be more validation than simply checking that the field is not unique, but this might offer some flexibility when it comes to minor model field changes.
I would suggest creating a custom manager for those models and adding the functions you want to use on the models (like a custom get_or_create function).
https://docs.djangoproject.com/en/1.10/topics/db/managers/#custom-managers
This would be the cleanest way and involves no hacking. :)
You can create specific managers for specific models, or create a superclass with the functions you want for all models.
If you just want to add a second manager with a different name, beware that it will become the default manager if you don't define the objects manager first (https://docs.djangoproject.com/en/1.10/topics/db/managers/#default-managers).
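A minimal sketch of such a manager, with a hypothetical get_or_create_from_instance helper that builds the lookup from the unsaved instance's concrete fields (skipping the primary key):

from django.db import models

class ContentManager(models.Manager):
    def get_or_create_from_instance(self, instance):
        # Build the lookup from every concrete field except the pk,
        # so no field has to be listed by hand.
        lookup = {
            f.attname: getattr(instance, f.attname)
            for f in instance._meta.concrete_fields
            if not f.primary_key
        }
        return self.get_or_create(**lookup)

class MyContentClass(models.Model):
    content = models.TextField()
    another_field = models.TextField()

    objects = ContentManager()

Usage would then be: obj, created = MyContentClass.objects.get_or_create_from_instance(x).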

Restrict a model to access only rows with a specific condition?

I want to use a Django model to access a subset of database rows. As I'm working with a number of legacy databases, I'd rather not create database views, if possible.
In short, I'd like to tell my model that there's a field foo which should always have the value bar. This should span any CRUD operation on the table, so that newly created rows would also have foo=bar. Is there a simple Django way to achieve this?
UPDATE: I want to ensure that this model doesn't write anything to the table where foo != bar. It must be able to read, modify or delete only those rows where foo=bar.
For newly created items, you can set the default value in the model definition:
class MyModel(models.Model):
    # a lot of fields
    foo = models.CharField(max_length=10, default='bar')

    # Set the manager
    objects = BarManager()

    def save(self, force_insert=False, force_update=False, using=None):
        self.foo = 'bar'
        super(MyModel, self).save(force_insert, force_update, using)
To make MyModel.objects.all() return only rows with foo=bar, you should implement a custom manager. You can redefine the get_query_set method to add the filtering (note: in Django 1.6+ this method was renamed to get_queryset):
class BarManager(models.Manager):
    use_for_related_fields = True

    def get_query_set(self):
        return super(BarManager, self).get_query_set().filter(foo='bar')
Update after #tuomassalo's comment
1) The custom manager will affect all calls like MyModel.objects.get(id=42), as this call just proxies a call to .get_query_set().get(id=42). To achieve this, you have to set the manager as the default manager for the model (assign it to the objects variable).
To use this manager for related lookups (e.g. another_model_instance.my_model_set.get(id=42)) you need to set use_for_related_fields = True on your BarManager. See Controlling automatic Manager types in the docs.
2) If you want to enforce foo=bar, then a default value is not enough. You can either use the pre_save signal or override the save method on your model. Don't forget to call the original save method.
I updated the MyModel example above.
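For completeness, the pre_save alternative mentioned in 2) could look roughly like this sketch, reusing MyModel from above:

from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=MyModel)
def enforce_foo(sender, instance, **kwargs):
    # Fires whenever Model.save() runs, giving the same effect as the
    # save() override above without touching the model class.
    instance.foo = 'bar'

Note that neither approach catches queryset.update() calls, which bypass save() and its signals.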

Getting ID of newly created object in save()

I want to save an object so that the M2M records get saved. Then I want to read out the M2M fields, do some calculations, and set a field on the saved object.
class Item(models.Model):
    name = models.CharField(max_length=20, unique=True)
    product = models.ManyToManyField(SomeOtherModel, through='SomeTable')

    def save(self, *args, **kwargs):
        super(Item, self).save(*args, **kwargs)
        m2m_items = SomeTable.objects.filter(item=self)
        # DO SOME STUFF WITH THE M2M ITEMS
The m2m_items won't turn up. Is there any way to get them?
Some confusion here.
Once you've called super, self.id will have a value.
However, I don't understand the point of your filter call. For a start, you probably mean get rather than filter anyway, as filter returns a queryset rather than a single instance. But even so, the call is pointless: you've just saved it, so whatever you get back from the database will be exactly the same. What's the point?
Edit after question update: OK, thanks for the clarification. However, the model's save() method is not responsible for doing anything with M2M items. They need to be saved separately, which is the job of the form or the view.
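Since the question uses an explicit through model, one possible option is to react when the through rows themselves are saved; this sketch assumes SomeTable has an item ForeignKey back to Item (implied by the filter in the question):

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=SomeTable)
def through_row_saved(sender, instance, **kwargs):
    # By the time a through row is saved, the relation exists in the
    # database, so the related items can be read here.
    m2m_items = SomeTable.objects.filter(item=instance.item)
    # DO SOME STUFF WITH THE M2M ITEMS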

Post create instance code call in django models

Sorry for the somewhat crazy subject.
I'd like to override a Django model's save method and call some additional code if the model instance is newly created.
Sure, I can use signals, or check whether the model has an empty pk field and, if so, set a temporary variable and call the code later:
class EmailModel(models.Model):
    email = models.EmailField()

    def save(self, *args, **kwargs):
        is_new = self.pk is None
        super(EmailModel, self).save(*args, **kwargs)
        # Create necessary objects
        if is_new:
            self.post_create()

    def post_create(self):
        # do job, send mails
        pass
But I'd like to have some cleaner code and avoid using a temporary variable in the save method.
So the question is: is it possible to tell whether the model instance is a newly created object just after the parent's save_base method call?
I've checked the Django sources but can't find the right way to do that.
Thanks
We have a related post.
For real, signals are the best approach in this case.
You could use the post_save() signal and, in the listener, just check whether the credit_set exists for the current model instance and, if not, create one. That would be my choice; there is no need to overdo such a simple task.
Of course, if you really need to know exactly when the model was initiated (I doubt it), use the post_init() signal. You don't need to override the save() method just to set some additional variables. Just catch the post_init(), or pre_save(), signal and change/add what you want. IMHO there is no sense in overriding the save() method to check whether this is a new instance or not; that's why the signals are there.
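A minimal sketch of the signal approach, reusing EmailModel and post_create() from the question: post_save passes a created flag to the receiver, so no temporary variable inside save() is needed.

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=EmailModel)
def email_model_created(sender, instance, created, **kwargs):
    # created is True only for the INSERT that first saved this row.
    if created:
        instance.post_create()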