I've run into an issue that I really haven't dealt with before. I have a task to upgrade from django 1 ==> 2. (django 1 doesn't require on_delete when dealing with relationships)
I have a couple of crucial models that have relationships inside, but I definitely don't want to CASCADE those records. For example, if a user deletes their account, I don't want their expenses to be deleted. Maybe we need to keep those expense instances for tax records later, etc.
I have read that DO_NOTHING can also be dangerous.
With a model like this, what would be the best course of action when dealing with the ForeignKeys?
I appreciate all the help in advance.
class Expenses(models.Model):
user = models.ForeignKey(Account, null=True, blank=True,
on_delete=models.?)
event = models.ForeignKey(Event, null=True, blank=True,
on_delete=models.?)
payee = models.CharField(max_length=128, null=True, blank=True)
category = models.ForeignKey(Category, blank=True, null=True,
related_name='expense_category', on_delete=models.?)
I have a task to upgrade from django 1 to 2. (django 1 doesn't require on_delete when dealing with relationships)
In django-1.x, if you did not specify on_delete, it used CASCADE [Django-doc], so in fact by specifying it, you can make it more safe.
I have read that DO_NOTHING can also be dangerous.
Well most databases will raise an integrity error for this, since then it would refer to a user that no longer exists. So DO_NOTHING is not in itself dangerous, it will simply for most databases not allow deleting the database, but that by rasing an IntegrityError.
With a model like this, what would be the best course of action when dealing with the ForeignKeys?
Perhaps here PROTECT [Django-doc] is here more appropriate, since it will simply prevent deleting the object if it is still referenced.
The best solution however depends on a large number of details. Therefore it might be better to look at the possible on_delete=… strategies [Django-doc].
I'm quite familiar with Django, but I recently noticed there exists an on_delete=models.CASCADE option with the models. I have searched for the documentation for the same, but I couldn't find anything more than:
Changed in Django 1.9:
on_delete can now be used as the second positional argument (previously it was typically only passed as a keyword argument). It will be a required argument in Django 2.0.
An example case of usage is:
from django.db import models
class Car(models.Model):
manufacturer = models.ForeignKey(
'Manufacturer',
on_delete=models.CASCADE,
)
# ...
class Manufacturer(models.Model):
# ...
pass
What does on_delete do? (I guess the actions to be done if the model is deleted.)
What does models.CASCADE do? (any hints in documentation)
What other options are available (if my guess is correct)?
Where does the documentation for this reside?
This is the behaviour to adopt when the referenced object is deleted. It is not specific to Django; this is an SQL standard. Although Django has its own implementation on top of SQL. (1)
There are seven possible actions to take when such event occurs:
CASCADE: When the referenced object is deleted, also delete the objects that have references to it (when you remove a blog post for instance, you might want to delete comments as well). SQL equivalent: CASCADE.
PROTECT: Forbid the deletion of the referenced object. To delete it you will have to delete all objects that reference it manually. SQL equivalent: RESTRICT.
RESTRICT: (introduced in Django 3.1) Similar behavior as PROTECT that matches SQL's RESTRICT more accurately. (See django documentation example)
SET_NULL: Set the reference to NULL (requires the field to be nullable). For instance, when you delete a User, you might want to keep the comments he posted on blog posts, but say it was posted by an anonymous (or deleted) user. SQL equivalent: SET NULL.
SET_DEFAULT: Set the default value. SQL equivalent: SET DEFAULT.
SET(...): Set a given value. This one is not part of the SQL standard and is entirely handled by Django.
DO_NOTHING: Probably a very bad idea since this would create integrity issues in your database (referencing an object that actually doesn't exist). SQL equivalent: NO ACTION. (2)
Source: Django documentation
See also the documentation of PostgreSQL for instance.
In most cases, CASCADE is the expected behaviour, but for every ForeignKey, you should always ask yourself what is the expected behaviour in this situation. PROTECT and SET_NULL are often useful. Setting CASCADE where it should not, can potentially delete all of your database in cascade, by simply deleting a single user.
Additional note to clarify cascade direction
It's funny to notice that the direction of the CASCADE action is not clear to many people. Actually, it's funny to notice that only the CASCADE action is not clear. I understand the cascade behavior might be confusing, however you must think that it is the same direction as any other action. Thus, if you feel that CASCADE direction is not clear to you, it actually means that on_delete behavior is not clear to you.
In your database, a foreign key is basically represented by an integer field which value is the primary key of the foreign object. Let's say you have an entry comment_A, which has a foreign key to an entry article_B. If you delete the entry comment_A, everything is fine. article_B used to live without comment_A and don't bother if it's deleted. However, if you delete article_B, then comment_A panics! It never lived without article_B and needs it, it's part of its attributes (article=article_B, but what is article_B???). This is where on_delete steps in, to determine how to resolve this integrity error, either by saying:
"No! Please! Don't! I can't live without you!" (which is said PROTECT or RESTRICT in Django/SQL)
"All right, if I'm not yours, then I'm nobody's" (which is said SET_NULL)
"Good bye world, I can't live without article_B" and commit suicide (this is the CASCADE behavior).
"It's OK, I've got spare lover, I'll reference article_C from now" (SET_DEFAULT, or even SET(...)).
"I can't face reality, I'll keep calling your name even if that's the only thing left to me!" (DO_NOTHING)
I hope it makes cascade direction clearer. :)
Footnotes
(1) Django has its own implementation on top of SQL. And, as mentioned by #JoeMjr2 in the comments below, Django will not create the SQL constraints. If you want the constraints to be ensured by your database (for instance, if your database is used by another application, or if you hang in the database console from time to time), you might want to set the related constraints manually yourself. There is an open ticket to add support for database-level on delete constraints in Django.
(2) Actually, there is one case where DO_NOTHING can be useful: If you want to skip Django's implementation and implement the constraint yourself at the database-level.
The on_delete method is used to tell Django what to do with model instances that depend on the model instance you delete. (e.g. a ForeignKey relationship). The on_delete=models.CASCADE tells Django to cascade the deleting effect i.e. continue deleting the dependent models as well.
Here's a more concrete example. Assume you have an Author model that is a ForeignKey in a Book model. Now, if you delete an instance of the Author model, Django would not know what to do with instances of the Book model that depend on that instance of Author model. The on_delete method tells Django what to do in that case. Setting on_delete=models.CASCADE will instruct Django to cascade the deleting effect i.e. delete all the Book model instances that depend on the Author model instance you deleted.
Note: on_delete will become a required argument in Django 2.0. In older versions it defaults to CASCADE.
Here's the entire official documentation.
FYI, the on_delete parameter in models is backwards from what it sounds like. You put on_delete on a foreign key (FK) on a model to tell Django what to do if the FK entry that you are pointing to on your record is deleted. The options our shop have used the most are PROTECT, CASCADE, and SET_NULL. Here are the basic rules I have figured out:
Use PROTECT when your FK is pointing to a look-up table that really shouldn't be changing and that certainly should not cause your table to change. If anyone tries to delete an entry on that look-up table, PROTECT prevents them from deleting it if it is tied to any records. It also prevents Django from deleting your record just because it deleted an entry on a look-up table. This last part is critical. If someone were to delete the gender "Female" from my Gender table, I CERTAINLY would NOT want that to instantly delete any and all people I had in my Person table who had that gender.
Use CASCADE when your FK is pointing to a "parent" record. So, if a Person can have many PersonEthnicity entries (he/she can be American Indian, Black, and White), and that Person is deleted, I really would want any "child" PersonEthnicity entries to be deleted. They are irrelevant without the Person.
Use SET_NULL when you do want people to be allowed to delete an entry on a look-up table, but you still want to preserve your record. For example, if a Person can have a HighSchool, but it doesn't really matter to me if that high-school goes away on my look-up table, I would say on_delete=SET_NULL. This would leave my Person record out there; it just would just set the high-school FK on my Person to null. Obviously, you will have to allow null=True on that FK.
Here is an example of a model that does all three things:
class PurchPurchaseAccount(models.Model):
id = models.AutoField(primary_key=True)
purchase = models.ForeignKey(PurchPurchase, null=True, db_column='purchase', blank=True, on_delete=models.CASCADE) # If "parent" rec gone, delete "child" rec!!!
paid_from_acct = models.ForeignKey(PurchPaidFromAcct, null=True, db_column='paid_from_acct', blank=True, on_delete=models.PROTECT) # Disallow lookup deletion & do not delete this rec.
_updated = models.DateTimeField()
_updatedby = models.ForeignKey(Person, null=True, db_column='_updatedby', blank=True, related_name='acctupdated_by', on_delete=models.SET_NULL) # Person records shouldn't be deleted, but if they are, preserve this PurchPurchaseAccount entry, and just set this person to null.
def __unicode__(self):
return str(self.paid_from_acct.display)
class Meta:
db_table = u'purch_purchase_account'
As a last tidbit, did you know that if you don't specify on_delete (or didn't), the default behavior is CASCADE? This means that if someone deleted a gender entry on your Gender table, any Person records with that gender were also deleted!
I would say, "If in doubt, set on_delete=models.PROTECT." Then go test your application. You will quickly figure out which FKs should be labeled the other values without endangering any of your data.
Also, it is worth noting that on_delete=CASCADE is actually not added to any of your migrations, if that is the behavior you are selecting. I guess this is because it is the default, so putting on_delete=CASCADE is the same thing as putting nothing.
As mentioned earlier, CASCADE will delete the record that has a foreign key and references another object that was deleted. So for example if you have a real estate website and have a Property that references a City
class City(models.Model):
# define model fields for a city
class Property(models.Model):
city = models.ForeignKey(City, on_delete = models.CASCADE)
# define model fields for a property
and now when the City is deleted from the database, all associated Properties (eg. real estate located in that city) will also be deleted from the database
Now I also want to mention the merit of other options, such as SET_NULL or SET_DEFAULT or even DO_NOTHING. Basically, from the administration perspective, you want to "delete" those records. But you don't really want them to disappear. For many reasons. Someone might have deleted it accidentally, or for auditing and monitoring. And plain reporting. So it can be a way to "disconnect" the property from a City. Again, it will depend on how your application is written.
For example, some applications have a field "deleted" which is 0 or 1. And all their searches and list views etc, anything that can appear in reports or anywhere the user can access it from the front end, exclude anything that is deleted == 1. However, if you create a custom report or a custom query to pull down a list of records that were deleted and even more so to see when it was last modified (another field) and by whom (i.e. who deleted it and when)..that is very advantageous from the executive standpoint.
And don't forget that you can revert accidental deletions as simple as deleted = 0 for those records.
My point is, if there is a functionality, there is always a reason behind it. Not always a good reason. But a reason. And often a good one too.
Using CASCADE means actually telling Django to delete the referenced record.
In the poll app example below: When a 'Question' gets deleted it will also delete the Choices this Question has.
e.g Question: How did you hear about us?
(Choices: 1. Friends 2. TV Ad 3. Search Engine 4. Email Promotion)
When you delete this question, it will also delete all these four choices from the table.
Note that which direction it flows.
You don't have to put on_delete=models.CASCADE in Question Model put it in the Choice.
from django.db import models
class Question(models.Model):
question_text = models.CharField(max_length=200)
pub_date = models.dateTimeField('date_published')
class Choice(models.Model):
question = models.ForeignKey(Question, on_delete=models.CASCADE)
choice_text = models.CharField(max_legth=200)
votes = models.IntegerField(default=0)
simply put, on_delete is an instruction to specify what modifications will be made to the object in case the foreign object is deleted:
CASCADE: will remove the child object when the foreign object is deleted
SET_NULL: will set the child object foreign key to null
SET_DEFAULT: will set the child object to the default data given while creating the model
RESTRICT: raises a RestrictedError under certain conditions.
PROTECT: prevents the foreign object from being deleted so long there are child objects inheriting from it
additional links:
https://docs.djangoproject.com/en/4.0/ref/models/fields/#foreignkey
Here is answer for your question that says: why we use on_delete?
When an object referenced by a ForeignKey is deleted, Django by default emulates the behavior of the SQL constraint ON DELETE CASCADE and also deletes the object containing the ForeignKey. This behavior can be overridden by specifying the on_delete argument. For example, if you have a nullable ForeignKey and you want it to be set null when the referenced object is deleted:
user = models.ForeignKey(User, blank=True, null=True, on_delete=models.SET_NULL)
The possible values for on_delete are found in django.db.models:
CASCADE: Cascade deletes; the default.
PROTECT: Prevent deletion of the referenced object by raising ProtectedError, a subclass of django.db.IntegrityError.
SET_NULL: Set the ForeignKey null; this is only possible if null is True.
SET_DEFAULT: Set the ForeignKey to its default value; a default for the ForeignKey must be set.
Let's say you have two models, one named Person and another one named Companies, and that, by definition, one person can create more than one company.
Considering a company can have one and only one person, we want that when a person is deleted that all the companies associated with that person also be deleted.
So, we start by creating a Person model, like this
class Person(models.Model):
id = models.IntegerField(primary_key=True)
name = models.CharField(max_length=20)
def __str__(self):
return self.id+self.name
Then, the Companies model can look like this
class Companies(models.Model):
title = models.CharField(max_length=20)
description=models.CharField(max_length=10)
person= models.ForeignKey(Person,related_name='persons',on_delete=models.CASCADE)
Notice the usage of on_delete=models.CASCADE in the model Companies. That is to delete all companies when the person that owns it (instance of class Person) is deleted.
Reorient your mental model of the functionality of "CASCADE" by thinking of adding a FK to an already existing cascade (i.e. a waterfall). The source of this waterfall is a primary key (PK). Deletes flow down.
So if you define a FK's on_delete as "CASCADE," you're adding this FK's record to a cascade of deletes originating from the PK. The FK's record may participate in this cascade or not ("SET_NULL"). In fact, a record with a FK may even prevent the flow of the deletes! Build a dam with "PROTECT."
Deletes all child fields in the database when parent object is deleted then we use on_delete as so:
class user(models.Model):
commodities = models.ForeignKey(commodity, on_delete=models.CASCADE)
CASCADE will also delete the corresponding field connected with it.
When deleting an object in admin interface, I want to prevent removal of related objects.
class ObjectToDelete(models.Model):
timestamp = models.DateTimeField()
class RelatedObject(models.Model):
otd = models.ForeignKey('app.ObjectToDelete', null=True, blank=True)
Since the ForeignKey in RelatedObject is nullable, I should be able to set it to None instead of deleting the whole object. And this is exactly the behaviour I want to have.
I know that I can create custom delete actions for this admin interface.
And I am also aware that I could make ManyToManyField in ObjectToDelete which would also prevent removal of RelatedObject. But then I wouldn't have the one-to-many relation which I want.
Is there a simple way of achieving this?
Set the on_delete option for your foreign key. If you you want to set the value to None when the related object is deleted, use SET_NULL:
models.ForeignKey('app.ObjectToDelete', on_delete=models.SET_NULL)
These rules apply however you delete an object, whether you do it in the admin panel or working directly with the Model instance. (But it won't take effect if you work directly with the underlying database in SQL.)
My desire is to have a common location model, and then have the various higher level models who need a location refer to it.
I want to present my user in admin with a multiple part form (an inline) that allows them to enter the higher level info for the Publisher and Building, as well as the location information for each. The inline system doesn't seem to want to work this way.
Clearly, I am doing something very wrong, because this seems like a very standard sort of problem to me. Is my schema design borked ?
Am I stupidly using the inline system ? I don't want to do subclasses of Location for each upper level object, because I want to manipulate locations in different ways independent of whatever high-level objects own them (a mailing list, or geographic look up perhaps)
models.py:
...
class Location(models.Model):
"""
A geographical address
"""
# Standard Location stuff
address_line1 = models.CharField("Address line 1", max_length = 45, null=True, blank=True)
...
class Publisher(models.Model):
"""
Contains Publisher information for publishers of yearbooks. Replaces Institution from 1.x
"""
name = models.CharField(max_length=100, null=False, help_text="Name of publisher, e.g. University of Kansas")
groups = models.ManyToManyField(Group, help_text="Select groups that this publisher owns. Usually just one, but multiple groups are possible.")
is_active = models.BooleanField(help_text="Check this box to enable this publisher.")
location = models.OneToOneField(Location)
...
class Building(models.Model):
"""
Contains Building Information
"""
name = models.CharField(max_length=100, null=False, help_text="Name of building, e.g. Physical Sciences")
is_active = models.BooleanField(help_text="Check this box to enable this building.")
location = models.OneToOneField(Location)
...
admin.py:
...
class LocationInline(generic.GenericStackedInline):
model = Location
max_num = 1
extra = 1
class PublisherAdmin(admin.ModelAdmin):
model = Publisher
inlines = [ LocationInline,
]
class BuildingAdmin(admin.ModelAdmin):
model = Building
inlines = [ LocationInline,
]
admin.site.register(Publisher, PublisherAdmin)
admin.site.register(Building, BuildingAdmin)
I can force the inline to load and present by adding this to the Location model:
# Support reverse lookup for admin
object_id = models.PositiveIntegerField()
content_type = models.ForeignKey(ContentType)
of = generic.GenericForeignKey('content_type', 'object_id' )
But when I do this, even though I do get an inline object, and can edit it, the relationship seems backwards to me, with Location storing an id to the object that created it.
Any help is welcome, either a recommended schema change to make everything work wonderfully (as Django is so good at) or a trick to make the backwards-seeming stuff make sense.
Firstly, I think you want ForeignKey, not OneToOneField. Otherwise, you might as well just add your location fields to the Publisher and Building models. Then you'll simply get a dropdown to choose the location and a link to add a new one if needed in the building and publisher admin.
If you really want to have one location instance per building/publisher, you won't be able to edit it as an inline because an inline model needs to have a ForeignKey pointing to the parent model, unless you add the generic foreign key. This isnt 'backwards' - it's a valid option when you want an object to be able to attach itself to any other, regardless of type.
When it comes to domain model, there's no such thing as a "One Right Way" to do it, it depends on your specific application's requirements.
wrt/ your problem:
The OneToOne field limits your models to one Location per model instance, which (as Greg mentionned) is not conceptually very different from just sticking the Location's fields directly in the model. wrt/ DRY/factorisation/reuse etc, you can get this done using model inheritence too, having an abstract (or eventually concrete if it makes sense for your app) Location model.
The ForeignKey solution still restricts your Publisher and Building models to a single Location (which might - or not - be what you want), but a given location might be shared between different Publisher and / or Building instances. This means that editing one given location will reflect on all the related instances (beware of unwanted side effects here).
Using a GenericForeignKey in the Location model means that a given location instance belongs to one and only one related object. No surprinsing side-effect as with the above solution but you may have duplicate locations (ie one for the building, one for the publisher) with same values, and you won't be able to lookup all related objects for a specific location (or not that easily at least). Also, this won't prevent a Publisher or Building instance to have more than one location, which once again might be fine or not. wrt/ Location instance "storing the id" of the object they belong to, well, that's really what this design choice means : a Location "belongs to" some other object, period.
In any case, designing around the default behaviour of Django's admin app is probably not the wisest thing to do. You have to first decide what makes sense for this application (and you may have different needs for Publishers and Buildings), then possibly extend the admin to match your needs.
django guardian https://github.com/lukaszb/django-guardian is a really well written object-level permissions app; and I have actually read up on and used quite a number of other django object level permissions app in various django projects.
In a recent project that I am working on, I decided to use django guardian but I have a model design question relating to the pros and cons of two possible approaches and their respective implications on sql query performance:-
using django.contrib.auth.models.Group and extending that to my custom organization app's models; or
using django.contrib.auth.models.User instead and creating an m2m field for each of the organization type in my organization app.
Approach #1
# Organisation app's models.py
from django.contrib.auth.models import Group
class StudentClass(models.Model):
name = models.CharField('Class Name', max_length=255)
groups = models.ManyToManyField(Group, blank=True)
size = models.IntegerField('Class Size', blank=True)
class SpecialInterestGroup(models.Model):
name = models.CharField('Interest Group Name', max_length=255)
groups = models.ManyToManyField(Group, blank=True)
description = models.TextField('What our group does!', blank=True)
class TeachingTeam(models.Model):
name = models.CharField('Teacher Team Name', max_length=255)
groups = models.ManyToManyField(Group, blank=True)
specialization = models.TextField('Specialty subject matter', blank=True)
In this approach, when a user is added to a group (django group) for the first time, the group object is created and also assigned to one of these 3 classes, if that group object does not yet belong to the class it is added into.
This means that each StudentClass object, sc_A, sc_B etc, can possibly contain a lot of groups.
What that means is that for me to ascertain whether or not a specific user (say myuser) belongs to a particular organization, I have to query for all the groups that the user belong to, via groups_myuser_belongto = myuser.groups and then query for all the groups that are associated to the organization I am interested in, via groups_studentclass = sc_A.groups.all() and since I now have 2 lists that I need to compare, I can do set(groups_myuser_belongto) && set(groups_studentclass), which will return a new set which may contain 1 or more groups that intersect. If there are 1 or more groups, myuser is indeed a member of sc_A.
This model design therefore implies that I have to go through a lot of trouble (and extra queries) just to find out if a user belongs to an organization.
And the reason why I am using m2m to groups is so as to make use of the Group level permissions functionality that django guardian provides for.
Is such a model design practical?
Or am I better off going with a different model design like that...
Approach #2
# Organisation app's models.py
from django.contrib.auth.models import User
class StudentClass(models.Model):
name = models.CharField('Class Name', max_length=255)
users = models.ManyToManyField(User, blank=True)
size = models.IntegerField('Class Size', blank=True)
class SpecialInterestGroup(models.Model):
name = models.CharField('Interest Group Name', max_length=255)
users = models.ManyToManyField(User, blank=True)
description = models.TextField('What our group does!', blank=True)
class TeachingTeam(models.Model):
name = models.CharField('Teacher Team Name', max_length=255)
users = models.ManyToManyField(User, blank=True)
specialization = models.TextField('Specialty subject matter', blank=True)
Obviously, this model design makes it really easy for me to check if a user object belongs to a particular organization or not. All I need to do to find out if user john is part of a TeachingTeam maths_teachers or not is to check:
user = User.objects.get(username='john')
maths_teachers = TeachingTeam.objects.get(name='Maths teachers')
if user in maths_teachers.users.all():
print "Yes, this user is in the Maths teachers organization!"
But what this model design implies is that when I add a user object to a group (recall that I want to use django guardian's Group permissions functionality), I have to make sure that the save() call adds the user object into a "Maths Teachers" group in django.contrib.auth.models.Group AND into my custom TeachingTeam class's "Maths Teachers" object. And that doesn't feel very DRY, not to mention that I have to somehow ensure that the save calls into both the models are done in a single transaction.
Is there a better way to design my models given this use case/requirement - use django groups and yet provide a way to "extend" the django's native group functionality (almost like how we extend django's user model with a "user profile app")?
My take on this (having developed django apps for a long time) is that you should stick with the natural approach (so a StudentClass has Users rather than Groups). Here "natural" means that it correspond to the actual semantics of the involved objects.
If belonging to a specific StudentClass must imply some automatic group (in addition to those granted to the user), add a groups m2m to the StudentClass model, and create a new authentication backend (extending the default one), which provides a custom get_all_permissions(self, user_obj, obj=None) method. It will be hooked by https://github.com/django/django/blob/master/django/contrib/auth/models.py#L201
In this method query for any group associated to any Organization the user belongs to. And you don't need to do 1+N queries, correct use of the ORM will navigate through two *-to-many at once.
The current ModelBackend method in https://github.com/django/django/blob/master/django/contrib/auth/backends.py#L37 queries get_group_permissions(user_obj) and adds them to the perms the user has assigned. You could add similar behavior by adding (cached) get_student_class_permission and other corresponding methods.
(edited for clearer prologue)
Obs: There is another approach which is to use generic relationships, in this approach you would have the User model instance pointing to it's resources through the contenttypes framework. There is a nice question here on SE explaining this approach.
About the performance: There are some clues that a single select with JOIN is cheaper than many simple select (1,2,3). In this case opition 2 would be better.
About the usability: The first approach is hard to explain, hard to understand. IMHO go for no. 2. Or try the generic relations.