While there is general consensus that multi-table inheritance isn't a very good idea in the long term (Jacobian, others), I am wondering if in some use cases the "extra joins" created by Django during querying might be worth it.
My issue is having a single source of truth in the database. Take, for example, Person objects that are identified by an identity number and an identity type, e.g. ID number 222, type Passport.
from django.db import models

class Person(models.Model):
    identity_number = models.CharField(max_length=20)
    identity_type = models.IntegerField()

class Student(Person):
    student_number = models.CharField(max_length=20)

class Employee(Person):
    employee_number = models.CharField(max_length=20)
With abstract inheritance, every subclass of Person (e.g. Student, Parent, Supervisor, Employee) inheriting from an abstract Person class will have identity_number and identity_type stored in its own table.
With multi-table inheritance, since they all share the same parent table, I can be sure that if I create a unique constraint on both columns in the Person model, no duplicates will exist in the database.
With abstract inheritance, keeping duplicates out of the database would require extra validation logic in the application, which also slightly degrades performance. Doesn't that cancel out the cost of the "extra join" that Django has to do with concrete inheritance?
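To make the trade-off concrete, here is a minimal sketch of roughly the table layout that multi-table inheritance would produce. It uses Python's built-in sqlite3 module instead of Django so that it runs on its own. The table and column names (person, student, person_ptr_id) and the PASSPORT code are assumptions for illustration. The point is only that one composite unique constraint on the shared parent table rejects a second person with the same identity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        identity_number TEXT NOT NULL,
        identity_type INTEGER NOT NULL,
        UNIQUE (identity_number, identity_type)
    );
    CREATE TABLE student (
        person_ptr_id INTEGER PRIMARY KEY REFERENCES person (id),
        student_number TEXT NOT NULL
    );
    CREATE TABLE employee (
        person_ptr_id INTEGER PRIMARY KEY REFERENCES person (id),
        employee_number TEXT NOT NULL
    );
""")

PASSPORT = 1  # hypothetical code for the "Passport" identity type


def add_person(identity_number, identity_type):
    """Insert a person row and return its id."""
    cur = conn.execute(
        "INSERT INTO person (identity_number, identity_type) VALUES (?, ?)",
        (identity_number, identity_type),
    )
    return cur.lastrowid


person_id = add_person("222", PASSPORT)
conn.execute("INSERT INTO student VALUES (?, ?)", (person_id, "S-001"))
# The same person can also be an employee without repeating the identity.
conn.execute("INSERT INTO employee VALUES (?, ?)", (person_id, "E-001"))

# A second person with the same identity violates the unique constraint.
try:
    add_person("222", PASSPORT)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

person_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

With abstract inheritance there would be no person table, so the same constraint would have to be declared per child table and could not catch a Student and an Employee sharing one identity.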
It's a mistake to think about your data modeling in object-oriented terms at all. It's an abstraction that fits poorly to relational databases, by hiding some very important details that can massively affect performance (as pointed out in the articles) or correctness (as you've pointed out above).
A traditional SQL approach to your example would offer two possibilities:
1. Having a Person table with the IDs and then Student, etc. with foreign keys back to it.
2. Having a single table for everything, with some additional fields to distinguish the different kinds of person.
Now, if your evaluation led you to prefer 1, you might notice that in Django this could be accomplished by using a concrete inheritance model (it's the same as what you describe above). In that case, by all means, use inheritance if you'd find the resulting access patterns in Django more elegant.
So I'm not saying you shouldn't use inheritance, I'm saying you should only look at it after you've modeled your data from the SQL perspective. If you did that in the example above, you would never even consider splitting everything into separate tables—which has all the problems you noted—as suggested by the abstract inheritance model.
Related
Building on this question: Which is better: Foreign Keys or Model Inheritance?
I would like to know if it is possible to replace a OneToOneField with MTI.
That is, I have:
class GroupUser(models.Model):
    group = models.OneToOneField(Group, on_delete=models.CASCADE)
    ...  # other fields
and I want:
class GroupUser(Group):
    ...  # other fields
I think that should be faster, or not?
Is it possible?
It won't be faster: with concrete inheritance (which is what this sounds like), the child model's table still has a column linking it to the parent's table, so in practice the efficiency is the same as with a OneToOneField.
The choice also depends on the business logic. Inheritance suits situations where you have things of a similar type, so that you can define common fields and methods in the parent class and reduce repetitive code. From your example, it sounds like Group and GroupUser are two entirely different things that most likely don't share many attributes, so unless I misunderstand your intention, OneToOneField is the better candidate.
I have this use case:
There are places which are either playgrounds, restaurants, theatres, or pubs.
The same place can have playgrounds, restaurants, theatres, etc.
There are a couple of ways of implementing it:
1. Use foreign keys

class Place(models.Model):
    name = models.CharField(max_length=50)

class PlayGrounds(models.Model):
    field1 = models.CharField(max_length=50)
    place = models.ForeignKey(Place, on_delete=models.CASCADE)

2. Multi-table inheritance

class Place(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=80)

class Restaurant(Place):
    serves_hot_dogs = models.BooleanField()
    serves_pizza = models.BooleanField()

3. Use an abstract class

class Place(models.Model):
    name = models.CharField(max_length=50)

class PlayGrounds(Place):
    field1 = models.CharField(max_length=50)
    place = models.ForeignKey(Place, on_delete=models.CASCADE)

    class Meta:
        abstract = True

4. Use proxy models

class Place(models.Model):
    name = models.CharField(max_length=50)

class PlayGrounds(Place):
    field1 = models.CharField(max_length=50)
    place = models.ForeignKey(Place, on_delete=models.CASCADE)

    class Meta:
        proxy = True
What are the pros and cons of using each approach?
The first one is essentially model inheritance, because that's what Django's implementation of MTI uses (except it's a OneToOneField instead of a ForeignKey, but that's merely a ForeignKey that's unique).
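As a rough illustration of that equivalence, the sketch below builds an approximation of the two tables behind Place and Restaurant using the standard-library sqlite3 module. The schema is an assumption modelled on Django's usual naming (place_ptr_id), not output copied from Django. The child's primary key doubles as a unique foreign key to the parent, which is all a one-to-one link is, and loading a full restaurant needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        address TEXT NOT NULL
    );
    -- The child's primary key is also a foreign key to the parent row,
    -- so each place can have at most one restaurant row.
    CREATE TABLE restaurant (
        place_ptr_id INTEGER PRIMARY KEY REFERENCES place (id),
        serves_hot_dogs INTEGER NOT NULL,
        serves_pizza INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO place VALUES (?, ?, ?)", (1, "Corner Diner", "1 Main St"))
conn.execute("INSERT INTO restaurant VALUES (?, ?, ?)", (1, 0, 1))

# Loading a restaurant needs columns from both tables, hence the join.
row = conn.execute("""
    SELECT place.name, place.address, restaurant.serves_pizza
    FROM restaurant
    INNER JOIN place ON place.id = restaurant.place_ptr_id
""").fetchone()
```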
Anytime you have an is-a relationship (i.e., a Restaurant is a Place), you're dealing with inheritance, so using one of Django's model inheritance methodologies is the way to go. Each, however, has its pros and cons:
Abstract Models
Abstract models are useful when you just want to off-load repetitive fields and/or methods. They're best used as mixins, more than true "parents". For example, all of these models will have an address, so creating an abstract Address model and having each inherit from that might be a useful thing. But, a Restaurant is not an Address, per se, so this is not a true parent-child relationship.
MTI (Multi-Table Inheritance)
This is the one that's akin to your first choice above. This is most useful when you need to interact with both the parent and child classes and the children have unique fields of their own (fields, not methods). So a Restaurant might have a cuisine field, but a Place wouldn't need that. However, they both have an address, so Restaurant inherits and builds off of Place.
Proxy Models
Proxy models are like aliases. They cannot have their own fields, they only get the fields of the parent. However, they can have their own methods, so these are useful when you need to differentiate kinds of the same thing. For example, I might create proxy models like StaffUser and NormalUser from User. There's still only one user table, but I can now add unique methods to each, create two different admin views, etc.
For your scenario, proxy models don't make much sense. The children are inherently more complicated than the parent and it wouldn't make sense to store all the fields like cuisine for Restaurant on Place.
You could use an abstract Place model, but then you lose the ability to actually work with Place on its own. When you want a foreign key to a generalized "place", you'll have to use generic foreign keys instead, to be able to choose from among the different place types, and that adds a lot of overhead if it's not necessary.
Your best bet is using normal inheritance: MTI. You can then create a foreign key to Place and add anything that is a child of Place.
It depends entirely on what sort of behaviour you need.
Do you need to perform the same kinds of operations on places and restaurants or playgrounds?
Will you be checking if your services (restaurants etc) are in the same place? Is it meaningful to treat two places with the same address as being different, and their associated services as different?
Without knowing the answers to these sorts of questions, it is impossible to say which is the most appropriate technique, as they are very different techniques, and not in general substitutes for each other.
The use of inheritance should not be dictated by a preconceived notion about taxonomy, because it is not there to model taxonomy: it is there to provide function polymorphism (data member inheritance is there primarily to facilitate that).
I'd vote for an abstract class if the domain dictates that a place cannot exist if it is not at least one of the others.
But if a place doesn't need to have anything on it, you'd need multiple inheritance to accommodate the placemarkers (vacant lots)?
I imagine the pros and cons revolve around your fondness for spurious database tables. How does the ORM implement these solutions? Personally I'm not fond of having lots of single-field tables, but YMMV.
I'd vote for the first one, because it's the most explicit, and I don't see any advantages in the other methods.
I have several models inheriting from a base model.
The fields in the base model are needed rarely, but Django keeps doing complex inner joins to retrieve those fields whenever I use any of the inherited models.
How can I tell Django to avoid this? I only need the fields in this model rarely.
Note: maybe only(...) would work (I didn't check), but I would need to add it in many places in the code.
Use abstract model inheritance.
In short, setting abstract = True in the base class's Meta makes Django use abstract inheritance, meaning each derived model will contain a copy of all the fields defined in the base model.
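Here is a small sketch of what that means at the table level, again using sqlite3 as a stand-in for the database. The models are hypothetical (an abstract base with a single notes field, and Article and Comment children), since the question does not show its own. There is no table for the base at all. Its column is simply repeated in each child table, so reading a child never needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- No table exists for the abstract base; its column is repeated per child.
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        notes TEXT NOT NULL,      -- copied from the hypothetical base
        headline TEXT NOT NULL
    );
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        notes TEXT NOT NULL,      -- copied from the hypothetical base
        body TEXT NOT NULL
    );
""")


def column_names(table):
    """Return the column names of a table via SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


article_columns = column_names("article")
comment_columns = column_names("comment")
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```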
By the way, one of Django's maintainers, Jacob Kaplan-Moss, has quite a strong opinion against concrete inheritance:
"model inheritance also offers a really excellent opportunity to shoot yourself in the foot: concrete (multi-table) inheritance"
and again:
"I’d strongly suggest that Django users approach any use of concrete inheritance with a large dose of skepticism."
Personally, I have never had to use model inheritance at all; however, after reading that blog entry, I am quite convinced that I should avoid concrete inheritance as much as possible.
I'd say the only possibility to avoid this is either to make your base class abstract or to write some custom SQL queries that don't hit the 'base' table...
Let's say I have an abstract base class that looks like this:
class StellarObject(BaseModel):
    title = models.CharField(max_length=255)
    description = models.TextField()
    slug = models.SlugField(blank=True, null=True)

    class Meta:
        abstract = True
Now, let's say I have two actual database classes that inherit from StellarObject:

class Planet(StellarObject):
    type = models.CharField(max_length=50)
    size = models.IntegerField()  # max_length is ignored on IntegerField

class Star(StellarObject):
    mass = models.IntegerField()  # max_length is ignored on IntegerField
So far, so good. If I want to get Planets or Stars, all I do is this:
Thing.objects.all()  # or
Thing.objects.filter()  # or count(), etc.
But what if I want to get ALL StellarObjects? If I do:
StellarObject.objects.all()
It of course returns an error, because an abstract class isn't an actual database object, and therefore cannot be queried. Everything I've read says I need to do two queries, one each on Planets and Stars, and then merge them. That seems horribly inefficient. Is that the only way?
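For reference, the "two queries, then merge" approach mentioned above usually looks something like the sketch below. To keep it runnable without a database, plain dataclasses and lists stand in for the models and for the results of Planet.objects.all() and Star.objects.all(). The field values are arbitrary placeholders.

```python
from dataclasses import dataclass
from itertools import chain
from operator import attrgetter


@dataclass
class Planet:
    title: str
    size: int


@dataclass
class Star:
    title: str
    mass: int


# Stand-ins for the results of Planet.objects.all() and Star.objects.all().
planets = [Planet("Mars", 1), Planet("Earth", 2)]
stars = [Star("Sun", 3)]

# Merge both result sets and sort on a field they have in common.
stellar_objects = sorted(chain(planets, stars), key=attrgetter("title"))
titles = [obj.title for obj in stellar_objects]
```

Note that the merged result is an ordinary list, so the sorting happens in Python and you can no longer call filter() or count() on it the way you would on a queryset.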
At its root, this is part of the mismatch between objects and relational databases. The ORM does a great job in abstracting out the differences, but sometimes you just come up against them anyway.
Basically, you have to choose between abstract inheritance, in which case there is no database relationship between the two classes, or multi-table inheritance, which keeps the database relationship at a cost of efficiency (an extra database join) for each query.
You can't query abstract base classes. For multi-table inheritance you can use django-model-utils and its InheritanceManager, which extends the standard QuerySet with a select_subclasses() method that does exactly what you need: it left-joins all inherited tables and returns an instance of the appropriate type for each row.
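To show the idea behind that left join, here is a sketch in plain SQL through sqlite3. This is not django-model-utils code, only an approximation of the query shape described above, with assumed table and column names. Each base row is left-joined to every child table, and whichever child row is present decides the type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stellarobject (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE planet (
        stellarobject_ptr_id INTEGER PRIMARY KEY REFERENCES stellarobject (id),
        size INTEGER NOT NULL
    );
    CREATE TABLE star (
        stellarobject_ptr_id INTEGER PRIMARY KEY REFERENCES stellarobject (id),
        mass INTEGER NOT NULL
    );
    INSERT INTO stellarobject VALUES (1, 'Earth'), (2, 'Sun');
    INSERT INTO planet VALUES (1, 3);
    INSERT INTO star VALUES (2, 5);
""")

# One query over the base table, left-joined to every child table.
rows = conn.execute("""
    SELECT s.title,
           p.stellarobject_ptr_id IS NOT NULL AS is_planet,
           st.stellarobject_ptr_id IS NOT NULL AS is_star
    FROM stellarobject AS s
    LEFT JOIN planet AS p ON p.stellarobject_ptr_id = s.id
    LEFT JOIN star AS st ON st.stellarobject_ptr_id = s.id
    ORDER BY s.id
""").fetchall()

# Pick the most specific type for each row based on which child row exists.
kinds = [
    (title, "Planet" if is_planet else "Star" if is_star else "StellarObject")
    for title, is_planet, is_star in rows
]
```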
Don't use an abstract base class if you need to query on the base. Use a concrete base class instead.
This is an example of polymorphism in your models (polymorph - many forms of one).
Option 1 - If there's only one place you deal with this:
For the sake of a little bit of if-else code in one or two places, just deal with it manually - it'll probably be much quicker and clearer in terms of dev/maintenance (i.e. maybe worth it unless these queries are seriously hammering your database - that's your judgement call and depends on circumstance).
Option 2 - If you do this quite a bit, or really demand elegance in your query syntax:
Luckily there's a library to deal with polymorphism in django, django-polymorphic - those docs will show you how to do this precisely. This is probably the "right answer" for querying straightforwardly as you've described, especially if you want to do model inheritance in lots of places.
Option 3 - If you want a halfway house:
This kind of has the drawbacks of both of the above, but I've used it successfully in the past to automatically do all the zipping together from multiple query sets, whilst keeping the benefits of having one query set object containing both types of models.
Check out django-querysetsequence which manages the merge of multiple query sets together.
It's not as well supported or as stable as django-polymorphic, but worth a mention nevertheless.
In this case I think there's no other way.
For optimization, you could avoid inheriting from the abstract StellarObject and instead make it a separate table connected via a foreign key to the Star and Planet objects.
That way both of them would have, e.g., star.stellar_info.description.
Another way would be to add an additional model for handling the information and use StellarObject as the through model in a many-to-many relation.
I would consider moving away from either an abstract inheritance pattern or the concrete base pattern if you're looking to tie distinct sub-class behaviors to the objects based on their respective child class.
When you query via the parent class -- which it sounds like you want to do -- Django treats the resulting objects as objects of the parent class, so accessing child-class-level methods requires re-casting the objects into their 'proper' child class on the fly so they can see those methods... at which point a series of if statements hanging off a parent-class-level method would arguably be a cleaner approach.
If the sub-class behavior described above isn't an issue, you could consider a custom manager attached to an abstract base class sewing the models together via raw SQL.
If you're interested mainly in assigning a discrete set of identical data fields to a bunch of objects, I'd relate along a foreign-key, like bx2 suggests.
That seems horribly inefficient. Is that the only way?
As far as I know, it is the only way with Django's ORM. As currently implemented, abstract classes are a convenient mechanism for abstracting common attributes of classes out to superclasses. The ORM does not provide a similar abstraction for querying.
You'd be better off using another mechanism for implementing hierarchy in the database. One way to do this would be to use a single table and "tag" rows using type. Or you can implement a generic foreign key to another model that holds properties (the latter doesn't sound right even to me).
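A minimal sketch of the single-table, "tagged rows" idea, using sqlite3 and assumed column names (kind, size, mass). Fields that only apply to one kind are simply left NULL for the others, and querying every stellar object or just one kind is a single query with no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stellarobject (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('planet', 'star')),
        title TEXT NOT NULL,
        size INTEGER,   -- only meaningful for planets
        mass INTEGER    -- only meaningful for stars
    );
    INSERT INTO stellarobject (kind, title, size, mass) VALUES
        ('planet', 'Earth', 3, NULL),
        ('star', 'Sun', NULL, 5);
""")

# Everything at once: no joins and no merging of result sets.
all_titles = [
    row[0]
    for row in conn.execute("SELECT title FROM stellarobject ORDER BY title")
]

# Just one kind: filter on the tag column.
planet_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM stellarobject WHERE kind = 'planet'"
    )
]
```

The cost is that the database can no longer require size for planets or mass for stars through a plain NOT NULL, so that rule would have to live in a CHECK constraint or in application code.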
class Account(models.Model):
    identifier = models.CharField(max_length=5)
    objects = MyCustomManager()

    class Meta:
        abstract = True

class Customer(Account):
    name = models.CharField(max_length=255)
If I have a lot of models, and I want to save time from having to put foreignkeys everywhere, is this right? Or, am I thinking of this all wrong?
It depends on which direction the foreign keys go. You cannot have a foreign key to an abstract class.
Maybe generic relations, or foreign keys defined in abstract model classes, are what you are looking for.
Note, though, that inheritance is always an is-a relationship, while normal foreign key usage implies a has-a relationship.
In your example, Customer should not inherit from Account as a customer has an account.
An inheritance example would be a Place which is either a Restaurant or a Cinema etc.
Edit after comment:
Well, there is a dedicated section for this in the documentation:
Class inheritance and model managers aren't quite a perfect match for each other. Managers are often specific to the classes they are defined on and inheriting them in subclasses isn't necessarily a good idea. Also, because the first manager declared is the default manager, it is important to allow that to be controlled. So here's how Django handles custom managers and model inheritance:
...
Managers from abstract base classes are always inherited by the child class, using Python's normal name resolution order (names on the child class override all others; then come names on the first parent class, and so on). Abstract base classes are designed to capture information and behavior that is common to their child classes. Defining common managers is an appropriate part of this common information.
The default manager on a class is either the first manager declared on the class, if that exists, or the default manager of the first abstract base class in the parent hierarchy, if that exists. If no default manager is explicitly declared, Django's normal default manager is used.
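The name resolution order that the quoted documentation refers to can be seen in plain Python. This is only a sketch: the Manager class below is a hypothetical stand-in rather than django.db.models.Manager, and Django's own handling of managers involves more machinery than simple attribute lookup.

```python
class Manager:
    """Hypothetical stand-in for a model manager; it only carries a label."""

    def __init__(self, label):
        self.label = label


class AccountBase:
    # Plays the role of the abstract base class with shared managers.
    objects = Manager("objects from AccountBase")
    archived = Manager("archived from AccountBase")


class Customer(AccountBase):
    # Declares nothing, so both names resolve to the parent's managers.
    pass


class Worker(AccountBase):
    # A name declared on the child overrides the parent's name.
    objects = Manager("objects from Worker")


customer_objects = Customer.objects.label
worker_objects = Worker.objects.label
worker_archived = Worker.archived.label
```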
I would only do this if the inherited classes somehow belong to the same scope.
If you really have so many classes that adding one line to each of them matters, then you probably don't have a good DB or application design.
And try not to put everything in one manager just to be able to use a single manager in a lot of classes.
In this case you'll have one table for Customer with the account ID, and if you add Worker, it will have its own table with the account ID.
I think you probably want a single table with accounts, with the customers, workers, etc. attached to it? That way you'll never mix up your accounts.