Django - Polymorphic Models or one big Model? - django

I'm currently working on a model in Django involving one model that can have a variety of different traits depending on what kind of object it is. So, let's say we have a model called Mammal, which can either be an Elephant or a Dolphin (with their own traits "tusk_length" and "flipper_length" respectively).
Basic OOP principles shout "polymorphism", and I'm inclined to agree. But, as I'm new to Django, I first want to know whether or not it is the best way to do so in Django. I've heard of plenty of examples of and some people giving their preferences toward singular giant models
I've already tried using GenericForeignKeys as described here: How can I restrict Django's GenericForeignKey to a list of models?. While this solution works beautifully, I don't like the inability to filter, and that the relationship is only one way. That is, while you can get a Dolphin from a Mammal object, you can't get the Mammal object from the Dolphin.
And so, here are my two choices:
Choice A:
from django.db import models
class Mammal(models.Model):
hair_length = models.IntegerField()
tusk_length = models.IntegerField()
flipper_length = models.IntegerField()
animal_type = models.CharField(max_length = 15, choices = ["Elephant", "Dolphin"]
Choice B:
from django.db import models
class Mammal(models.Model):
hair_length = models.IntegerField()
class Elephant(Mammal):
tusk_length = models.IntegerField()
class Dolphin(Mammal):
flipper_length = models.IntegerField()
Choice B, from what I understand, has the advantage of nicer code when querying and listing all Elephants or Dolphins. However, I've noticed it's not as straightforward to get all of the Elephants from a list of Mammals (is there a query for this?) without putting animal_type in the class, with default being dependent on the class.
This leads to another problem I see with polymorphism, which won't come up in this example above or my application, but is worth mentioning is that it would be difficult to edit a Dolphin object into an Elephant without deleting the Dolphin entirely.
Overall, is there any general preference, or any big reason I shouldn't use polymorphism?

My recommendation, in general with database design, is to avoid inheritance. It complicates both the access and updates.
In Django, try using an abstract class for your base model. That means a db table will not be created for it. Its fields/columns will be auto-created in its child models. The benefit is: code reuse in Django/Python code and a simple, flat design in the database. The penalty is: it's more work to manage/query a mixed collection of child models.
See an example here: Django Patterns: Model Inheritance
Alternatively, you could change the concept of "Mammal" to "MammalTraits." And include a MammalTraits object inside each specific mammal class. In code, that is composition (has-a). In the db, that will be expressed as a foreign key.

We ended up going with a large table with a lot of usually-empty columns. Our reasoning was that (in this case) our Mammal table was all we'd be querying over, and there was no (intuitive) way to filter out by certain types of Mammals besides manually checking whether they had a "dolphin" or "elephant" object, which then threw an error if they didn't. Even looking for the type of an object returned from a query that was definitely an Elephant still returned "Mammal". It would be hard to extend any Pythonic workarounds to writing pure SQL, which one of our data guys does regularly.

Related

Is there a Django ManyToManyField with implied ownership?

Let's imagine I'm building a Django site "CartoonWiki" which allows users to write wiki articles (represented by the WikiArticle-model) as well as posting in a forum (represented by the ForumPost-model). Over time more features will be added to the site.
A WikiArticle has a number of FileUploads which should be deleted when the WikiArticle is deleted. By "deleted" I mean Django's .delete()-method.
However, the FileUpload-model is generic -- it's not specific to WikiArticle -- and contains generic file upload logic that e.g. removes the file from S3 when it's removed from the database. Other models like ForumPost will use the FileUpload-model as well.
I don't want to use GenericForeignKey nor multi-table inheritance for the reasons Luke Plant states in the blog post Avoid Django's GenericForeignKey. (However, if you can convince me that there really is no better way than the trade-offs GenericForeignKey make, I might be swayed and accept a convincing answer of that sort.)
Now, the most trivial way to do this is to have:
class FileUpload(models.Model):
article = models.ForeignKey('WikiArticle', null=True, blank=True, on_delete=models.CASCADE)
post = models.ForeignKey('ForumPost', null=True, blank=True, on_delete=models.CASCADE)
But that will have the FileUpload-model expand indefinitely with more fields -- and similar its underlying table will gain more and more columns as new models in the system start using FileUpload. This feels suboptimal both in terms of data-modeling, but also in terms of separation-of-concerns -- the FileUpload-model and table is being changed while no actual new functionality is being added to it.
My preference would really be to go the other way around:
class WikiArticle(models.Model):
uploads = models.ManyToManyField('FileUpload')
But this doesn't solve the deletion issue: If I .delete() a WikiArticle the corresponding FileUploads won't be deleted. I've tried various setups with through-models, but none seem to solve it. What I really need is a OneToMany-field -- a sort of reverse ForeignKey to indicate the ownership in the right direction without polluting the generic/reusable model.
Should FileUpload really instead be a field? Or perhaps an abstract model? (WikiArticleFileUpload, ForumPostFileUpload, and so on...).
I realize that a true ManyToManyField with implied ownership would no longer really be a ManyToManyField since the field implies sharing. E.g. a FileUpload could technically be referenced by multiple WikiArticles, so you could be removing FileUploads from other objects rather on top of the one you're deleting. The question still stands though -- it seems I need a OneToManyField to model this in a nice way.
You probably have a couple of options to solve your problem, but it also requires on the exact requirements of your application.
Using a GenericForeignKey in this situation is probably fine, escpecially due to the fact that you do not know how many other models will use your upload model. Of course as mentioned in the linked blog post eg. doing plain SQL queries might be harder but it's on you to decide if that's a problem for your use case.
Also using inheritance might be an option, so that all the referenced models inherit the relation to the upload model from a common ancestor. This might have a small impact performance-wise because you Django would need to join the tables of the models but the impact might still be not that big. On the other hand this approach might also have some advantages if eg. your articles and posts have other stuff in common as well and you could easily do stuff like "show all new posts and articles (together)".
If you handle deletion yourself as mentioned in the previous answer you can also add ManyToMany fields yourself but also consider that this method also has some disadvantages in common with using generic foreign keys (eg. a lot of stuff to join in the database...)
Probably it's fine that you just use a GenericForeignKey, especially if the number of models that use your "generic" model gets bigger (eg. more than 3-5). All in all this sounds pretty much like a use case GenericForeignKey was made for (imagine the uploads being something like "tags" belonging to the posts).
ManyToMany fields are symmetrical, even though you define them on one model with an (explicit or implicit) related_name on the other.
I can think of two methods to clean up while, or after, WikiArticles are deleted. The first is to periodically search for and delete "orphan" FileUploads. At its simplest, (assuming a related_name of articles)
deleted = FileUpload.objects.filter( articles__isnull=True).delete()
The other is to explicitly process the related articles during deleting of the article. It's straightforward to subclass the object's delete method, but this is not the only way to delete an object (bulk_delete, for example, bypasses this). Anyway,
def delete( self, *args, **kwargs):
article_pks = self.uploads.all().values_list('pk', Flat=True)
response = super().delete( *args, **kwargs)
FileUpload.objects.filter(
pk__in = article_pks, articles__isnull=True) .delete()
return response
(or even just execute the "periodically" code above, for every article-deletion, which will also tity after any deleted though othr channels)
Please thoroughly test this if you use it. Delete operations which don't do precisely what is wanted are the scariest sorts of bug!

django multi-table inheritance, access method from child from instance of parent

I am using Multi-Table inheritance (aka Concrete Inheritance), where I have a non-abstract model + DB Table called Clients, which is concerned with common details concerning all the clients.
But a client can be an Individual, Partnership or Company, for which I have created inheriting models and tables. An Individual has first name + last name, and company has other specific particulars, etc.
I want to be able to access the names of clients (derived from the columns from the child tables) when I want a list of all clients.
After lot of searching, I found that this tutorial, works successfully.
Basically, it involves inserting a column on the Client table, which will store the name of the Child model. Then using that name, the appropriate child model is identified and appropriate child method is accessed.
But it seems to be a slightly cumbersome way to implement polymorphism in Multi-Table inheritance.
I want to know whether since 2012, Django has introduced any better way to deal with the issue, or is this still the only way?
Please let me know if my code sample is required, but the link provided has a beautiful example already.
There is django-model-utils application with Inheritance Manager. It will cast automatically your parent class to children instances. Example from docs:
from model_utils.managers import InheritanceManager
class Place(models.Model):
# ...
objects = InheritanceManager()
class Restaurant(Place):
# ...
class Bar(Place):
# ...
nearby_places = Place.objects.filter(location='here').select_subclasses()
for place in nearby_places:
# "place" will automatically be an instance of Place, Restaurant, or Bar
Also check this question for generic solution with ContentType.
And also check awesome article by Jeff Elmore about this topic. Quite old, but still great.

Django: using ContentType vs multi_table_inheritance

I was having a similar problem as in
How to query abstract-class-based objects in Django?
The thread suggests using multi_table_inheritance. I personally think using content_type more conceptually comfortable (just feels more close to logic, at least to me)
Using the example in the previous link, I would just add a StelarType as
class StellarType(models.Model):
"""
Use ContentType so we have a single access to all types
"""
content_type = models.ForeignKey(ContentType)
object_id = models.PositiveIntegerField()
content_object = generic.GenericForeignKey('content_type', 'object_id')
Then add this to the abstract base model
class StellarObject(BaseModel):
title = models.CharField(max_length=255)
description = models.TextField()
slug = models.SlugField(blank=True, null=True)
stellartype = generic.GenericForeignKey(StellarType)
class Meta:
abstract = True
To sync between StellarObject and StellarType, we can connect post_save signal to create a StellarType instance every time a Planet or Star is created. In this way, I can query StellarObjects through StellarType.
So I'd like to know what's the PRO and CON of using this approach against using multi_table_inheritance? I think both create an additional table in the databse. But how about database performance? how about usability/flexibility? Thanks for any of your input!
To me, ContentType is the way to go when you want to relate an object to one of many models that aren't fundamentally of the same "type". Like if you want to be able to key Comments to Users, Pages, and Pictures on a social network, but there's no reasonable supertype shared by those three models. Sure you could create a "Commentable" supertype, but to me that feels more like a mixin than a fundamental type from which those three things derive. Before ContentType came out, you would have had no choice but to invent supertypes for these kind of relations, which can get really ugly really quickly if you need to do it multiple times in the same application (lets say you also have Events, Alerts, Messages, etc., each of which can apply to a different set of models).
Multi-table inheritance makes the most sense when you want to attach attributes to the base model, such that they will be shared in all concrete models that extend from it, so that you can get polymorphic behavior. Commentable doesn't really fit this mold, because all of that behavior can be put on the Comment model, less so on the Commentable objects. But if you have different classes of Users that share much of the same behavior and should be aggregable, then it makes a lot more sense.
The major pro of multi-table inheritance to me is a cleaner data model, with implicit relationships and inheritance that can be taken advantage of on the Python side (polymorphism is still a bit messy though, as seen here and here). The major pro of ContentType is that it is more general and keeps auxiliary functionality out of your models, at the cost of a bit of a slightly less pristine schema (lots of "meta" fields on your models to define these relationships). And for your example, you still have to rely on post_save, which seems unnecessarily messy/magical to me, as well.
Sorry for reviving old thread. I think it all boils down to the lookup direction. Whether you look up all subclasses for a certain FK (multitable inheritance) or define the referenced class as a content type and look it up based on the table reference and id (contenttypes) makes no big difference in performance - hint: they both suck. I think content types is a nice choice if you want your app to be easily extendible, i.e. others can add new content types to reference against. Multitable is good if you only sometimes need the extra columns defined in extra tables. Sometimes it might also be a good idea to merge all your subtypes and make only one which has a few fields left empty most of the time.

Django Nonrel - Working around multi-table inheritance with noSQL?

I'm working on a django-nonrel project running on Google's AppEngine. I want to create a model for a Game which contains details which are generally common to all sports - i.e. gametime, status, location, etc. I've then modelled specific classes for GameBasketball, GameBaseball etc, and these inherit from the base class.
This creates a problem however if I want to retrieve something like all the Games on a certain day:
Game.objects.filter(gametime=mydate)
This will return an error:
DatabaseError: Multi-table inheritance is not supported by non-relational DBs.
I understand that AppEngine doesn't support JOINs and so it makes sense that this fails. But I'm not sure how to properly tackle this problem in a non-relational environment. One solution I've tried is to turn Game into an abstract base class, and while that allows me to model the data in a nice way - it still doesn't resolve the use case above since its not possible to get objects for an abstract base class.
Is the only solution here to put all the data for all possible sports (and just leave fields that aren't relevant to a specific sport null) in the Game model, or is there a more elegant way to solve this problem?
EDIT:
I'm more interested in understanding the correct way of handling this type of issue in any noSQL setup, and not specifically on AppEngine. So feel free to reply even if your answer isn't GAE specific!
For anyone else who has this issue in the future, here is how I eventually went about resolving it. Keep in mind that this is somewhat of a mongo-specific solution, and AFAIK there is no easy way to solve this just using base nonrel classes.
For Mongo I able to solve this by using Embedded Documents (other noSQL setups may have similar soft-schema type features), simplified code below:
class Game(models.Model):
gametime = models.DateTimeField()
# etc
details = EmbeddedModelField() # This is the mongo specific bit.
class GameBasketballDetails(models.Model):
home = models.ForeignKey(Team)
# etc
Now when I instantiate the Game class I save a GameBasketballDetails object in the details field. So I can now do:
games = Game.objects.filter(gametime=mydate)
games[0].details.basketball_specific_method()
This is an acceptable solution since it allows me to query on the Game model but still get the "child" specific info I need!
The reason multi-table inheritance is disallowed in django-nonrel is because the API that Django provides for those sort of models will often use JOIN queries, as you know.
But I think you can just manually set up the same thing Django would have done with it's model inheritance 'sugar' and just avoid doing the joins in your own code.
eg
class Game(models.Model):
gametime = models.DateTimeField()
# etc
class GameBasketball(models.Model):
game = models.OneToOneField(Game)
basketball_specific_field = models.TextField()
you'll need a bit of extra work when you create a new GameBasketball to also create the corresponding Game instance (you could try a custom manager class) but after that you can at least do what you want, eg
qs = Game.objects.filter(gametime=mydate)
qs[0].gamebasketball.basketball_specific_field
django-nonrel and djangoappengine have a new home on GitHub: https://github.com/django-nonrel/
I'm not convinced, next to the speed of GAE datastore API itself, that the choice of python framework makes much difference, or that django-nonrel is inherently slower than webapp2.

How do I implement a common interface for Django related object sets?

Here's the deal:
I got two db models, let's say ShoppingCart and Order. Following the DRY principle I'd like to extract some common props/methods into a shared interface ItemContainer.
Everything went fine till I came across the _flush() method which mainly performs a delete on a related object set.
class Order(models.Model, interface.ItemContainer):
# ...
def _flush(self):
# ...
self.orderitem_set.all().delete()
So the question is: how do I dynamically know wheter it is orderitem_set or shoppingcartitem_set?
First, here are two Django snippets that should be exactly what you're looking for:
Model inheritance with content type and inheritance-aware manager
ParentModel and ChildManager for Model Inheritance
Second, you might want to re-think your design and switch to the django.contrib content types framework which has a simple .model_class() method. (The first snippet posted above also uses the content type framework).
Third, you probably don't want to use multiple inheritance in your model class. It shouldn't be needed and I wouldn't be surprised if there were some obscure side affects. Just have interface.ItemContainer inherit from models.Model and then Order inherit from only interface.ItemContainer.
You can set the related_name argument of a ForeignKey, so if you want to make minimal changes to your design, you could just have ShoppingCartItem and OrderItem set the same related_name on their ForeignKeys to ShoppingCart and Order, respectively (something like "item_set"):
order = models.ForeignKey(Order, related_name='item_set')
and
cart = models.ForeignKey(ShoppingCart, related_name='item_set')