Django: is multi-table inheritance that bad?

This isn't really specific to Django.
One can model
Place (with location, name, and other common attributes)
- Restaurant (menu, ...)
- ConcertHall (hall size, ...)
in either of two ways:
- in two separate tables, each holding all the fields it needs (in the Django world, this is called abstract inheritance);
- in three tables, where one holds the common fields and the other two hold their own unique fields (multi-table inheritance in Django).
The authors of the book Two Scoops of Django 1.8 strongly advise against using multi-table inheritance.
Say you want to query places based on their location and paginate the results. (It doesn't have to be a location; it can be any other common attribute we want to filter on.)
I can see how I can achieve it using multi-table inheritance:
SELECT place.id FROM place
LEFT OUTER JOIN restaurant ON (restaurant.id = place.id)
LEFT OUTER JOIN concerthall ON (concerthall.id = place.id)
WHERE ... ORDER BY distance
Is it feasible to do it with abstract inheritance?

According to the Django documentation on model inheritance:
The only decision you have to make is whether you want the parent models to be models in their own right (with their own database tables), or if the parents are just holders of common information that will only be visible through the child models.
I think both possibilities are just tools, equally good ones, and which is appropriate depends on your use case. There are certainly specific things to consider for each approach, and multi-table inheritance can be conceptually harder to understand, but beyond that the question becomes a matter of opinion.
If you need a single queryset covering both models, it is logical to consider multi-table inheritance rather than abstract models. Otherwise you would have to combine two querysets into one, most probably by using lists as this relevant answer suggests, and you would lose ORM functionality on the combined result.
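To make the two-table case concrete, here is a minimal sketch at the SQL level, using Python's built-in sqlite3 rather than the Django ORM. The table and column names (including `distance` and `hall_size`) are assumptions for illustration. It shows that filtering and paginating across both tables is feasible with a UNION ALL over the shared columns; Django's `QuerySet.union()` can express a similar query, with the restrictions the documentation lists for combined querysets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Abstract inheritance: each child table repeats the common columns.
    CREATE TABLE restaurant  (id INTEGER PRIMARY KEY, name TEXT, distance REAL, menu TEXT);
    CREATE TABLE concerthall (id INTEGER PRIMARY KEY, name TEXT, distance REAL, hall_size INTEGER);
    INSERT INTO restaurant  (name, distance, menu)      VALUES ('Luigi', 1.5, 'pasta'), ('Wok', 4.0, 'noodles');
    INSERT INTO concerthall (name, distance, hall_size) VALUES ('Arena', 0.5, 9000), ('Club', 3.0, 300);
""")


def places_page(page, page_size):
    """Return one page of (kind, id, name, distance) rows, nearest first."""
    return conn.execute(
        """
        SELECT 'restaurant' AS kind, id, name, distance FROM restaurant
        UNION ALL
        SELECT 'concerthall' AS kind, id, name, distance FROM concerthall
        ORDER BY distance
        LIMIT ? OFFSET ?
        """,
        (page_size, page * page_size),
    ).fetchall()


first_page = places_page(0, 2)
second_page = places_page(1, 2)
```

Note that ids from the two tables can collide, which is why each row carries a `kind` column to say which table it came from.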

It depends on your use cases, but Django has a good ORM for normalized table structures.
Keeping the base fields in one model and the specifics in another is the best approach under database normalization, because otherwise you may have to query several different tables, which is not a desirable situation. Django's relations and reverse relations offer what you need here.
An example based on yours, assuming you are using multi-table inheritance:
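from django.db.models import Q

Place.objects.filter(location=x)
# Q objects are positional arguments, so they must come before keyword arguments.
Place.objects.filter(Q(concerthall__hallsize__gt=y) | Q(restaurant__menu=z), location=x)
Place.objects.filter(location=x, concerthall__id__isnull=True)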
The first returns all Restaurants and Concert Halls in x.
The second returns all places in x that are Concert Halls with a hall size greater than y or Restaurants with menu z.
The last one is a super magic query that returns all places in location x that are not Concert Halls. That is useful when you have many models inheriting from Place. You can use <model_name>__id to include or exclude tables according to your needs.
You can build great JOINs across many tables and still stick to database normalization rules while doing so. You keep your related data in one place and avoid possible data integrity problems.
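For readers who want to see the joins behind such filters, here is a rough sketch using Python's sqlite3 module. The schema is an assumption modelled on the three-table layout (the `place_ptr_id` column name follows Django's default naming for the parent link), and the SQL is hand-written, not the exact SQL the ORM would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
    -- Child tables hold only their own columns plus a link to the parent row.
    CREATE TABLE restaurant  (place_ptr_id INTEGER PRIMARY KEY REFERENCES place(id), menu TEXT);
    CREATE TABLE concerthall (place_ptr_id INTEGER PRIMARY KEY REFERENCES place(id), hallsize INTEGER);
    INSERT INTO place VALUES (1, 'Luigi', 'x'), (2, 'Arena', 'x'), (3, 'Club', 'elsewhere');
    INSERT INTO restaurant  VALUES (1, 'pasta');
    INSERT INTO concerthall VALUES (2, 9000), (3, 300);
""")

# Places in a location that are big concert halls or restaurants with a given menu.
BIG_HALL_OR_MENU = """
    SELECT place.name FROM place
    LEFT OUTER JOIN concerthall ON concerthall.place_ptr_id = place.id
    LEFT OUTER JOIN restaurant  ON restaurant.place_ptr_id = place.id
    WHERE place.location = ? AND (concerthall.hallsize > ? OR restaurant.menu = ?)
    ORDER BY place.name
"""

# Places in a location that have no matching concert hall row.
NOT_A_CONCERT_HALL = """
    SELECT place.name FROM place
    LEFT OUTER JOIN concerthall ON concerthall.place_ptr_id = place.id
    WHERE place.location = ? AND concerthall.place_ptr_id IS NULL
    ORDER BY place.name
"""

matches = [row[0] for row in conn.execute(BIG_HALL_OR_MENU, ("x", 1000, "pasta"))]
non_halls = [row[0] for row in conn.execute(NOT_A_CONCERT_HALL, ("x",))]
```

The IS NULL test on the joined child table is what an "is not a Concert Hall" filter boils down to.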

Related

Modelling a 'web' in a Django database

I have a use case where I have a 'web' with the following relationships; what's the best way to model this in Django?
There is a logical top layer (could be modelled as a single 'very top' node if required)
There are logical leaves at the bottom
Nodes can have relationships to nodes in the layer above and/or below but never to siblings
So, like that Chinese game with a coin dropping through pins, there are multiple routes from top to bottom, but a traversal will always work, albeit in some manner determined elsewhere (actually user input in my case).
I have tried using ManyToMany relationships but can't see how to spot the top and bottom of the relationships; do I need to switch to many OneToMany relationships for independent child and parent relationships?
I don't think you can make this kind of graph explicit in the data structure, but you can enforce it with semantic actions in the creation/update of nodes. Let's say you have a Node model:
from django.db import models

class Node(models.Model):
    # A model refers to itself with the string 'self'.
    connections = models.ManyToManyField('self')
    layer_level = models.IntegerField(null=False, default=1)
In the constructor and the magic setter __setattr__ you can then check if the connections of the node all have a layer level greater or less than the node itself and raise an exception if they're equal. I admit that it's possible that in Django "interjecting" the setting of fields of a model instance is more complicated than overriding the __setattr__ method, but that's beyond the scope of this answer.
The point is that you can set up a model that allows your desired structure but with fewer restrictions, and then enforce the restrictions on creation/update rather than making them implicit in the model structure. The latter would be a better solution, but it is likely not possible with relational databases and/or Django's ORM.
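The enforcement idea can be sketched in plain Python, without Django, as follows. The `Node` class below is only a stand-in for the model, and `connect()` and `top_and_bottom()` are hypothetical helpers; the sketch also assumes that a smaller `layer_level` means closer to the top. In a real project the same check would more likely live in something like `Model.clean()` or an `m2m_changed` signal handler.

```python
class Node:
    """Plain-Python stand-in for the Node model."""

    def __init__(self, name, layer_level=1):
        self.name = name
        self.layer_level = layer_level
        self.connections = set()

    def connect(self, other):
        # Reject sibling links: connected nodes must sit on different layers.
        if other.layer_level == self.layer_level:
            raise ValueError(
                f"{self.name} and {other.name} share layer {self.layer_level}"
            )
        self.connections.add(other)
        other.connections.add(self)


def top_and_bottom(nodes):
    """Return (top nodes, leaf nodes), i.e. smallest and largest layer_level."""
    levels = [node.layer_level for node in nodes]
    top = [node for node in nodes if node.layer_level == min(levels)]
    bottom = [node for node in nodes if node.layer_level == max(levels)]
    return top, bottom


root, mid_a, mid_b, leaf = Node("root", 1), Node("a", 2), Node("b", 2), Node("leaf", 3)
root.connect(mid_a)
root.connect(mid_b)
mid_a.connect(leaf)
```

Storing the layer on each node is also what answers the "how do I spot the top and bottom" part of the question: they are simply the nodes with the extreme layer values.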

How to annotate a distinct Count over multiple relationships in Django?

Given a model that has more than one kind of connection to a related model (which I will call the "parent" model), how can I annotate a queryset with a count of parent model objects that are linked through either connection, without counting duplicates?
Example model definitions
Consider an Article model that has 2 links to a parent Publication model that are very similar in meaning.
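from django.db import models

class Publication(models.Model):
    pass

class Article(models.Model):
    # on_delete is required from Django 2.0 onward; CASCADE is an assumed choice.
    publication = models.ForeignKey(Publication, on_delete=models.CASCADE, related_name='publications')
    owner = models.ForeignKey(Publication, on_delete=models.CASCADE, related_name='owned_articles')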
Objective
I want to serve a page that is a list of publications. A business requirement is that the number of articles that each publication wishes to take credit for is shown (these publications prefer a generous metric for counting). An article is considered part of the organization if either the "owner" or "publication" field points to it, but no article should be counted more than once for a single publication. An article may be included in the count of 2 publications if publication points to a different object than owner.
I don't want to execute a query for every publication in the list.
The problem with Count annotations here
Publication.objects.annotate(Count('publications'), Count('owned_articles')) would be trivial. Then I will have publications__count and owned_articles__count.
My problem is that I can't tell how many articles in publications__count were also counted in owned_articles__count. Django doesn't allow me to cram a full queryset into Count, so in this general case of needing extra control over what is counted, a special mechanism is needed.
Similar questions
I have found this situation most similar to the question here:
Django annotate count with a distinct field
You could contrive this same general situation by intensifying that question's request by adding another related model in addition to InformationUnit and asking for a count of unique usernames among both related models.
(initial answer, answering my own question with a so-so solution)
The preferable approach would be to start with a Publication queryset. However, I can manage to squeeze a solution out of the Django ORM by pivoting around the Article queryset instead.
Consider as a solution to this problem:
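from django.db.models import Count, F

# Articles whose publication differs from their owner, ordered by publication.
exclusive_owners_qs = Article.objects.exclude(
    publication=F('owner')
).annotate(Count('publication')).order_by('publication')
# All articles, ordered by owner.
publications_qs = Article.objects.annotate(Count('owner')).order_by('owner')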
With this, I can loop over the two querysets and add up the two numbers locally in Python to get the correct counts.
This satisfies the requirements, but it's not an elegant solution. Eliminating the need for a Python loop would be ideal.
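As an illustration of that local adding-up step, here is a small sketch with plain tuples standing in for Article rows. The sample data and variable names are made up; the point is only the counting rule: every article counts once for its owner, and once more for its publication only when that is a different object.

```python
from collections import Counter

# Hypothetical rows: (article_id, publication_id, owner_id)
articles = [
    (1, 10, 10),  # both links point at publication 10: counted once
    (2, 10, 20),  # counts for 10 (publication) and for 20 (owner)
    (3, 20, 20),
    (4, 30, 10),
]

# Articles per owner (every article has exactly one owner).
by_owner = Counter(owner for _, _, owner in articles)

# Articles per publication, skipping those already counted via owner.
by_exclusive_publication = Counter(
    publication for _, publication, owner in articles if publication != owner
)

# Adding the two Counters gives the de-duplicated credit per publication.
credited = by_owner + by_exclusive_publication
```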
I believe the correct answer is using Count("publications", distinct=True), as described here:
https://docs.djangoproject.com/en/3.2/topics/db/aggregation/#combining-multiple-aggregations
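One caveat: `distinct=True` stops each individual count from being inflated when two relations are joined, but the two numbers can still overlap. For reference, the "either link, counted once" rule from the question looks like this in raw SQL. This is a sketch using sqlite3 with an assumed schema and sample data, not an ORM query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publication (id INTEGER PRIMARY KEY);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        publication_id INTEGER REFERENCES publication(id),
        owner_id INTEGER REFERENCES publication(id)
    );
    INSERT INTO publication VALUES (10), (20), (30), (40);
    INSERT INTO article VALUES (1, 10, 10), (2, 10, 20), (3, 20, 20), (4, 30, 10);
""")

# One row per publication; an article matching through both links is counted once.
rows = conn.execute("""
    SELECT p.id, COUNT(DISTINCT a.id) AS credited
    FROM publication AS p
    LEFT OUTER JOIN article AS a
        ON a.publication_id = p.id OR a.owner_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
```

Because the join is a LEFT OUTER JOIN, publications with no articles still appear, with a count of zero.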

Django model for sparse data

I am developing a django app that contains a number of forms which will be used to enter clinical data on some cancer tissue samples (10-20 fields per form, mostly CharField, FloatField and some multiple choice text dropdowns).
My challenge is that I need a form that can display different fields based on a diagnosis, for 150+ diagnoses. I can programmatically read the list of diagnoses, the fields required for each diagnosis and corresponding field types. Also, the set of all unique fields across all diagnoses is large (much larger than the number of fields needed for any specific diagnosis).
e.g.
diagnosis | disease_specific_fields | field_type
B-lymphoblastic leukemia/lymphoma NOS | EBV-positive | Pull down: Yes/No
B-lymphoblastic leukemia/lymphoma with recurrent genetic abnormalities(TCF3-PBX1) | EBV-positive | Pull down: Yes/No
Monoclonal B lymphocytosis(CLL/SLL spectrum) | EBV-positive | Pull down: Yes/No
Peripheral T cell lymphoma NOS | EBV-positive | Pull down: Yes/No
AML with recurrent cytogenetic abnormalities(t(6;9) DEK-NUP214) | EBV-positive | Pull down: Yes/No
So far, I have thought of the following approaches:
- Create a single huge model that will contain mostly sparse data, and handle irrelevant data using django forms. CONS: inefficient storage and a lot of overhead code tied to forms.
- Create a model for each diagnosis. CONS: complicates migrations and maintenance, I think.
- Create one small model for all diagnoses that contains several 'generic' fields of each type ('CharField', 'FloatField', etc), and render respective field names dynamically in forms / views.
I am looking for any constructive suggestions on how to implement a model/models capturing the above data. Efficiency and storage are secondary concerns, mostly I want a clean and intuitive solution. Any answers tailored for django will be especially helpful.
A few options I'd consider-
Use Django-Polymorphic to create inheritance-based model types
Django-Polymorphic allows you to use inheritance for differentiating between types of models.
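from django.db import models
from polymorphic.models import PolymorphicModel

# max_length values are placeholders; CharField requires one.
class Animal(PolymorphicModel):
    kingdom = models.CharField(max_length=50, default="Animalia")

class Lizard(Animal):
    # "class" is a reserved word in Python, so the field needs another name.
    taxonomic_class = models.CharField(max_length=50, default="Reptilia")

class Iguana(Lizard):
    favorite_tree = models.CharField(max_length=50)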
django-polymorphic builds on Django's multi-table inheritance, so each model in the inheritance scheme still gets its own table; in addition, the concrete type of every row is stored. As such, if you know the specific fields you want to capture, hard-code them. Plus, you can filter by level (so you could run a query on all Animal instances or all Iguana instances in the example above). Querying the base model returns instances of the right subclass, at the cost of extra queries to fetch the child tables, so check performance if you have many subclasses.
Use Django-Mutant if dynamic field creation is needed
Django-Mutant allows for dynamic creation of fields per model, allowing you to define data as needed on the fly. However, intermediary tables are required to do this. You gain a lot of flexibility while losing performance.
Use the Postgres-specific JSONField to store data
Django 1.9 introduced native support for the JSONField field type, allowing you to write JSON structures to a db field as well as query them relatively quickly. (Since Django 3.1, a cross-database models.JSONField is also available.) You get amazing flexibility with decent performance but may struggle to provide user-friendly forms to create, update, and verify the data. However, it has been done in many projects and there are libraries out there to assist with it.
from django.contrib.postgres.fields import JSONField
from django.db import models

class SomeModel(models.Model):
    attributes = JSONField()

>>> some_attributes = {'color': 'red', 'cell_count': 150, 'enzymes': ['xyzyss', 'xyxzxxyx']}
>>> a = SomeModel.objects.create(attributes=some_attributes)
>>> SomeModel.objects.filter(attributes__color='red')
(<<< will return a queryset with instance 'a' in it >>>)
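Since verifying the data is the weak point of the JSON approach, here is one possible sketch of per-diagnosis validation in plain Python. The spec format, the `cell_count` field and the `validate_attributes` helper are all assumptions made up for illustration; in practice the spec would be built from the programmatically readable list of diagnoses and field types mentioned in the question, and the check could be called from a form's or model's clean step.

```python
# Maps the spec's type names to the Python types accepted for them.
FIELD_TYPES = {"char": str, "float": (int, float), "choice": str}

# Hypothetical spec: diagnosis -> {field name: (type name, allowed choices or None)}
DIAGNOSIS_FIELDS = {
    "Peripheral T cell lymphoma NOS": {
        "EBV-positive": ("choice", ("Yes", "No")),
        "cell_count": ("float", None),
    },
}


def validate_attributes(diagnosis, attributes):
    """Return a list of error strings; an empty list means the dict fits the spec."""
    spec = DIAGNOSIS_FIELDS[diagnosis]
    errors = [f"unexpected field: {name}" for name in attributes if name not in spec]
    for name, (kind, choices) in spec.items():
        if name not in attributes:
            errors.append(f"missing field: {name}")
            continue
        value = attributes[name]
        if not isinstance(value, FIELD_TYPES[kind]):
            errors.append(f"{name}: expected {kind}")
        elif choices is not None and value not in choices:
            errors.append(f"{name}: must be one of {choices}")
    return errors


good = validate_attributes(
    "Peripheral T cell lymphoma NOS", {"EBV-positive": "Yes", "cell_count": 150}
)
bad = validate_attributes("Peripheral T cell lymphoma NOS", {"EBV-positive": "Maybe"})
```

The same spec dictionary could also drive which form fields get rendered for a given diagnosis, so the field list lives in one place.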

Creating OneToOneField with base model

Sometimes, over time, a model becomes too large. It is then tempting to split it into several models connected with OneToOneField: the most frequently used fields stay in the primary model, and the others move to the related models.
However, this approach becomes a headache when creating a new instance. Where you can initialize one model with one line:
MyModel.objects.create(foo=1, bar=2)
you need at least two lines to initialize two models:
instance = MyModel.objects.create(foo=1, bar=2)
MyRelatedModel.objects.create(mymodel=instance, hello=3, world=4)
Is there a way to create two models in one line, or should I write my own helper function for this?
I think you should not split your models with OneToOneField, for the following reasons:
- As you said, there will be some extra code to manage them.
- Every time you query them, you will have to make two queries instead of one.
Please don't forget that Django models have two functions: they hold data-related methods, and they hold the data model of your application. Some business models have tables with hundreds of fields. This is completely normal. If you really want to split them, you might want to check out abstract base classes. Those are base classes for your models that do not have separate tables of their own: https://docs.djangoproject.com/en/dev/topics/db/models/#abstract-base-classes
But if you insist on going with a OneToOneField, you can wrap the object creation code in one of the model's methods, like
MyModel.create(attr_for_model_A=1, attr_for_model_B=2)
Or you can override the default manager's create method to create two objects instead of one:
https://docs.djangoproject.com/en/dev/topics/db/managers/#modifying-initial-manager-querysets
In my opinion, none of these is worth it just to have smaller model code.
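For what such a wrapper might look like, here is a minimal sketch with dataclasses standing in for the two Django models, so it runs without a database. `create_with_related` is a hypothetical helper name. In real Django code the two constructor calls would be `MyModel.objects.create(...)` and `MyRelatedModel.objects.create(...)`, probably wrapped in `transaction.atomic()` so that both rows are saved or neither is.

```python
from dataclasses import dataclass, fields


@dataclass
class MyModel:  # stand-in for the primary Django model
    foo: int
    bar: int


@dataclass
class MyRelatedModel:  # stand-in for the one-to-one extension model
    mymodel: MyModel
    hello: int
    world: int


def create_with_related(**kwargs):
    """Build both objects from one flat set of keyword arguments."""
    primary_names = {f.name for f in fields(MyModel)}
    # Route each keyword to the model that declares a field of that name.
    primary_kwargs = {k: v for k, v in kwargs.items() if k in primary_names}
    related_kwargs = {k: v for k, v in kwargs.items() if k not in primary_names}
    instance = MyModel(**primary_kwargs)
    related = MyRelatedModel(mymodel=instance, **related_kwargs)
    return instance, related


instance, related = create_with_related(foo=1, bar=2, hello=3, world=4)
```

Splitting the arguments by field name keeps the call site down to one line, at the price of requiring field names to be unique across the two models.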

Multi-Table Inheritance in Django. I'm not sure I understand

I'm not sure I understand the advantage/purpose of multi-table inheritance… but it may be what I'm looking for. I'm dealing with Restaurants. My current logic is that I have a Company model which is likely (but not always) a Restaurant. Sometimes a Company can be a "parent" company, in which case the Company model has a one-to-many with a Branch model. Both the Company and Branch models would have common fields, such as street address and contact info. If the Company has only one "branch" I can assume it's the Restaurant itself, and so I don't need to attach a Branch object to the Company. Does this make sense? I know I'm repeating myself with the street address [...] but it seems like an elegant way to store the data if I were to read the db directly.
I'm not sure if multi-table inheritance is what I need. I just can't wrap my head around it by only looking at https://docs.djangoproject.com/en/dev/topics/db/models/#multi-table-inheritance.
Edit: I'm also open to suggestions for a better db layout if I'm doing it wrong.
Model inheritance is useful in general because you can do queries like Company.objects.all() to return all companies (including restaurants) and also Restaurant.objects.all() to return only restaurant companies. Just like 'regular' inheritance, it can be helpful to put fields common to all the child models (Restaurant) in a parent (Company) model. For example, all Companies might have an address field, but only Restaurants might have a food_type field.
I've documented links to a few snippets that implement a "subclassing queryset", which basically lets you do a query like Company.objects.all() and have it return results like [<Company>, <Restaurant>, <Company>, <Company>, <Restaurant>]. Check out the link:
http://jazstudios.blogspot.com/2009/10/django-model-inheritance-with.html
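The idea behind a "subclassing queryset" can be sketched in plain Python. This is not the code from the linked snippets: the dataclasses below stand in for the two models, and `downcast` is a hypothetical helper that takes rows from the parent table and the child table and returns the most specific object for each.

```python
from dataclasses import dataclass


@dataclass
class Company:
    id: int
    address: str


@dataclass
class Restaurant(Company):
    food_type: str


def downcast(company_rows, restaurant_rows):
    """Swap each base row for its most specific type where a child row exists.

    company_rows: iterable of (id, address)
    restaurant_rows: iterable of (company_id, food_type)
    """
    food_by_id = dict(restaurant_rows)
    return [
        Restaurant(id=cid, address=addr, food_type=food_by_id[cid])
        if cid in food_by_id
        else Company(id=cid, address=addr)
        for cid, addr in company_rows
    ]


results = downcast([(1, "1 Main St"), (2, "2 High St")], [(2, "thai")])
```

Looking up the child rows is the extra work such a queryset has to do on top of the plain Company query.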
The downside of this multi-table approach is that it introduces an extra join in your query between the Company parent table and the child Restaurant table.
An alternative would be to create an abstract model. This creates separate tables for Company and for Restaurant, with redundant fields. With multi-table inheritance, if we wanted to look up the address field on a Restaurant instance, we would be referencing (behind the scenes) the related Company model. With abstract inheritance, there would actually be an address field on the Restaurant table. Also, with abstract inheritance, I don't think you can do Company.objects.all() and expect it to return instances that were added as Restaurants, nor can you use the subclassing querysets from the snippets linked above.
Hope this helps,
Joe