Django: storing model property on a field vs. on a different model - django

I am relatively new to Django and even database design and I have some thoughts I'd like to run by some other people. This isn't really a specific question; I just want to see how other people think about this stuff.
Let's say we have a model for an application to some service. It contains all the ordinary stuff you might imagine an application to contain:
class Application(models.Model):
first_name = CharField(max_length=255)
last_name = CharField(max_length=255)
date_of_birth = DateField()
married = BooleanField()
# ...other stuff
Okay, that's all well and good. But now, imagine the webapp you are writing has the feature that you can complete your application partially, save it, and come back to it later. One way to do this is to add another attribute to the model above:
complete = BooleanField()
It works, it is pretty simple to use, but I don't really like it because it muddies the semantics of an application; it adds information that isn't intrinsically connected to the application. Another approach would be to create another model that keeps track of complete applications:
class CompleteApplication(models.Model):
application = ForeignKey(Application)
I like this a bit better, since it keeps Application clean. However, it does have the disadvantage of messing up queries. Here are the two ways to query all complete applications in the system:
Method 1:
completed_applications = Application.objects.filter(complete=True)
Method 2:
pks = CompleteApplication.objects.all().values_list("application__pk")
complete_applications = Application.object.filter(pk__in=pks)
Method 2 is two lines of code vs. one and also two queries whereas previously one sufficed, so the database performance is going to take a hit.
There is a third way to do things: instead of creating a model that keeps track of complete applications, we could create a metadata model that stores any metadata that we might want to attach to the Application model. For our purposes, this model can contain a field that tracks completeness. However, this approach also has the benefit of allowing for an arbitrary number of metadata fields to be associated with each application without requiring a new DB table for each (as is the case with Method 2 above).
class ApplicationMeta(models.Model):
application = ForeignKey(Application)
complete = BooleanField()
And, for completeness (pun intended), to query all complete applications, we would use the following statement:
completed_applications = Application.objects.all(applicationmeta__complete=True)
Nice and simple, just like Method 1, but the query is certainly more work for the database. This method also has another drawback for certain applications. Pretend, for example, that we want to track some additional information about applications: they can be confirmed, or rejected. However, if an application is not confirmed, it does NOT necessarily mean it is rejected: it could be pending review. Additionally, let's say we want to track the date of confirmation and the date of rejection (if either is applicable, of course). Then, our metadata model becomes the following:
class ApplicationMeta(models.Model):
complete = BooleanField()
confirmed = BooleanField()
rejected = BooleanField()
date_confirmed = DateField()
date_rejected = DateField()
Okay...this works, but it is starting to be a mess. Firstly, we have now opened up our system to potential error: what if somehow an ApplicationMeta instance has both rejected and confirmed set to True? We could do some fancy footwork with our class (maybe override setattr) to throw an exception if something funny happens, so we can prevent from persisting to the DB, but this is added complication that I hope is not necessary. Further, any model will either have at most one of date_confirmed or date_rejected set. Is that a problem? Here, I am not actually certain. My guess is this is likely a waste of space, but I don't actually know that. This example is simple, what if more complicated examples present us with tons of fields that will necessarily not be filled? Seems like bad design.
I'd love to hear some thoughts on these ideas.
Thanks!

If you have a huge amount of possible metadata, the third approach might make sense for performance reasons. I wouldn't do it for a few boolean- and date columns. If you're concerned about the readability of the models themselves, you can factor out any metadata into an abstract base model. You can even reuse the abstract model for other models that require the same metadata. The information will still live in your Application model.
If you do take the second or third approach, I would use a OneToOneField rather than a ForeignKey. It ensures that there are no 2 possible ApplicationMeta models for a single Application, and has the added benefit of a UNIQUE database index.
As for the status of an application, the NullBooleanField is designed for exactly that. It start as None (NULL in the db) meaning "no value". It can then be set to True (accepted) or False (rejected).

Related

Is there a Django ManyToManyField with implied ownership?

Let's imagine I'm building a Django site "CartoonWiki" which allows users to write wiki articles (represented by the WikiArticle-model) as well as posting in a forum (represented by the ForumPost-model). Over time more features will be added to the site.
A WikiArticle has a number of FileUploads which should be deleted when the WikiArticle is deleted. By "deleted" I mean Django's .delete()-method.
However, the FileUpload-model is generic -- it's not specific to WikiArticle -- and contains generic file upload logic that e.g. removes the file from S3 when it's removed from the database. Other models like ForumPost will use the FileUpload-model as well.
I don't want to use GenericForeignKey nor multi-table inheritance for the reasons Luke Plant states in the blog post Avoid Django's GenericForeignKey. (However, if you can convince me that there really is no better way than the trade-offs GenericForeignKey make, I might be swayed and accept a convincing answer of that sort.)
Now, the most trivial way to do this is to have:
class FileUpload(models.Model):
article = models.ForeignKey('WikiArticle', null=True, blank=True, on_delete=models.CASCADE)
post = models.ForeignKey('ForumPost', null=True, blank=True, on_delete=models.CASCADE)
But that will have the FileUpload-model expand indefinitely with more fields -- and similar its underlying table will gain more and more columns as new models in the system start using FileUpload. This feels suboptimal both in terms of data-modeling, but also in terms of separation-of-concerns -- the FileUpload-model and table is being changed while no actual new functionality is being added to it.
My preference would really be to go the other way around:
class WikiArticle(models.Model):
uploads = models.ManyToManyField('FileUpload')
But this doesn't solve the deletion issue: If I .delete() a WikiArticle the corresponding FileUploads won't be deleted. I've tried various setups with through-models, but none seem to solve it. What I really need is a OneToMany-field -- a sort of reverse ForeignKey to indicate the ownership in the right direction without polluting the generic/reusable model.
Should FileUpload really instead be a field? Or perhaps an abstract model? (WikiArticleFileUpload, ForumPostFileUpload, and so on...).
I realize that a true ManyToManyField with implied ownership would no longer really be a ManyToManyField since the field implies sharing. E.g. a FileUpload could technically be referenced by multiple WikiArticles, so you could be removing FileUploads from other objects rather on top of the one you're deleting. The question still stands though -- it seems I need a OneToManyField to model this in a nice way.
You probably have a couple of options to solve your problem, but it also requires on the exact requirements of your application.
Using a GenericForeignKey in this situation is probably fine, escpecially due to the fact that you do not know how many other models will use your upload model. Of course as mentioned in the linked blog post eg. doing plain SQL queries might be harder but it's on you to decide if that's a problem for your use case.
Also using inheritance might be an option, so that all the referenced models inherit the relation to the upload model from a common ancestor. This might have a small impact performance-wise because you Django would need to join the tables of the models but the impact might still be not that big. On the other hand this approach might also have some advantages if eg. your articles and posts have other stuff in common as well and you could easily do stuff like "show all new posts and articles (together)".
If you handle deletion yourself as mentioned in the previous answer you can also add ManyToMany fields yourself but also consider that this method also has some disadvantages in common with using generic foreign keys (eg. a lot of stuff to join in the database...)
Probably it's fine that you just use a GenericForeignKey, especially if the number of models that use your "generic" model gets bigger (eg. more than 3-5). All in all this sounds pretty much like a use case GenericForeignKey was made for (imagine the uploads being something like "tags" belonging to the posts).
ManyToMany fields are symmetrical, even though you define them on one model with an (explicit or implicit) related_name on the other.
I can think of two methods to clean up while, or after, WikiArticles are deleted. The first is to periodically search for and delete "orphan" FileUploads. At its simplest, (assuming a related_name of articles)
deleted = FileUpload.objects.filter( articles__isnull=True).delete()
The other is to explicitly process the related articles during deleting of the article. It's straightforward to subclass the object's delete method, but this is not the only way to delete an object (bulk_delete, for example, bypasses this). Anyway,
def delete( self, *args, **kwargs):
article_pks = self.uploads.all().values_list('pk', Flat=True)
response = super().delete( *args, **kwargs)
FileUpload.objects.filter(
pk__in = article_pks, articles__isnull=True) .delete()
return response
(or even just execute the "periodically" code above, for every article-deletion, which will also tity after any deleted though othr channels)
Please thoroughly test this if you use it. Delete operations which don't do precisely what is wanted are the scariest sorts of bug!

Override vs extend Django's models.Manager to handle deleted objects (best practice)

There is a requirement, that nothing should be deleted from database (no rows should be deleted)
So, obviously, all models should be inherited from something like this:
class BaseModel(models.Model):
is_deleted = models.BooleanField(default=False)
But, it is not obvious how to make models.Manager to handle is_deleted the best way
I can imagine two options:
1) Override BaseModel's Manager's ._get_query_set() method
So, both will return only active objects (marked as is_deleted=False):
Article.objects.all()
Article.objects.filter(id__in=[1, 2])
Even .get(...) will raise 404 if is_deleted=True:
Article.objects.get(id=1)
Also, extend with additional method, to be able to actually access is_deleted=True:
Article.objects.deleted(id=1)
2) Second option is to extend BaseModel with additional second Manager, let's say - actual
So, all three will exclude objects with is_deleted=True:
Article.actual.all()
Article.actual.filter(id__in=[1, 2])
Article.actual.get(id=1) # 404 even if in db, but is_deleted=True
At the same time, regular objects works and stands with native behaviour (ignore is_deleted or not):
Article.objects.all()
Article.objects.filter(id__in=[1, 2])
Article.objects.get(id=1)
Maybe there are another good options? Is there a best practice?
Big thx for advices!
1 or 2 options?
From the django docs:
If you use custom Manager objects, take note that the first Manager Django encounters (in the order in which they’re defined in the model) has a special status. Django interprets the first Manager defined in a class as the “default” Manager, and several parts of Django (including dumpdata) will use that Manager exclusively for that model. As a result, it’s a good idea to be careful in your choice of default manager in order to avoid a situation where overriding get_queryset() results in an inability to retrieve objects you’d like to work with.
Also any third party apps you use, will also likely use the default manager. Ask yourself if it is important for any of these apps to access any of your 'deleted' rows.
For the above reason I think I would probably opt for the two managers option.
Another consideration
When you say "nothing should be deleted from database" do you mean that no rows should be deleted, or no data should ever be removed. If the later, remember that when you update a row, that old data is lost forever, and in that sense the data is 'deleted'.
To avoid this you can have a system where you only ever add rows to your database. You would need a non-unique id field to identify which rows you use, and when you get a particular id, you chose the most recently updated row with that id. Just a thought.

Django: Two models with OneToOneField vs a single model

Let's imagine a I have a simple model Recipe:
class Recipe(models.Model):
name = models.CharField(max_length=constants.NAME_MAX_LENGTH)
preparation_time = models.DurationField()
thumbnail = models.ImageField(default=constants.RECIPE_DEFAULT_THUMBNAIL, upload_to=constants.RECIPE_CUSTOM_THUMBNAIL_LOCATION)
ingredients = models.TextField()
description = models.TextField()
I would like to create a view listing all the available recipes where only name, thumbnail, preparation_time and first 100 characters of description will be used. In addition I will have a dedicated view to render all remaining details for a single recipe.
From the efficiency point of view, since description may be a long text, would it make sense to store the extra information in a separate model, let's say 'RecipeDetails' which would not be extracted in a list view but only in a detailed view (maybe using prefetch_related method)? I am thinking about something along:
class Recipe(models.Model):
name = models.CharField(max_length=constants.NAME_MAX_LENGTH)
preparation_time = models.DurationField()
thumbnail = models.ImageField(default=constants.DEFAULT_THUMBNAIL, upload_to=constants.CUSTOM_THUMBNAIL_LOCATION)
description_preview = models.CharField(max_length=100)
class RecipeDetails(models.Model):
recipe = models.OneToOneField(Recipe, related_name="details", primary_key=True)
ingredients = models.TextField()
description = models.TextField()
In my recent online searches people seem to suggest that OneToOneField should be used only for two purposes: 1. inheritance and 2. extending existing models. In other cases two models should be merged into one. This may suggest I am missing something here. Is this a reasonable use of OneToOneField or does it only add to a complexity of an overall design?
inheritance
Don't do that, because inheritance would only be useful if you have baseclass/subclass relationship. The classic example is animal and cat/dog, in which the cats/dogs all have some basic properties that could be extracted, but your Recipe and RecipeDetail don't.
From the efficiency point of view, since description may be a long
text, would it make sense to store the extra information in a separate
model
Storing extra information in a separate model doesn't improve any efficiency. The underline database would create something like a ForeignKey field and plus unique=True to make sure the uniqueness. As far as I concerned, OneToOneField is only useful when your original model is hard to change, e.g., it is from third-party packages or some other awkward situations. Otherwise I still consider adding them to the Recipe model. In this case, you can manage your model easily while avoiding having some extra lookups like recipe.recipedetail.description, you can just do recipe.description.
No, it's not reasonable to split your Recipes. First, your model should contain all properties for being a "Recipe" (and a recipe without ingredients is not a recipe at all). Second, if you want to improve performance, then use the Django's Cache Framework (it was created exactly for improving performance issues). Third, keep it simple and do not over-engineering your development cycle. Do you really need to improve performance right now?
Hope it helps!
First mistake in development, you are thinking in efficiency before your first version is running.
Try to have now a first version, that runs, and later you can think in be more faster based in use cases with your first version. After this you can check if a model and relations, or only a new field in model or using Django Cache for views can do the work.
Your think in efficiency first will be "de-normalize" your Database btw, when one update in the model with full description is done, you need to launch one update to the model with "description-preview" field. trigger in database level? python code for update in app level? nightmares in code design ... before your code runs.

Django: using ContentType vs multi_table_inheritance

I was having a similar problem as in
How to query abstract-class-based objects in Django?
The thread suggests using multi_table_inheritance. I personally think using content_type more conceptually comfortable (just feels more close to logic, at least to me)
Using the example in the previous link, I would just add a StelarType as
class StellarType(models.Model):
"""
Use ContentType so we have a single access to all types
"""
content_type = models.ForeignKey(ContentType)
object_id = models.PositiveIntegerField()
content_object = generic.GenericForeignKey('content_type', 'object_id')
Then add this to the abstract base model
class StellarObject(BaseModel):
title = models.CharField(max_length=255)
description = models.TextField()
slug = models.SlugField(blank=True, null=True)
stellartype = generic.GenericForeignKey(StellarType)
class Meta:
abstract = True
To sync between StellarObject and StellarType, we can connect post_save signal to create a StellarType instance every time a Planet or Star is created. In this way, I can query StellarObjects through StellarType.
So I'd like to know what's the PRO and CON of using this approach against using multi_table_inheritance? I think both create an additional table in the databse. But how about database performance? how about usability/flexibility? Thanks for any of your input!
To me, ContentType is the way to go when you want to relate an object to one of many models that aren't fundamentally of the same "type". Like if you want to be able to key Comments to Users, Pages, and Pictures on a social network, but there's no reasonable supertype shared by those three models. Sure you could create a "Commentable" supertype, but to me that feels more like a mixin than a fundamental type from which those three things derive. Before ContentType came out, you would have had no choice but to invent supertypes for these kind of relations, which can get really ugly really quickly if you need to do it multiple times in the same application (lets say you also have Events, Alerts, Messages, etc., each of which can apply to a different set of models).
Multi-table inheritance makes the most sense when you want to attach attributes to the base model, such that they will be shared in all concrete models that extend from it, so that you can get polymorphic behavior. Commentable doesn't really fit this mold, because all of that behavior can be put on the Comment model, less so on the Commentable objects. But if you have different classes of Users that share much of the same behavior and should be aggregable, then it makes a lot more sense.
The major pro of multi-table inheritance to me is a cleaner data model, with implicit relationships and inheritance that can be taken advantage of on the Python side (polymorphism is still a bit messy though, as seen here and here). The major pro of ContentType is that it is more general and keeps auxiliary functionality out of your models, at the cost of a bit of a slightly less pristine schema (lots of "meta" fields on your models to define these relationships). And for your example, you still have to rely on post_save, which seems unnecessarily messy/magical to me, as well.
Sorry for reviving old thread. I think it all boils down to the lookup direction. Whether you look up all subclasses for a certain FK (multitable inheritance) or define the referenced class as a content type and look it up based on the table reference and id (contenttypes) makes no big difference in performance - hint: they both suck. I think content types is a nice choice if you want your app to be easily extendible, i.e. others can add new content types to reference against. Multitable is good if you only sometimes need the extra columns defined in extra tables. Sometimes it might also be a good idea to merge all your subtypes and make only one which has a few fields left empty most of the time.

Django ORM: caching and manipulating ForeignKey objects

Consider the following skeleton of a models.py for a space conquest game:
class Fleet(models.Model):
game = models.ForeignKey(Game, related_name='planet_set')
owner = models.ForeignKey(User, related_name='planet_set', null=True, blank=True)
home = models.ForeignKey(Planet, related_name='departing_fleet_set')
dest = models.ForeignKey(Planet, related_name='arriving_fleet_set')
ships = models.IntegerField()
class Planet(models.Model):
game = models.ForeignKey(Game, related_name='planet_set')
owner = models.ForeignKey(User, related_name='planet_set', null=True, blank=True)
name = models.CharField(max_length=250)
ships = models.IntegerField()
I have many such data models for a project I'm working on, and I change the state of the game based on somewhat complicated interactions between various data objects. I want to avoid lots of unnecessary calls to the database, so once per turn, I do something like
Query all the fleets, planets, and other objects from the database and cache them as python objects
Process the game objects, resolving the state of the game
Save them back in the database
This model seems to totally break down when using ForeignKey objects. For example, when a new fleet departs a planet, I have a line that looks something like this:
fleet.home.ships -= fleet.ships
After this line runs, I have other code that alters the number of ships at each of the planets, including the planet fleet.home. Unfortunately, the changes made in the above line are not reflected in the QuerySet of planets that I obtained earlier, so that when I save all the planets at the end of the turn, the changes to fleet.home's ships get overwritten.
Is there some better way of handling this situation? Or is this just how all ORMs are?
Django's ORM does not implement an identity map (it's in the ticket tracker, but it isn't clear if or when it will be implemented; at least one core Django committer has expressed opposition to it). This means that if you arrive at the same database object through two different query paths, you are working with different Python objects in memory.
This means that your design (load everything into memory at once, modify a lot of things, then save it all back at the end) is unworkable using the Django ORM. First because it will often waste lots of memory loading in duplicate copies of the same object, and second because of "overwriting" issues like the one you're running into.
You either need to rework your design to avoid these issues (either be careful to work with only one QuerySet at a time, saving anything modified before you make another query; or if you load several queries, look up all relations manually, don't ever traverse ForeignKeys using the convenient attributes for them), or use an alternative Python ORM that implements identity map. SQLAlchemy is one option.
Note that this doesn't mean Django's ORM is "bad." It's optimized for the case of web applications, where these kinds of issues are rare (I've done web development with Django for years and never once had this problem on a real project). If your use case is different, you may want to choose a different ORM.
This is perhaps what you are looking for:
https://web.archive.org/web/20121126091406/http://simonwillison.net/2009/May/7/mmalones/