Override vs extend Django's models.Manager to handle deleted objects (best practice)

Override vs extend Django's models.Manager to handle deleted objects (best practice) - django

There is a requirement, that nothing should be deleted from database (no rows should be deleted)
So, obviously, all models should be inherited from something like this:
class BaseModel(models.Model):
is_deleted = models.BooleanField(default=False)
But, it is not obvious how to make models.Manager to handle is_deleted the best way
I can imagine two options:
1) Override BaseModel's Manager's ._get_query_set() method
So, both will return only active objects (marked as is_deleted=False):
Article.objects.all()
Article.objects.filter(id__in=[1, 2])
Even .get(...) will raise 404 if is_deleted=True:
Article.objects.get(id=1)
Also, extend with additional method, to be able to actually access is_deleted=True:
Article.objects.deleted(id=1)
2) Second option is to extend BaseModel with additional second Manager, let's say - actual
So, all three will exclude objects with is_deleted=True:
Article.actual.all()
Article.actual.filter(id__in=[1, 2])
Article.actual.get(id=1) # 404 even if in db, but is_deleted=True
At the same time, regular objects works and stands with native behaviour (ignore is_deleted or not):
Article.objects.all()
Article.objects.filter(id__in=[1, 2])
Article.objects.get(id=1)
Maybe there are another good options? Is there a best practice?
Big thx for advices!

1 or 2 options?
From the django docs:
If you use custom Manager objects, take note that the first Manager Django encounters (in the order in which they’re defined in the model) has a special status. Django interprets the first Manager defined in a class as the “default” Manager, and several parts of Django (including dumpdata) will use that Manager exclusively for that model. As a result, it’s a good idea to be careful in your choice of default manager in order to avoid a situation where overriding get_queryset() results in an inability to retrieve objects you’d like to work with.
Also any third party apps you use, will also likely use the default manager. Ask yourself if it is important for any of these apps to access any of your 'deleted' rows.
For the above reason I think I would probably opt for the two managers option.
Another consideration
When you say "nothing should be deleted from database" do you mean that no rows should be deleted, or no data should ever be removed. If the later, remember that when you update a row, that old data is lost forever, and in that sense the data is 'deleted'.
To avoid this you can have a system where you only ever add rows to your database. You would need a non-unique id field to identify which rows you use, and when you get a particular id, you chose the most recently updated row with that id. Just a thought.

Related

Is there a Django ManyToManyField with implied ownership?

Let's imagine I'm building a Django site "CartoonWiki" which allows users to write wiki articles (represented by the WikiArticle-model) as well as posting in a forum (represented by the ForumPost-model). Over time more features will be added to the site.
A WikiArticle has a number of FileUploads which should be deleted when the WikiArticle is deleted. By "deleted" I mean Django's .delete()-method.
However, the FileUpload-model is generic -- it's not specific to WikiArticle -- and contains generic file upload logic that e.g. removes the file from S3 when it's removed from the database. Other models like ForumPost will use the FileUpload-model as well.
I don't want to use GenericForeignKey nor multi-table inheritance for the reasons Luke Plant states in the blog post Avoid Django's GenericForeignKey. (However, if you can convince me that there really is no better way than the trade-offs GenericForeignKey make, I might be swayed and accept a convincing answer of that sort.)
Now, the most trivial way to do this is to have:
class FileUpload(models.Model):
article = models.ForeignKey('WikiArticle', null=True, blank=True, on_delete=models.CASCADE)
post = models.ForeignKey('ForumPost', null=True, blank=True, on_delete=models.CASCADE)
But that will have the FileUpload-model expand indefinitely with more fields -- and similar its underlying table will gain more and more columns as new models in the system start using FileUpload. This feels suboptimal both in terms of data-modeling, but also in terms of separation-of-concerns -- the FileUpload-model and table is being changed while no actual new functionality is being added to it.
My preference would really be to go the other way around:
class WikiArticle(models.Model):
uploads = models.ManyToManyField('FileUpload')
But this doesn't solve the deletion issue: If I .delete() a WikiArticle the corresponding FileUploads won't be deleted. I've tried various setups with through-models, but none seem to solve it. What I really need is a OneToMany-field -- a sort of reverse ForeignKey to indicate the ownership in the right direction without polluting the generic/reusable model.
Should FileUpload really instead be a field? Or perhaps an abstract model? (WikiArticleFileUpload, ForumPostFileUpload, and so on...).
I realize that a true ManyToManyField with implied ownership would no longer really be a ManyToManyField since the field implies sharing. E.g. a FileUpload could technically be referenced by multiple WikiArticles, so you could be removing FileUploads from other objects rather on top of the one you're deleting. The question still stands though -- it seems I need a OneToManyField to model this in a nice way.

You probably have a couple of options to solve your problem, but it also requires on the exact requirements of your application.
Using a GenericForeignKey in this situation is probably fine, escpecially due to the fact that you do not know how many other models will use your upload model. Of course as mentioned in the linked blog post eg. doing plain SQL queries might be harder but it's on you to decide if that's a problem for your use case.
Also using inheritance might be an option, so that all the referenced models inherit the relation to the upload model from a common ancestor. This might have a small impact performance-wise because you Django would need to join the tables of the models but the impact might still be not that big. On the other hand this approach might also have some advantages if eg. your articles and posts have other stuff in common as well and you could easily do stuff like "show all new posts and articles (together)".
If you handle deletion yourself as mentioned in the previous answer you can also add ManyToMany fields yourself but also consider that this method also has some disadvantages in common with using generic foreign keys (eg. a lot of stuff to join in the database...)
Probably it's fine that you just use a GenericForeignKey, especially if the number of models that use your "generic" model gets bigger (eg. more than 3-5). All in all this sounds pretty much like a use case GenericForeignKey was made for (imagine the uploads being something like "tags" belonging to the posts).

ManyToMany fields are symmetrical, even though you define them on one model with an (explicit or implicit) related_name on the other.
I can think of two methods to clean up while, or after, WikiArticles are deleted. The first is to periodically search for and delete "orphan" FileUploads. At its simplest, (assuming a related_name of articles)
deleted = FileUpload.objects.filter( articles__isnull=True).delete()
The other is to explicitly process the related articles during deleting of the article. It's straightforward to subclass the object's delete method, but this is not the only way to delete an object (bulk_delete, for example, bypasses this). Anyway,
def delete( self, *args, **kwargs):
article_pks = self.uploads.all().values_list('pk', Flat=True)
response = super().delete( *args, **kwargs)
FileUpload.objects.filter(
pk__in = article_pks, articles__isnull=True) .delete()
return response
(or even just execute the "periodically" code above, for every article-deletion, which will also tity after any deleted though othr channels)
Please thoroughly test this if you use it. Delete operations which don't do precisely what is wanted are the scariest sorts of bug!

Django: storing model property on a field vs. on a different model

I am relatively new to Django and even database design and I have some thoughts I'd like to run by some other people. This isn't really a specific question; I just want to see how other people think about this stuff.
Let's say we have a model for an application to some service. It contains all the ordinary stuff you might imagine an application to contain:
class Application(models.Model):
first_name = CharField(max_length=255)
last_name = CharField(max_length=255)
date_of_birth = DateField()
married = BooleanField()
# ...other stuff
Okay, that's all well and good. But now, imagine the webapp you are writing has the feature that you can complete your application partially, save it, and come back to it later. One way to do this is to add another attribute to the model above:
complete = BooleanField()
It works, it is pretty simple to use, but I don't really like it because it muddies the semantics of an application; it adds information that isn't intrinsically connected to the application. Another approach would be to create another model that keeps track of complete applications:
class CompleteApplication(models.Model):
application = ForeignKey(Application)
I like this a bit better, since it keeps Application clean. However, it does have the disadvantage of messing up queries. Here are the two ways to query all complete applications in the system:
Method 1:
completed_applications = Application.objects.filter(complete=True)
Method 2:
pks = CompleteApplication.objects.all().values_list("application__pk")
complete_applications = Application.object.filter(pk__in=pks)
Method 2 is two lines of code vs. one and also two queries whereas previously one sufficed, so the database performance is going to take a hit.
There is a third way to do things: instead of creating a model that keeps track of complete applications, we could create a metadata model that stores any metadata that we might want to attach to the Application model. For our purposes, this model can contain a field that tracks completeness. However, this approach also has the benefit of allowing for an arbitrary number of metadata fields to be associated with each application without requiring a new DB table for each (as is the case with Method 2 above).
class ApplicationMeta(models.Model):
application = ForeignKey(Application)
complete = BooleanField()
And, for completeness (pun intended), to query all complete applications, we would use the following statement:
completed_applications = Application.objects.all(applicationmeta__complete=True)
Nice and simple, just like Method 1, but the query is certainly more work for the database. This method also has another drawback for certain applications. Pretend, for example, that we want to track some additional information about applications: they can be confirmed, or rejected. However, if an application is not confirmed, it does NOT necessarily mean it is rejected: it could be pending review. Additionally, let's say we want to track the date of confirmation and the date of rejection (if either is applicable, of course). Then, our metadata model becomes the following:
class ApplicationMeta(models.Model):
complete = BooleanField()
confirmed = BooleanField()
rejected = BooleanField()
date_confirmed = DateField()
date_rejected = DateField()
Okay...this works, but it is starting to be a mess. Firstly, we have now opened up our system to potential error: what if somehow an ApplicationMeta instance has both rejected and confirmed set to True? We could do some fancy footwork with our class (maybe override setattr) to throw an exception if something funny happens, so we can prevent from persisting to the DB, but this is added complication that I hope is not necessary. Further, any model will either have at most one of date_confirmed or date_rejected set. Is that a problem? Here, I am not actually certain. My guess is this is likely a waste of space, but I don't actually know that. This example is simple, what if more complicated examples present us with tons of fields that will necessarily not be filled? Seems like bad design.
I'd love to hear some thoughts on these ideas.
Thanks!

If you have a huge amount of possible metadata, the third approach might make sense for performance reasons. I wouldn't do it for a few boolean- and date columns. If you're concerned about the readability of the models themselves, you can factor out any metadata into an abstract base model. You can even reuse the abstract model for other models that require the same metadata. The information will still live in your Application model.
If you do take the second or third approach, I would use a OneToOneField rather than a ForeignKey. It ensures that there are no 2 possible ApplicationMeta models for a single Application, and has the added benefit of a UNIQUE database index.
As for the status of an application, the NullBooleanField is designed for exactly that. It start as None (NULL in the db) meaning "no value". It can then be set to True (accepted) or False (rejected).

What do I use instead of ForiegnKey relationship for multi-db relation

I need to provide my users a list of choices from a model which is stored in a separate legacy database. Foreign keys aren't supported in django multi-db setups. The ideal choice would be to use a foreign key, but since thats not possible I need to some up with something else.
So instead I created an IntegerField on the other database and tried using choices to get a list of available options.
class OtherDBTable(models.Model):
client = models.IntegerField(max_length=20, choices=Client.objects.values_list('id','name'))
the problem I'm having is that the choices seem to get populated once but never refreshed. How do I ensure that whenever the client list changes that those newest options area available to pick.

What I was really looking for was a way that I could simulate the behavior of a Foreign key field, at least as far as matching up ID's go.
There wasn't a clear way to do this, since it doesn't seem like you can actually specify an additional field when you instantiate a model (you can with forms, easily)
In any case to deal with the problem, since the database is MySQL based, I ended up creating views from the tables I needed in the other database.

To build on #Yuji's answer - if you do self.fields['field'].choices = whatever in the model's __init__, whatever can be any iterable.
This means you can inherit from iterable, and have that object interface to your legacy database, or you can use a generator function (in case you are unfamiliar, look up the yield keyword).

Citing a Django's manual:
Finally, note that choices can be any iterable object -- not necessarily a list or tuple. This lets you construct choices dynamically. But if you find yourself hacking choices to be dynamic, you're probably better off using a proper database table with a ForeignKey. choices is meant for static data that doesn't change much, if ever.
Why dont you want just export data from the legacy database and to import it into the new one? This could be done periodically, if the legacy database still in use.

Smarter removing objects with many-to-many relationship in Django admin interface

I'd like to remove some object with many-to-many relationship using Django admin interface. Standard removing also removes all related objects and the list of removed objects displayed on confirmation page. But I don't need to remove related objects!
Assume we have ContentTopic and ContentItem:
class ContentTopic(models.Model):
name = models.CharField()
code = models.CharField()
class ContentItem(models.Model):
topic = models.ManyToManyField(ContentTopic, db_index=True,\
blank=True, related_name='content_item')
So, I'd like to remove ContentTopic instance using Django admin, but I don't need remove all related ContentItems. So, confirmation page should display only ContentTopic instance to remove.
What is the best way to handle this?

This happens so, coz its developed to do so.
If you want to change this behaviour, the one way can be over-riding delete method of django.db.models.Model.
This delete() method actually does two things, first gathering a list of all dependent objects and delete them. So here, you can override it, to get that list of dependent objects, iterating over it and set their reference to None, instead of deleting them. And thus deleting the concerned object cleanly.
May be if you want this behavior throughout, you can extend a class from django.db.models.Models, override delete(), and extend all your models from this new class.

How can I easily mark records as deleted in Django models instead of actually deleting them?

Instead of deleting records in my Django application, I want to just mark them as "deleted" and have them hidden from my active queries. My main reason to do this is to give the user an undelete option in case they accidentally delete a record (these records may also be needed for certain backend audit tracking.)
There are a lot of foreign key relationships, so when I mark a record as deleted I'd have to "Cascade" this delete flag to those records as well. What tools, existing projects, or methods should I use to do this?

Warning: this is an old answer and it seems that the documentation is recommending not to do that now: https://docs.djangoproject.com/en/dev/topics/db/managers/#don-t-filter-away-any-results-in-this-type-of-manager-subclass
Django offers out of the box the exact mechanism you are looking for.
You can change the manager that is used for access through related objects. If you new custom manager filters the object on a boolean field, the object flagged inactive won't show up in your requests.
See here for more details :
http://docs.djangoproject.com/en/dev/topics/db/managers/#using-managers-for-related-object-access

Nice question, I've been wondering how to efficiently do this myself.
I am not sure if this will do the trick, but django-reversion seems to do what you want, although you probably want to examine to see how it achieves this goal, as there are some inefficient ways to do it.
Another thought would be to have the dreaded boolean flag on your Models and then creating a custom manager that automatically adds the filter in, although this wouldn't work for searches across different Models. Yet another solution suggested here is to have duplicate models of everything, which seems like overkill, but may work for you. The comments there also discuss different options.
I will add that for the most part I don't consider any of these solutions worth the hassle; I usually just suck it up and filter my searches on the boolean flag. It avoids many issues that can come up if you try to get too clever. It is a pain and not very DRY, of course. A reasonable solution would be a mixture of the Custom manager while being aware of its limitations if you try searching a related model through it.

I think using a boolean 'is_active' flag is fine - you don't need to cascade the flag to related entries at the db level, you just need to keep referring to the status of the parent. This is what happens with contrib.auth's User model, remember - marking a user as not is_active doesn't prompt django to go through related models and magically try to deactivate records, rather you just keep checking the is_active attribute of the user corresponding to the related item.
For instance if each user has many bookmarks, and you don't want an inactive user's bookmarks to be visible, just ensure that bookmark.user.is_active is true. There's unlikely to be a need for an is_active flag on the bookmark itself.

Here's a quick blog tutorial from Greg Allard from a couple of years ago, but I implemented it using Django 1.3 and it was great. I added methods to my objects named soft_delete, undelete, and hard_delete, which set self.deleted=True, self.deleted=False, and returned self.delete(), respectively.
A Django Model Manager for Soft Deleting Records and How to Customize the Django Admin

There are several packages which provide this functionality: https://www.djangopackages.com/grids/g/deletion/
I'm developing one https://github.com/meteozond/django-permanent/
It replaces default Manager and QuerySet delete methods to bring in logical deletion.
It completely shadows default Django delete methods with one exception - marks models which are inherited from PermanentModel instead of deletion, even if their deletion caused by relation.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js