What belongs in django model clean method - django

I'm wondering what the appropriate things to put in my model's clean() method are.
Does it make sense to put all the verification of and manipulation to a model's properties to ensure it is valid (ie. business logic)? There is a lot of that in my case and I'm wondering if it makes sense to execute it all every time a model is saved.
For example i'm doing things like :
- if a video is marked as private, remove all its references in playlsts
- ensure that the video's title is unique with relation to the users other videos
- etc.
some of the things i'm doing only really need to be done on creation of a new video - so checking/ setting them every time the model is saved also seems excessive.
Is this the correct use of the clean() method?

Clearing relationships is probably best handled by a signal. To validate your signals are working properly, you can write a unit test.
Validating that the title is unique is something that definitely belongs in a form/model validator. To me, that seems like a better separation of concerns.

Related

Good Django design practice to add a REST api later following DRY

I am starting a web application in pure Django. However, in the future, there might be a requirement for REST api. If it happens, the most obvious choice will be Django REST framework.
Both the "old-fashioned" and REST parts share the models, however, the views are slightly different (permissions definitions, for example) and forms are replaced with serializers. Doing it the most obvious way would mean to duplicate the application logic several times, and thus a failure to follow DRY principle, and so the code becomes unmaintainable.
I got an idea to write all the logic into models (since they are shared), but in such case, there will be no use of permission mixins, generic views and the code would not be among the nicest ones.
Now I ran out of ideas. What is the best practice here?
I'd try to keep things simple as you're not sure about the future requirements for the API, and guessing can introduce extra complexity that may not even be needed when requirements will be clear.
Both Django forms and Rest Framework serializers already offer you a declarative approach that abstracts away the boilerplate code needed for basic stuff, which normally accounts for most of your code anyway.
For example, one of your Django form could look like this:
class ArticleForm(ModelForm):
class Meta:
model = Article
fields = ['title', 'content']
And in the future the DRS serializer would be:
class ArticleSerializer(ModelSerializer):
class Meta:
model = Article
fields = ['title', 'content']
As you can see, if you try and stick to ModelForm and ModelSerializer, there won't be much duplication anyway. You can also simply store the fields list in a variable and just reuse that.
For more custom things, you can start by sharing logic into simple functions, for example:
def save_article_with_author(article_data, author_data):
# custom data manipulation before saving, consider that article_data will be a dictionary either if it comes from deserialized JSON (api) or POST data
# send email, whatever
This function can be shared between your form and serializer.
For everything related to data fetching, I'd try to use Model Managers as much as possible, defining custom querysets that can be resued e.g. for options by forms and serializers.
I tend to avoid writing any logic that doesn't directly read or write data into the model classes. I think that couples too much the business logic with the data layer. As an example, I never want to write any auth/permission checks into a save() method of a model, because that couples different layers too tightly.
As a rule of thumb, imagine this scenario: you add say permissions checks or the logic to send an email when a user is created overriding the save() method of your Article model.
Then, later on you're asked to write a simple manage command that batch-import users from a spreadsheet. At this point, what you did in your save() method really gets in the way, as you can freely access your data through your model without having to bother with permissions, emails and all of that.
Regarding the view layer and assuming you need to implement some shared auth/permission checks and you don't want to have separate views, you can use this approach:
https://www.django-rest-framework.org/topics/html-and-forms/
Blockquote
REST framework is suitable for returning both API style responses, and regular HTML pages. Additionally, serializers can be used as HTML forms and rendered in templates.
Here's some guidelines on how you could dynamically switch from HTML to JSON based to the request content type:
https://www.django-rest-framework.org/api-guide/renderers/#advanced-renderer-usage
This seems like a good option in your situation, I'd just write down a quick proof-of-concept before you go all in to see if you are not too limited for what you need to do.

Django: storing model property on a field vs. on a different model

I am relatively new to Django and even database design and I have some thoughts I'd like to run by some other people. This isn't really a specific question; I just want to see how other people think about this stuff.
Let's say we have a model for an application to some service. It contains all the ordinary stuff you might imagine an application to contain:
class Application(models.Model):
first_name = CharField(max_length=255)
last_name = CharField(max_length=255)
date_of_birth = DateField()
married = BooleanField()
# ...other stuff
Okay, that's all well and good. But now, imagine the webapp you are writing has the feature that you can complete your application partially, save it, and come back to it later. One way to do this is to add another attribute to the model above:
complete = BooleanField()
It works, it is pretty simple to use, but I don't really like it because it muddies the semantics of an application; it adds information that isn't intrinsically connected to the application. Another approach would be to create another model that keeps track of complete applications:
class CompleteApplication(models.Model):
application = ForeignKey(Application)
I like this a bit better, since it keeps Application clean. However, it does have the disadvantage of messing up queries. Here are the two ways to query all complete applications in the system:
Method 1:
completed_applications = Application.objects.filter(complete=True)
Method 2:
pks = CompleteApplication.objects.all().values_list("application__pk")
complete_applications = Application.object.filter(pk__in=pks)
Method 2 is two lines of code vs. one and also two queries whereas previously one sufficed, so the database performance is going to take a hit.
There is a third way to do things: instead of creating a model that keeps track of complete applications, we could create a metadata model that stores any metadata that we might want to attach to the Application model. For our purposes, this model can contain a field that tracks completeness. However, this approach also has the benefit of allowing for an arbitrary number of metadata fields to be associated with each application without requiring a new DB table for each (as is the case with Method 2 above).
class ApplicationMeta(models.Model):
application = ForeignKey(Application)
complete = BooleanField()
And, for completeness (pun intended), to query all complete applications, we would use the following statement:
completed_applications = Application.objects.all(applicationmeta__complete=True)
Nice and simple, just like Method 1, but the query is certainly more work for the database. This method also has another drawback for certain applications. Pretend, for example, that we want to track some additional information about applications: they can be confirmed, or rejected. However, if an application is not confirmed, it does NOT necessarily mean it is rejected: it could be pending review. Additionally, let's say we want to track the date of confirmation and the date of rejection (if either is applicable, of course). Then, our metadata model becomes the following:
class ApplicationMeta(models.Model):
complete = BooleanField()
confirmed = BooleanField()
rejected = BooleanField()
date_confirmed = DateField()
date_rejected = DateField()
Okay...this works, but it is starting to be a mess. Firstly, we have now opened up our system to potential error: what if somehow an ApplicationMeta instance has both rejected and confirmed set to True? We could do some fancy footwork with our class (maybe override setattr) to throw an exception if something funny happens, so we can prevent from persisting to the DB, but this is added complication that I hope is not necessary. Further, any model will either have at most one of date_confirmed or date_rejected set. Is that a problem? Here, I am not actually certain. My guess is this is likely a waste of space, but I don't actually know that. This example is simple, what if more complicated examples present us with tons of fields that will necessarily not be filled? Seems like bad design.
I'd love to hear some thoughts on these ideas.
Thanks!
If you have a huge amount of possible metadata, the third approach might make sense for performance reasons. I wouldn't do it for a few boolean- and date columns. If you're concerned about the readability of the models themselves, you can factor out any metadata into an abstract base model. You can even reuse the abstract model for other models that require the same metadata. The information will still live in your Application model.
If you do take the second or third approach, I would use a OneToOneField rather than a ForeignKey. It ensures that there are no 2 possible ApplicationMeta models for a single Application, and has the added benefit of a UNIQUE database index.
As for the status of an application, the NullBooleanField is designed for exactly that. It start as None (NULL in the db) meaning "no value". It can then be set to True (accepted) or False (rejected).

How to save a related, inline django model when parent is saved in the admin?

In my models, I have an Event class, a Volunteer class, and a Session class. The Session class has a foreign key field for an Event and a Volunteer, and is a unique coupling of both, as well as a date and time. Taken together, Volunteer and Event I think technically have a ManyToMany relationship.
Using the pre-packaged Django admin, I edit Volunteers and Events with their own admin.ModelAdmin classes respectively. Sessions are edited inline in the Events ModelAdmin.
When I add a new Session to an event in the admin interface, with a Volunteer, I need the Volunteer's hours field to be automatically updated, to reflect however many hours the newly added session lasted (plus all past sessions). Currently, I just have a calculate_hours function in the Volunteer model, which iterates over all sessions each time it is called and finds the sum of the hours. I tried to call it with a custom save function in Session, but it appears never to be called after the Event save function. I would try it in Event, but I have no way to isolate which Volunteers need their hours recalculated. The hours field IS updated if I manually go over to the Volunteer admin page, edit, and then save the Volunteer, but this is pretty unacceptable.
I see that there are many questions on SO about Django problems when saving inline objects on the admin site, particularly with ManyToMany fields. I'm not sure, after reading many of these questions, if what they say applies in my case--maybe I need to receive a signal somewhere, or include a custom save in a special place, or call save_model in my admin.ModelAdmin class... I just don't know. What is the best way to go about this?
Code can be found here: Models.py, Admin.py
First of all, the relationship you're describing is what called a ManyToMany "through" (you can read about it in the documentation here).
Secondly, I don't understand why you need the 'hours' to be a field at all. Isn't a function enough for this? why save it in the database in the first place? you can just call it every time you need it.
Finally, it seems to me you're doing a lot of extra work that I don't understand - why do you need the volunteer time boolean field? If you link a volunteer with an event isn't that enough to know that he was there? And what's the purpose of "counts_towards_volunteer_time"? I'm probably missing some of the logic here, but a lot of that seems wasteful.

Application logic in view code

Should I be writing application logic in my view code? For example, on submission of a form element, I need to create a user and send him an activation email. Is this something to do from a view function, or should I create a separate function to make it easier to test down the road? What does Django recommend here?
I found it very hard to figure out where everything goes when I started using django. It really depends on the type of logic you are writing.
First start with the models: model-methods and managers are a good place to perform row-level logic and table level logic i.e. a model manager would be a good place to write code to get a list of categories that are associated with all blogposts. A model method is a good place to count the characters in a particular blogpost.
View level logic should deal with bringing it all together - taking the request, performing the necessary steps to get to the result you need (maybe using the model managers) and then getting it ready for the template.
If there is code that doesn't fit in else where, but has a logical structure you can simply write a module to perform that. Similarly if there are scraps of code that you don't think belong, keep a utils.py to hold them.
You shouldn't perform any logic really in your templates - instead use template tags if you have to. These are good for using reusable pieces of code that you you neither want in every request cycle nor one single request cycle - you might want them in a subset (i.e. displaying a list of categories while in the blog section of your website)
If you do want some logic to be performed in every request cycle, use either context processors or middleware. If you want some logic to be performed only in one single request cycle, the view is probably the place.
TLDR: Writing logic in your view is fine, but there are plenty of places that might be more appropriate
Separating the registration code into it's own function to make testing easier is a good reason. If you allowed admins to register users in a separate, private view, then a registration function would be more DRY. Personally, I don't think a little application logic in the code will do to much harm.
You might find it instructive to have a look at the registration view in the django-registration app -- just to see how it's written, I'm not saying you should or have to use it. It has encapsulated the user registration into it's own function (there's a level of indirection as well, because the registration backends are pluggable).

Using QuerySet.update() versus ModelInstance.save() in Django

I am curious what others think about this problem...
I have been going back and forth in the past few days about using QuerySet.update() versus ModelInstance.save(). Obviously if there are lots of fields being changed, I'd use save(), but for updating a couple of fields, I think it's better to use QuerySet.update(). The benefit of using QuerySet.update() is that you can have multiple threads running update() at the same time, on different fields of the same object, and you won't have race issues. The default save() method saves all the fields, so parallel save() from two threads will be problematic.
So then the issue is what if you have overloaded, custom save() methods. The best I can think of is to abstract whatever in the custom save() method into separate updater methods that actually uses QuerySet.update() to set a couple of fields in the model. Has anyone used this pattern?
What's a bit irritating is that in Django Admin, even in editing in change list mode where you are editing just one field, the entire model is saved. This basically means if someone have a change list open on his/her browser, while some where else in the system a field gets updated, that updated value will be thrown away when this user saves changes from the change list. Is there a solution to this problem?
Thoughts?
Thanks.
The main reason for using QuerySet.update() is that you can update more than one object with just one database query, while every call to an object's save method will hit the database!
Another point worth mentioning is, that django's pre_save & post_save signals are only sent when you call an object's save-method, but not upon QuerySet.update().
Coming to the conflict issues you describe, I think it would also be irritating if you hit 'save' and then you have to discover that afterwards some values are the same as when you changed them, but some that you left unchanged have changed?!? Of course it's up to you to modify the admin's save_model or the object's save method to the bahaviour you suggest.
The problem you described about Django Admin is essentially about updating a model instance using an outdated copy. It is easy to fix by adding a version number to each model instance and increment the number on each update. Then in the save method of the model, just make sure what you are saving is not behind what is already in the database.
I want to make sure when there are parallel writes to the same object, each write updates a different fields, they don't overwrite each other's values.
Depending on the application, this may or may not be a sensible thing. Saving a whole model even if only a single field is updated can often avoid breaking integrity of data. Thinking about the following example about travel itinerary of three-leg flight. Assume there is an instance of three fields representing three legs and three fields are SF->LA, LA->DC, DC->NY. Now if one update is to update the first two legs to SF->SD, SD->DC, and another update is to update the last two legs to LA->SJ, SJ->NY, and if you allow both to happen with update instead of saving the full model instance, you would come out with a broken itinerary of SF->SD, LA->SJ, SJ->NY.