Validation of model with incomplete data in Django

I have the following use case in a project I'm working on.
Workflow looks like this:
the system accepts proposed candidates (at this stage most fields are not required, email address can be invalid, etc.)
their information can be corrected and updated
some of the candidates get registered (now fields such as name and last name are required, email has to be valid, etc.)
I came up with two ideas.
One is to have two models for candidates. Then I could leverage automatic validation from Models and ModelForms but it would require copying instances from one model to the other while registering candidates and would bring problems in different places (for example with ForeignKeys).
Second idea is to have one model that accepts incomplete data and two ModelForms, one with redefined fields.
Both ideas require duplication of quite similar code.
Does anyone know of a DRY, Django-style way of approaching this problem?

You can define a model without any database restrictions and then implement two different (Model)Forms.
Form A is used for the input of new objects. Thus, your Form A should not contain specific validation logic.
Form B on the other hand can hold all your validation logic and can be used to maintain data integrity.
Please note that this approach will not guarantee database integrity. Your validation logic should be heavily unit tested.
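The split between a permissive Form A and a strict Form B can be sketched without Django at all. The block below is only an illustration under assumptions: Candidate, proposal_errors and registration_errors are hypothetical names, and the email check is deliberately simplistic. In a real project the two functions would correspond to the two ModelForms built on the same model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Every field may be empty, mirroring a model without restrictions.
    first_name: str = ""
    last_name: str = ""
    email: str = ""


def proposal_errors(candidate):
    """Stand-in for Form A: accept anything, so there is nothing to report."""
    return {}


def registration_errors(candidate):
    """Stand-in for Form B: all strict rules live here, in one place."""
    errors = {}
    for field in ("first_name", "last_name"):
        if not getattr(candidate, field).strip():
            errors[field] = "This field is required."
    # Deliberately simplistic email check, for illustration only.
    local, at, domain = candidate.email.partition("@")
    if not (local and at and "." in domain):
        errors["email"] = "Enter a valid email address."
    return errors


draft = Candidate(first_name="Ada", email="not-an-email")
complete = Candidate("Ada", "Lovelace", "ada@example.com")
```

Because the stored record never enforces anything itself, the strict function is the only safeguard, which is why it deserves the heaviest test coverage.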

It sounds like you need a model for the proposed candidate info and another model for the final registered candidate, if you want to save the intermediate stage on the database and allow a user to edit it on another visit. Separate models for separate concerns. To reiterate your first option :)
Editing this as I realise it's all just rehashing your question. In this situation I'd stick with option 1, creating a new instance with the final candidate info when necessary. I'd probably link the final candidate to the proposed candidate instance with a foreign key and have null=True in the field definition, for the initial case. This would allow a cleanup task to run periodically and remove the proposed candidate instances where the data is duplicated for registered candidates. Or you could bite the bullet and remove the proposed instance upon a successful save of the final instance. Proceed with caution; you probably don't need to be thinking about removing stale data unless it becomes a problem, so I digress.
There's some slight duplication of code here, but not a great deal, and sometimes it's unavoidable if you want to reduce the potential for hairy application logic surprises.

Related

Is there a model MultiField (any way to compose db models Fields in Django)? Or why would not that be a useful concept?

When building a Django application, we were exposed to (forms) MultiValueField and MultiWidget.
They seem like an interesting approach to compose their respective base classes, giving more modularity.
Yet, now it seems to us that the actual piece that would make those two shine bright would be a db.models.MultiField. Here is the reasoning:
It seems that, when using a ModelForm, Django is enforcing a strict 1-to-1 association between a models.Field and a forms.Field. Now, with forms.MultiValueField, despite this strict 1-to-1 association, you can have a single models.Field actually associated to the numerous forms.Field composing the forms.MultiValueField.
Yet, it is limited to the case where a single models.Field maps more naturally to several forms.Field. What seems very interesting would be the ability to associate any number of models.Field instances with any number of forms.Field instances. The only piece that seems to be missing to get there is a hypothetical models.MultiField. It could communicate with the outside through a compress() method (see MultiValueField), and potentially a decompress() method in the other direction (see MultiWidget).
The questions would then be (assuming this requirement does not stem from a misunderstanding of Django): Is there a way to compose models.Field in Django? If not, why is such an empowering concept not implemented in this great framework ;) ?
EDIT: To give a motivating example, imagine we want to implement a partial date (a date that can be precise to a day, or just precise to a month and a year, or alternatively just a year), with a Model following the one presented in this answer, i.e.:
a DateField, representing the date
a CharField, to indicate whether the date is complete or month + year or just year.
This model is working just fine with the default ModelForm, but now we want to introduce some consistency check: if a date is only precise to the month (month + year or just year), its day part should be 1, and if it is only precise to the year, its month part should also be 1.
This is a cross-field check (two different fields from the ModelForm need to be accessed to complete it), so it has to be implemented at the Form.clean() level. This check would need to be copy-pasted into each form containing a partial date, which goes against the DRY principle cherished by Django.
Now let's imagine Django provides this hypothetical models.MultiField, which would be a composite models.Field. We could define a PartialDate class deriving from MultiField and containing the two leaf fields defined above (DateField and CharField). We could now say that the form field corresponding to this single model field (in a ModelForm) is a class derived from forms.MultiValueField. This class could implement the consistency check above, at the field level: it is not a cross-field check anymore.
This way, we got rid of the code duplication: any model could use a PartialDate field, automatically making any ModelForms mapping to it use the forms.MultiValueField implementing the consistency check, whose code was only written in one place.
(This is a simple example, it is easy to imagine it can get way more complex in production code, with consistency check you do not want to copy paste)

Possible: Yes. You could either subclass Django's model class, or monkey-patch that class into the existing model module.
Just (educated) guessing: I think it is not missing, but not needed.
In DB-Applications, combined fields will almost always come with special business rules. So you will have to implement a different display and validation for each one of them anyway.
Which you can already do easily in a ModelForm.
Maybe you should look at customizing forms.ModelForm?
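One way to avoid copy-pasting a cross-field rule is to write it once as a plain function and call it from wherever validation happens, for example from each form's clean() or from the model's clean() method. The sketch below applies this to the partial date rule from the question. The precision codes "day", "month" and "year" and the function name are assumptions made for the example, since the question does not say what the CharField stores.

```python
from datetime import date

# Hypothetical precision codes; the question only says the CharField records
# whether the date is complete, month + year, or just year.
DAY, MONTH, YEAR = "day", "month", "year"


def check_partial_date(value, precision):
    """Return value unchanged, or raise ValueError if an unused part is not 1."""
    if precision not in (DAY, MONTH, YEAR):
        raise ValueError(f"unknown precision: {precision!r}")
    if precision in (MONTH, YEAR) and value.day != 1:
        raise ValueError("day must be 1 when the date is not precise to the day")
    if precision == YEAR and value.month != 1:
        raise ValueError("month must be 1 when the date is only precise to the year")
    return value
```

A form or model would catch the ValueError and re-raise it as a validation error; the rule itself stays in one place.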

Django: storing model property on a field vs. on a different model

I am relatively new to Django and even database design and I have some thoughts I'd like to run by some other people. This isn't really a specific question; I just want to see how other people think about this stuff.
Let's say we have a model for an application to some service. It contains all the ordinary stuff you might imagine an application to contain:
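from django.db import models
from django.db.models import BooleanField, CharField, DateField, ForeignKey

class Application(models.Model):
    first_name = CharField(max_length=255)
    last_name = CharField(max_length=255)
    date_of_birth = DateField()
    married = BooleanField()
    # ...other stuff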
Okay, that's all well and good. But now, imagine the webapp you are writing has the feature that you can complete your application partially, save it, and come back to it later. One way to do this is to add another attribute to the model above:
complete = BooleanField()
It works, it is pretty simple to use, but I don't really like it because it muddies the semantics of an application; it adds information that isn't intrinsically connected to the application. Another approach would be to create another model that keeps track of complete applications:
class CompleteApplication(models.Model):
    # on_delete is required since Django 2.0
    application = ForeignKey(Application, on_delete=models.CASCADE)
I like this a bit better, since it keeps Application clean. However, it does have the disadvantage of messing up queries. Here are the two ways to query all complete applications in the system:
Method 1:
completed_applications = Application.objects.filter(complete=True)
Method 2:
pks = CompleteApplication.objects.values_list("application__pk", flat=True)
complete_applications = Application.objects.filter(pk__in=pks)
Method 2 is two lines of code vs. one and also two queries whereas previously one sufficed, so the database performance is going to take a hit.
There is a third way to do things: instead of creating a model that keeps track of complete applications, we could create a metadata model that stores any metadata that we might want to attach to the Application model. For our purposes, this model can contain a field that tracks completeness. However, this approach also has the benefit of allowing for an arbitrary number of metadata fields to be associated with each application without requiring a new DB table for each (as is the case with Method 2 above).
class ApplicationMeta(models.Model):
    # on_delete is required since Django 2.0
    application = ForeignKey(Application, on_delete=models.CASCADE)
    complete = BooleanField()
And, for completeness (pun intended), to query all complete applications, we would use the following statement:
completed_applications = Application.objects.filter(applicationmeta__complete=True)
Nice and simple, just like Method 1, but the query is certainly more work for the database. This method also has another drawback for certain applications. Pretend, for example, that we want to track some additional information about applications: they can be confirmed, or rejected. However, if an application is not confirmed, it does NOT necessarily mean it is rejected: it could be pending review. Additionally, let's say we want to track the date of confirmation and the date of rejection (if either is applicable, of course). Then, our metadata model becomes the following:
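class ApplicationMeta(models.Model):
    application = ForeignKey(Application, on_delete=models.CASCADE)
    complete = BooleanField()
    confirmed = BooleanField()
    rejected = BooleanField()
    # at most one of the two dates is ever set, so both must allow NULL
    date_confirmed = DateField(null=True, blank=True)
    date_rejected = DateField(null=True, blank=True)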
Okay...this works, but it is starting to be a mess. Firstly, we have now opened up our system to potential error: what if somehow an ApplicationMeta instance has both rejected and confirmed set to True? We could do some fancy footwork with our class (maybe override __setattr__) to throw an exception if something funny happens, so we can prevent it from being persisted to the DB, but this is added complication that I hope is not necessary. Further, any instance will have at most one of date_confirmed or date_rejected set. Is that a problem? Here, I am not actually certain. My guess is this is likely a waste of space, but I don't actually know that. This example is simple; what if more complicated examples present us with tons of fields that will necessarily not be filled? Seems like bad design.
I'd love to hear some thoughts on these ideas.
Thanks!

If you have a huge amount of possible metadata, the third approach might make sense for performance reasons. I wouldn't do it for a few boolean and date columns. If you're concerned about the readability of the models themselves, you can factor out any metadata into an abstract base model. You can even reuse the abstract model for other models that require the same metadata. The information will still live in your Application model.
If you do take the second or third approach, I would use a OneToOneField rather than a ForeignKey. It ensures that there cannot be two ApplicationMeta instances for a single Application, and has the added benefit of a UNIQUE database index.
As for the status of an application, the NullBooleanField is designed for exactly that (in Django 3.1 and later, BooleanField(null=True) replaces it). It starts as None (NULL in the db) meaning "no value". It can then be set to True (accepted) or False (rejected).
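The point of a single nullable boolean is that the contradictory state (confirmed and rejected at once) cannot be represented at all. A minimal sketch of the three states in plain Python, where status_label is a hypothetical helper and not part of Django:

```python
from typing import Optional


def status_label(accepted: Optional[bool]) -> str:
    """Map the tri-state value to a human-readable status."""
    if accepted is None:
        # NULL in the database: nobody has decided yet.
        return "pending"
    return "accepted" if accepted else "rejected"
```

The date of the decision can then live in a single nullable date column next to it, rather than one column per outcome.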

Django example of a form that submits multiple instances into a database?

I need a Django form that submits multiple separate requests, and can't find an example of how to do this without a lot of customization. For example, suppose there is a form that is used by a car repair shop. The form will list all the possible repairs that the shop is capable of doing, and the user will select which repairs they want to have done (e.g., using checkboxes).
Each repair can be assigned to a different mechanic. Each repair can also be cancelled or declared to be done, independent of the other repairs. That seems to require that each repair become a separate instance in a database.
Additionally, each repair job can only be performed by certain mechanics. So I need the ability to associate each repair job with its own unique list of mechanics to choose from.
Has anyone seen an example of a Django form that does something like this? Thanks.

This is what formsets (and model formsets) are for.

It's been a while since the question was asked, and I had the same problem:
I solved it by calling instance = form.save(commit=False), then setting the different attributes, then calling instance.save(force_insert=True), then deleting form.instance.id...
HOWEVER, this means that all fields that are eventually overwritten in the save method keep their values after the first call to save()... This bit me hard!
How did you end up doing it?

What kind of validations should I use in my db models?

My form validators are pretty good, and if a form passes is_valid, all data should be OK to insert in the db. Should I still validate something on the db model? What else could be validated on the db side? Because right now, except maybe for uniqueness (which I can't do from my ModelForm), I can't think of anything else.
EDIT:
I did some work with Rails earlier, and there you would validate a form on the client side, using JS, and on the server side using model validations. I saw in django you can validate on the client side, using JS, and on the server side you have 2 validation checks: forms and models. This is what confused me.

All data should be validated in the database if possible, whether you validate from the front end or not. The first validation should be the datatype; for instance, using a date datatype will ensure that no non-dates can ever get into your database. If you have relationships between tables, these absolutely must be enforced at the database level. If the data must be unique, it is irresponsible not to put a unique index on it. If you have a distinct set of values that are the only ones allowed, then put them in a lookup table and add a foreign key constraint to that table.
The reason why it is CRITICAL to do validations in the database itself is that the user interface will not be the only thing that interacts with the database (even if you think it will be). Other applications may do so, and people will need to make data changes through imports or at a query window (to fix or change large amounts of data, such as when client A buys client B and you need to convert all the data to client A). Also, if you change the application interface, you might lose some of the critical data integrity checks in the rewrite. Data integrity is one of the most critical factors in database design and maintenance. If you can't count on data integrity, you have no data. I have never seen a database that lets this stuff be handled by the application that didn't lose data integrity over time. Remember the database will far outlast the current application. People will still be looking at this data for years to come. The application typically doesn't consider reporting, which is where the data integrity problems tend to come to light. You don't want to have to explain why you have 10,000,000 in orders for which you can't identify who they were shipped to, for instance.

If your data has a constraint that must always hold, you should enforce it at the model/database level (and optionally at the form level). Data can enter your DB in multiple ways besides a form where validation was checked. E.g., someone can go to the Django shell to save models directly, someone could create or edit a model in the admin interface, or some later designer could create a new form that doesn't validate correctly.
Granted, this is only required if there are additional constraints on the data. Django will automatically validate things like fields storing proper values, if you are using the correct field types. E.g., IntegerField validates to ensure it contains an integer, EmailField checks that it's entered in the form of a valid email address, django.contrib.localflavor.us.models.PhoneNumberField is a US phone number, etc. Note that this only happens if your models have the proper fields (e.g., if you use CharFields for email addresses, no validation can be performed).
But there may be other links between data structures, where you should write your own validation. E.g., if all custom orders require special instructions (and non-custom orders only sometimes have special instructions), you should check that all custom orders have something in the special instructions field (and maybe have some minimum length).
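A rule like the custom-order one can be sketched as a single function that every entry point calls before saving. This is a framework-free illustration: Order, validate_order and the minimum length of 10 are assumptions made for the example. In Django, a check of this kind would typically live in the model's clean() method and raise ValidationError instead of ValueError.

```python
from dataclasses import dataclass

# Hypothetical threshold; the answer only says "maybe some minimum length".
MIN_INSTRUCTIONS_LENGTH = 10


@dataclass
class Order:
    is_custom: bool
    special_instructions: str = ""


def validate_order(order):
    """Raise ValueError when a custom order lacks usable special instructions."""
    text = order.special_instructions.strip()
    if order.is_custom and len(text) < MIN_INSTRUCTIONS_LENGTH:
        raise ValueError(
            "custom orders need special instructions of at least "
            f"{MIN_INSTRUCTIONS_LENGTH} characters"
        )
    return order
```

Non-custom orders pass regardless of whether the instructions field is filled in, matching the "only sometimes" case described above.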
EDIT: In response to your edit, the reason for three potential validations in django is straightforward -- different validations at different points for different reasons.
Client side (javascript/jquery) validation can't be trusted at all, and should only be offered as a convenience for users, almost as an afterthought (if you want a spiffy smooth interface). AFAIK, Django doesn't have JS validation unless you use an external package like django-ajax-forms or something, and even then you shouldn't trust that the validation is correct.
Second, there's a difference between form and model validation. One model may have multiple forms for different purposes. For example, you may have a blog with a Comment Model and allow two types of users to comment: signed in users, or anonymous users. The form for anonymous users may require giving a name/email before they comment, while the form for logged in users doesn't need those fields. The signed in user form, when processed in a view may automatically add the correct name and email addresses of the signed in user to the comment model before being saved.
In contrast, model validation always applies and will always hold at the database level, regardless of how someone tried to save the data. If you want to make sure some condition always applies, make sure it is enforced at the DB level. (And then you don't have to put that validation in at the form level.)

How can I easily mark records as deleted in Django models instead of actually deleting them?

Instead of deleting records in my Django application, I want to just mark them as "deleted" and have them hidden from my active queries. My main reason to do this is to give the user an undelete option in case they accidentally delete a record (these records may also be needed for certain backend audit tracking.)
There are a lot of foreign key relationships, so when I mark a record as deleted I'd have to "Cascade" this delete flag to those records as well. What tools, existing projects, or methods should I use to do this?

Warning: this is an old answer and it seems that the documentation is recommending not to do that now: https://docs.djangoproject.com/en/dev/topics/db/managers/#don-t-filter-away-any-results-in-this-type-of-manager-subclass
Django offers out of the box the exact mechanism you are looking for.
You can change the manager that is used for access through related objects. If your new custom manager filters the objects on a boolean field, the objects flagged inactive won't show up in your requests.
See here for more details:
http://docs.djangoproject.com/en/dev/topics/db/managers/#using-managers-for-related-object-access

Nice question, I've been wondering how to efficiently do this myself.
I am not sure if this will do the trick, but django-reversion seems to do what you want, although you probably want to examine to see how it achieves this goal, as there are some inefficient ways to do it.
Another thought would be to have the dreaded boolean flag on your Models and then creating a custom manager that automatically adds the filter in, although this wouldn't work for searches across different Models. Yet another solution suggested here is to have duplicate models of everything, which seems like overkill, but may work for you. The comments there also discuss different options.
I will add that for the most part I don't consider any of these solutions worth the hassle; I usually just suck it up and filter my searches on the boolean flag. It avoids many issues that can come up if you try to get too clever. It is a pain and not very DRY, of course. A reasonable solution would be a mixture of the Custom manager while being aware of its limitations if you try searching a related model through it.

I think using a boolean 'is_active' flag is fine - you don't need to cascade the flag to related entries at the db level, you just need to keep referring to the status of the parent. This is what happens with contrib.auth's User model, remember - marking a user as not is_active doesn't prompt django to go through related models and magically try to deactivate records, rather you just keep checking the is_active attribute of the user corresponding to the related item.
For instance if each user has many bookmarks, and you don't want an inactive user's bookmarks to be visible, just ensure that bookmark.user.is_active is true. There's unlikely to be a need for an is_active flag on the bookmark itself.
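The idea of checking the parent's flag instead of cascading it can be sketched in plain Python. User, Bookmark and visible_bookmarks are hypothetical stand-ins for the example. With a Django model that has a user foreign key, the equivalent filter would be a lookup across the relation such as filter(user__is_active=True).

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    is_active: bool = True


@dataclass
class Bookmark:
    url: str
    user: User  # the bookmark itself carries no is_active flag


def visible_bookmarks(bookmarks):
    """Keep only bookmarks whose owner is still active."""
    return [b for b in bookmarks if b.user.is_active]


alice = User("alice")
bob = User("bob", is_active=False)
bookmarks = [
    Bookmark("https://example.com/a", alice),
    Bookmark("https://example.com/b", bob),
]
```

Reactivating bob makes his bookmark visible again without touching any bookmark record.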

Here's a quick blog tutorial from Greg Allard from a couple of years ago, but I implemented it using Django 1.3 and it was great. I added methods to my objects named soft_delete, undelete, and hard_delete, which set self.deleted=True, self.deleted=False, and returned self.delete(), respectively.
A Django Model Manager for Soft Deleting Records and How to Customize the Django Admin
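The three methods described above are small. The sketch below shows their shape using an in-memory list instead of a database table, so Record, its store argument and the active() helper are assumptions for illustration rather than code from the linked tutorial. On a real model, hard_delete would return self.delete() and active() would be a manager filtering on deleted=False.

```python
class Record:
    """Stand-in for a model instance; `store` plays the role of the table."""

    def __init__(self, store, name):
        self.store = store
        self.name = name
        self.deleted = False
        store.append(self)

    def soft_delete(self):
        # Hide the record from normal queries but keep the row.
        self.deleted = True

    def undelete(self):
        self.deleted = False

    def hard_delete(self):
        # Really remove the row.
        self.store.remove(self)


def active(store):
    """What a soft-delete-aware manager would return by default."""
    return [r for r in store if not r.deleted]
```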

There are several packages which provide this functionality: https://www.djangopackages.com/grids/g/deletion/
I'm developing one: https://github.com/meteozond/django-permanent/
It replaces the default Manager and QuerySet delete methods to bring in logical deletion.
It completely shadows the default Django delete methods, with one exception: models that inherit from PermanentModel are marked instead of being deleted, even if their deletion is caused by a relation.