How to enforce uniqueness with NULL values in PostgreSQL - django

I have a model with one non-nullable CharField and two nullable CharFields:
class MyModel(models.Model):
    name = models.CharField('Name', max_length=255, null=False)
    title = models.CharField('Title', max_length=255, null=True, blank=True)
    position = models.CharField('Position', max_length=255, null=True, blank=True)
I want to ensure that name, title, and position are unique together, and so use a UniqueConstraint:
    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=['name', 'title', 'position'],
                name="unique_name_title_position"
            ),
        ]
However, if title is None then this constraint is not enforced.
The reason is that NULL represents the absence of a value, so a NULL is never equal to another NULL and is not treated as a duplicate by a UNIQUE constraint. This means that it's possible to insert rows that appear to be duplicates if one of the values is NULL.
What's the correct way to strictly enforce this uniqueness in Django?

UniqueConstraint is enforced primarily by the database rather than by Django itself, so the behaviour depends on the database engine.
PostgreSQL, for instance, allows multiple NULL records in a unique column, while MSSQL does not.
You could instead add a separate conditional unique constraint for each combination of NULL and non-NULL columns, but as you will find out this is tedious, because every combination needs its own unique index.
To avoid this whole mess, you can follow the Django guidance for CharField, as documented:
Avoid using null on string-based fields such as CharField and
TextField. If a string-based field has null=True, that means it has
two possible values for “no data”: NULL, and the empty string. In most
cases, it’s redundant to have two possible values for “no data;” the
Django convention is to use the empty string, not NULL. One exception
is when a CharField has both unique=True and blank=True set. In this
situation, null=True is required to avoid unique constraint violations
when saving multiple objects with blank values.
Unlike NULL, the empty string is compared for uniqueness, so two rows with an empty title count as duplicates.
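The difference between NULL and the empty string under a unique constraint can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (both treat NULLs as distinct in a unique constraint by default), and the person table and try_duplicate helper are hypothetical names made up for the illustration.

```python
import sqlite3


def try_duplicate(value):
    """Insert the same (name, title) pair twice.

    Returns True if the database accepts the second insert,
    False if the unique constraint rejects it.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (name TEXT NOT NULL, title TEXT, UNIQUE (name, title))"
    )
    conn.execute("INSERT INTO person VALUES (?, ?)", ("Ann", value))
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", ("Ann", value))
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


# NULL titles never compare equal, so the duplicate row is accepted.
null_duplicate_allowed = try_duplicate(None)
# Empty strings do compare equal, so the duplicate row is rejected.
empty_duplicate_allowed = try_duplicate("")
```

Storing '' instead of NULL therefore makes the plain unique constraint behave the way the question expects.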

You could pick values that don't ever occur for title and position and use an index like this:
CREATE UNIQUE INDEX ON mytable (
    name,
    coalesce(title, '#impossible#'),
    coalesce(position, '#impossible#')
);
That will replace the NULL values with something else, so that duplicates are prevented.
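The effect of the coalesce index can be tried out without a PostgreSQL server. The sketch below assumes SQLite (via Python's sqlite3 module) as a stand-in, since it also supports unique indexes on expressions; the index name mytable_uniq and the insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT NOT NULL, title TEXT, position TEXT)")
# Unique index on expressions: NULLs are mapped to a sentinel value first.
conn.execute(
    """
    CREATE UNIQUE INDEX mytable_uniq ON mytable (
        name,
        coalesce(title, '#impossible#'),
        coalesce(position, '#impossible#')
    )
    """
)


def insert(name, title, position):
    """Return True if the row was stored, False if the unique index rejected it."""
    try:
        conn.execute("INSERT INTO mytable VALUES (?, ?, ?)", (name, title, position))
        return True
    except sqlite3.IntegrityError:
        return False


first = insert("Ann", None, None)    # stored
second = insert("Ann", None, None)   # same row again, NULLs now collide
third = insert("Ann", "Dr", None)    # differs in title, stored
```

The sentinel must be a value that can never appear as real data, otherwise a genuine row would collide with a NULL row.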

Related

What is best practice for dealing with blank ('') string values on non-nullable model fields?

In Django/postgres if you have a non-nullable field, it's still possible to store a blank str value '' for certain field types (e.g. CharField, TextField).
Note I am not referring to blank=False, which simply determines if the field is rendered with required in a modelForm. I mean a literal blank string value.
For example, consider this simple model that has null=False on a CharField:
class MyModel(models.Model):
    title = models.CharField('title', null=False)
The following throws an IntegrityError:
MyModel.objects.create(title=None)
Whilst the following does not, and creates the object:
MyModel.objects.create(title='')
Is this an unintended consequence of combining postgres with Django, or is it intended / what practical uses does it have?
If it's unintended, what's best practice to deal with this? Should every CharField and TextField with null=False also have a CheckConstraint? E.g.
    class Meta:
        constraints = [
            models.CheckConstraint(
                check=~models.Q(title=''),
                name='title_required'
            )
        ]
From the Django docs:
https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.Field.null
"Avoid using null on string-based fields such as CharField and TextField. If a string-based field has null=True, that means it has two possible values for “no data”: NULL, and the empty string. In most cases, it’s redundant to have two possible values for “no data;” the Django convention is to use the empty string, not NULL. "
This is because there is a common misconception out there that '' = NULL, and Django decided to go with that. I personally think it was a bad decision, but there it is. A string of length 0 is still a value, just as the integer 0 is.
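To make the distinction concrete, here is a minimal sketch of what NOT NULL does and does not reject, and what the extra check constraint from the question adds. It assumes SQLite through Python's sqlite3 module as a stand-in for PostgreSQL; the table names plain and checked and the accepts helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL only, comparable to a CharField with null=False.
conn.execute("CREATE TABLE plain (title TEXT NOT NULL)")
# NOT NULL plus a check that also rejects the empty string.
conn.execute("CREATE TABLE checked (title TEXT NOT NULL CHECK (title <> ''))")


def accepts(table, value):
    """Return True if the table stores the value, False on a constraint error.

    The table name is one of the two fixed names above, never user input.
    """
    try:
        conn.execute(f"INSERT INTO {table} (title) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


plain_null = accepts("plain", None)      # rejected by NOT NULL
plain_empty = accepts("plain", "")       # stored: '' is a value, not NULL
checked_empty = accepts("checked", "")   # rejected by the CHECK constraint
checked_text = accepts("checked", "x")   # stored
```

So the behaviour is what the database is designed to do; a check constraint is only needed where an empty string is genuinely invalid for that field.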

Soft delete with unique constraint in Django

I have models with this layout:
class SafeDeleteModel(models.Model):
    .....
    deleted = models.DateTimeField(editable=False, null=True)
    ......
class MyModel(SafeDeleteModel):
    safedelete_policy = SOFT_DELETE
    field1 = models.CharField(max_length=200)
    field2 = models.CharField(max_length=200)
    field3 = models.ForeignKey(MyModel3)
    field4 = models.ForeignKey(MyModel4)
    field5 = models.ForeignKey(MyModel5)
    class Meta:
        unique_together = [['field2', 'field3', 'field4', 'deleted'],]
The scenario here is that I never want users to delete data. Instead, a delete will just hide records. However, I still want all non-soft-deleted records to respect unique key constraints. Basically, I want to allow any number of duplicate deleted records, but only a single un-deleted record may exist. So I was thinking of including the "deleted" field (provided by the django-safedelete library), but the issue becomes that Django's unique checks fail with "psycopg2.IntegrityError: duplicate key value violates unique constraint" for ['field2', 'field3', 'field4', 'deleted'] because NULL is not "equal to" NULL, and the comparison never yields true in PostgreSQL.
Is there a way to enforce a unique_together constraint with a Django model layout like mine? Or would it be better to physically delete the record and move it to an archive database, so that if the user wants the record back, the software looks for it in the archive and recreates it?
Yes, as of Django version 2.2 it is possible to use a UniqueConstraint with a condition.
Have a look at the documentation in this link: https://docs.djangoproject.com/en/2.2/ref/models/constraints/#uniqueconstraint
So your model would be something like this:
class MyModel(SafeDeleteModel):
    safedelete_policy = SOFT_DELETE
    field1 = models.CharField(max_length=200)
    field2 = models.CharField(max_length=200)
    field3 = models.ForeignKey(MyModel3)
    field4 = models.ForeignKey(MyModel4)
    field5 = models.ForeignKey(MyModel5)
    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=['field2', 'field3', 'field4'],
                # deleted is a nullable DateTimeField, so "not deleted" means NULL
                condition=models.Q(deleted__isnull=True),
                name='unique_if_not_deleted')
        ]
If you are using an older version of Django that doesn't have this feature available, you can create a migration with a partial unique index (have a look at this question here: Postgresql: Conditionally unique constraint).
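At the database level, both the conditional constraint and the hand-written migration come down to a partial unique index. The following is a rough sketch of that behaviour, assuming SQLite (through Python's sqlite3 module) in place of PostgreSQL, since both accept CREATE UNIQUE INDEX ... WHERE; the item table is a hypothetical, cut-down version of the model with a single constrained column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (field2 TEXT NOT NULL, deleted TEXT)")
# Uniqueness is only enforced for rows that are not soft-deleted.
conn.execute(
    "CREATE UNIQUE INDEX unique_if_not_deleted ON item (field2) WHERE deleted IS NULL"
)


def insert(field2, deleted=None):
    """Return True if the row was stored, False if the partial index rejected it."""
    try:
        conn.execute("INSERT INTO item VALUES (?, ?)", (field2, deleted))
        return True
    except sqlite3.IntegrityError:
        return False


active_first = insert("a")                  # stored
active_second = insert("a")                 # rejected: second active row
deleted_first = insert("a", "2020-01-01")   # stored: outside the index
deleted_second = insert("a", "2020-01-01")  # stored: deleted duplicates are fine
```

Rows with a non-NULL deleted value are simply not part of the index, which is why any number of soft-deleted duplicates can coexist with one active record.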
As for your second question (would it be better to physically delete the record and move it elsewhere), it really depends on the characteristics of your application. If these soft-deletes don't happen very often and your table is still on the small side, I would keep the records in the same table for simplicity's sake, but if the number of records in the table starts growing fast and they affect the performance of the queries on this table then you should move the records elsewhere. You have to evaluate the trade-off between complexity and performance.

unique_together in Django doesn't work

unique_together doesn't work: it only sets the unique constraint on the first field and ignores the second field. Is there any way to enforce the unique constraint?
class BaseModel(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    deleted = models.DateTimeField(db_index=True, null=True, blank=True)
    last_modified_at = models.DateTimeField(auto_now=True)
    class Meta:
        abstract = True
class Book(BaseModel):
    first_form_number = models.CharField(max_length=8)
    class Meta:
        unique_together = (("first_form_number", "deleted"),)
Your models work correctly to the extent that the right unique index is created:
$ python manage.py sqlmigrate app 0001_initial
...
CREATE UNIQUE INDEX "app_base_slug_version_a455c5b7_uniq" ON "app_base" ("slug", "version");
...
(assuming the name of your application is "app")
I must roughly agree with user3541631's answer. It depends on the database in general, but all four db engines supported directly by Django are similar. They treat nulls as distinct in a UNIQUE column (see NULL Handling in SQLite Versus Other Database Engines).
I verified your problem with and without null:
import datetime
import django.db.utils
import pytz
from django.test import TestCase
from .models import Book  # assumes this test lives in the same app as the model
class Test(TestCase):
    def test_without_null(self):
        timestamp = datetime.datetime(2017, 8, 25, tzinfo=pytz.UTC)
        book_1 = Book.objects.create(deleted=timestamp, first_form_number='a')
        with self.assertRaises(django.db.utils.IntegrityError):
            Book.objects.create(deleted=timestamp, first_form_number='a')
    def test_with_null(self):
        # this test fails !!! (and a duplicate is created)
        book_1 = Book.objects.create(first_form_number='a')
        with self.assertRaises(django.db.utils.IntegrityError):
            Book.objects.create(first_form_number='a')
A solution is possible for PostgreSQL if you are willing to manually write a migration to create two special partial unique indexes:
CREATE UNIQUE INDEX book_2col_uni_idx ON app_book (first_form_number, deleted)
WHERE deleted IS NOT NULL;
CREATE UNIQUE INDEX book_1col_uni_idx ON app_book (first_form_number)
WHERE deleted IS NULL;
See:
Answer for Create unique constraint with null columns
Django docs Writing database migrations
Django docs migrations.RunSQL(sql)
Depending on your database, it is possible that NULL isn't equal to any other NULL.
Therefore, if one of the values is NULL, the rows you create are not treated as the same, and the only thing left to compare them by is the non-null field, in your case 'first_form_number'.
Also take into consideration that the comparison is case sensitive, so "char" and "Char" are not the same.
I had a similar situation and did my own check by overriding the save method on the model.
You check whether a matching row exists in the database, but exclude the current instance so that, when updating, it is not compared with itself.
if not deleted:
    exists = model.objects.exclude(pk=instance.pk).filter(first_form_number__iexact=first_form_number).exists()
Make sure you actually extend the inherited Meta class rather than defining a fresh Meta class, which does not inherit the options of the parent's Meta:
class Meta(BaseModel.Meta):
    unique_together = (("first_form_number", "deleted"),)

How do I filter for both a null and a specific value in the same field in Django?

So here's a field I have in one of my models:
year = models.ForeignKey(Year, blank=True, null=True)
Now I want to get all the objects of that model whose year field is either null or specific_year.
This is what I tried:
MyModel.objects.filter(year__isnull=True, year=specific_year)
MyModel.objects.filter(year__isnull=True).filter(year=specific_year)
MyModel.objects.filter(year__in=[specific_year, None])
but they all give me an empty result.
I am pretty new to Django, so maybe there is something I'm missing here, but I couldn't find an answer in the docs. So how can I filter for all the objects whose year field is either null or specific_year?
You can use Q objects for complex filtering:
from django.db.models import Q
MyModel.objects.filter(Q(year__isnull=True) | Q(year=specific_year))
By default, separating conditions with a comma or chaining filter() calls yields an AND result, whereas you need an OR result, for which you can use Q.
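The three kinds of condition can be compared directly in SQL. The sketch below uses Python's sqlite3 module and hand-written WHERE clauses that only approximate what the ORM generates (the exact SQL depends on the Django version); the mymodel table, its rows, and the ids helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, year_id INTEGER)")
conn.executemany(
    "INSERT INTO mymodel (id, year_id) VALUES (?, ?)",
    [(1, None), (2, 2020), (3, 2021)],
)


def ids(where):
    """Return the ids of the rows matching a fixed, hard-coded WHERE clause."""
    query = f"SELECT id FROM mymodel WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(query)]


# Comma-separated or chained filters: both conditions must hold at once.
and_ids = ids("year_id IS NULL AND year_id = 2020")  # []
# A NULL inside IN never matches a NULL column value.
in_ids = ids("year_id IN (2020, NULL)")               # [2]
# The Q(...) | Q(...) form: either condition may hold.
or_ids = ids("year_id IS NULL OR year_id = 2020")     # [1, 2]
```

Only the OR form returns both the row without a year and the row for the specific year.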

How to write this class for Django's data model (converting from Propel's YML format)

I am converting a web project that currently uses the Propel ORM, to a django project.
My first task is to 'port' the model schema to django's.
I have read the django docs, but they do not appear to be in enough detail. Case in point, how may I 'port' a (contrived) table defined in the Propel YML schema as follows:
demo_ref_country:
  code: { type: varchar(4), required: true, index: unique }
  name: { type: varchar(64), required: true, index: unique }
  geog_region_id: { type: integer, foreignTable: demo_ref_geographic_region, foreignReference: id, required: true, onUpdate: cascade, onDelete: restrict }
  ccy_id: { type: integer, foreignTable: demo_ref_currency_def, foreignReference: id, required: true, onUpdate: cascade, onDelete: restrict }
  flag_image_path: { type: varchar(64), required: true, default: ''}
  created_at: ~
  _indexes:
    idx_f1: [geog_region_id, ccy_id, created_at]
  _uniques:
    idxu_f1_key: [code, geog_region_id, ccy_id]
Here is my (feeble) attempt so far:
class Country(models.Model):
    code = models.CharField(max_length=4) # Erm, no index on this column .....
    name = models.CharField(max_length=64) # Erm, no index on this column .....
    geog_region_id = models.ForeignKey(GeogRegion) # Is this correct ? (how about ref integrity constraints ?
    ccy_id = models.ForeignKey(Currency) # Is this correct?
    flag_image_path = models.CharField(max_length=64) # How to set default on this col?
    created_at = models.DateTimeField() # Will this default to now() ?
    # Don't know how to specify indexes and unique indexes ....
[Edit]
To all those suggesting that I RTFM, I understand your frustration. It's just that the documentation is not very clear to me. It is probably a Pythonic way of documentation, but coming from a C++ background, I feel the documentation could be improved to make it more accessible for people coming from different languages.
Case in point: the documentation merely states the class name and an **options parameter in the ctor, but doesn't tell you what the possible options are.
For example class CharField(max_length=None,[**options])
There is a line further up in the documentation that gives a list of permissible options, which are applicable to all field types.
However, the options are provided in the form:
Field.optionname
The (apparently implicit) link between a class property and a constructor argument was not clear to me. It appears that if a class has a property foo, then it means that you can pass an argument named foo to its constructor. Does that observation hold true for all Python classes?
The indexes are automatically generated for your references to other models (i.e. your foreign keys). In other words: your geog_region_id is correct (but it would be better style to call it geog_region).
You can set default values using the default field option.
from datetime import datetime
class Country(models.Model):
    code = models.CharField(max_length=4, unique=True)
    name = models.CharField(max_length=64)
    geog_region = models.ForeignKey(GeogRegion)
    ccy = models.ForeignKey(Currency, unique=True)
    flag_image_path = models.CharField(max_length=64, default='')
    # pass the callable itself so it is evaluated each time a new object is created
    created_at = models.DateTimeField(default=datetime.now)
(I'm no expert on propel's orm)
Django always tries to imitate the "cascade on delete" behaviour, so no need to specify that somewhere. By default all fields are required, unless specified differently.
For the datetime field see some more options here. All general field options here.
code = models.CharField(max_length=4) # Erm, no index on this column .....
name = models.CharField(max_length=64) # Erm, no index on this column .....
You can pass the unique = True keyword argument and value for both of the above.
geog_region_id = models.ForeignKey(GeogRegion) # Is this correct ? (how about ref integrity constraints ?
ccy_id = models.ForeignKey(Currency) # Is this correct?
The above lines are correct if GeogRegion and Currency are defined before this model. Otherwise, put quotes around the model names, for example models.ForeignKey("GeogRegion"). See documentation.
flag_image_path = models.CharField(max_length=64) # How to set default on this col?
Easy. Use the default = "/foo/bar" keyword argument and value.
created_at = models.DateTimeField() # Will this default to now() ?
Not automatically. You can do default = datetime.now (remember to first from datetime import datetime). Alternately you can specify auto_now_add = True.
# Don't know how to specify indexes and unique indexes ....
Take a look at unique_together.
You'll see that the document I have linked to is the same pointed out by others. I strongly urge you to read the docs and work through the tutorial.
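The reason for writing default = datetime.now without parentheses in the created_at answer can be shown with a toy example. This is not Django's implementation, only a sketch of the idea that a callable default is invoked each time a value is needed; the Field class and the fake_now counter (standing in for datetime.now so the result is deterministic) are hypothetical.

```python
import itertools


class Field:
    """Toy stand-in for a model field that supports a default value."""

    def __init__(self, default=None):
        self.default = default

    def get_default(self):
        # A callable default is invoked on every use; a plain value is reused.
        return self.default() if callable(self.default) else self.default


_ticks = itertools.count(1)


def fake_now():
    """Stand-in for datetime.now: returns a new, larger number on each call."""
    return next(_ticks)


frozen = Field(default=fake_now())  # called once here, the result is stored
fresh = Field(default=fake_now)     # the function itself is stored

frozen_values = (frozen.get_default(), frozen.get_default())  # same value twice
fresh_values = (fresh.get_default(), fresh.get_default())     # two different values
```

With parentheses, every object would get the timestamp of the moment the class was defined; without them, each object gets its own.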
I'm sorry, you haven't read the docs. A simple search for index, unique or default on the field reference page reveals exactly how to set those options.
Edit after comment: I don't understand what you mean about multiple lines. Python doesn't care how many lines you use within brackets, so this:
name = models.CharField(unique=True, db_index=True)
is exactly the same as this:
name = models.CharField(
    unique=True,
    db_index=True
)
Django doesn't support multi-column primary keys, but if you just want a multi-column unique constraint, see unique_together.
class demo_ref_country(models.Model):
    code = models.CharField(max_length=4, db_index=True, null=False)
    name = models.CharField(max_length=64, db_index=True, null=False)
    geog_region = models.ForeignKey(geographic_region, null=False)
    ccy = models.ForeignKey(Currency_def, null=False)
    flag = models.ImageField(upload_to='path to directory', null=False, default="home")
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)
    class Meta:
        # field names must be given as strings
        unique_together = ('code', 'geog_region', 'ccy')
You can set default values. The db_index parameter creates an index for the field. You can use unique=True for individual fields, whereas unique_together checks uniqueness across the listed columns together.
UPDATE: First of all, I advise you to read the documentation carefully, since Django gives you a lot of options, and some of them have restrictions. For example, unique_together is not limited to the Django admin: it is validated when a record is created or edited through a model form, and it is also enforced by the database, so it applies to data inserted in other ways (such as a DataModel.objects.create statement). If a single column has to be unique on its own, use unique=True in the field definition, like:
code= models.CharField(max_length=4, db_index=True, null=False, unique=True)
ForeignKey fields are indexed by default (but not unique), so you do not need to add db_index to them.
Django supports method overriding, so you can override the model's save and delete methods as you like.
Check it here. Django also allows you to write raw SQL queries; you can check that here.
unique_together only constrains the combination of columns, so don't forget to add unique=True to any field that must be unique on its own.
unique_together also lets you define several different unique sets, such as:
unique_together = (('id','code'),('code','ccy','geog_region'))
That means id and code must be unique together, and code, ccy and geog_region must be unique together.
UPDATE 2: Regarding your question update...
It is better to start from the tutorial. It explains the basics with good examples.
As for the doc style, let me give you an example, but if you start from the tutorial, it will be easier for you...
These are from the model documentation... Doc here
BooleanField
class BooleanField(**options)
That defines the basic structure of a database field: () is used, and it takes some parameters as options. That is this part:
models.BooleanField()
Since this is a field structure, the available options are defined as:
unique
Field.unique
So,
models.BooleanField(unique=True)
That is the general usage. Since unique is a basic option available to all field types, it is classified as Field.unique. Some options are available only to a single field type, like symmetrical, which is a ManyToManyField option and is classified as ManyToManyField.symmetrical.
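The link between a documented option such as Field.unique and the keyword argument unique=True can be illustrated with plain Python. The classes below are a hypothetical toy, not Django's code; they only mimic the pattern of a subclass forwarding **options to a base class that stores them as attributes.

```python
class Field:
    """Toy base field: every accepted option is a named keyword argument."""

    def __init__(self, unique=False, db_index=False, default=None):
        # Each constructor argument is stored under the same attribute name,
        # which is why the docs can list them as Field.unique and so on.
        self.unique = unique
        self.db_index = db_index
        self.default = default


class BooleanField(Field):
    """Toy subclass: forwards whatever options it is given to Field."""

    def __init__(self, **options):
        super().__init__(**options)


flag = BooleanField(unique=True)
plain = BooleanField()

try:
    BooleanField(colour="red")  # not an option that Field accepts
    unknown_option_rejected = False
except TypeError:
    unknown_option_rejected = True
```

This mapping is a convention of how these classes and their documentation are written, not a general rule of Python: a class only accepts the keyword arguments that its __init__ declares.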
For the queryset
class QuerySet([model=None])
That is used like a function call, but you use it to filter a model, in other words, to write a filter query to execute... It has some methods, like filter...
filter(**kwargs)
Since this takes some kwargs and, as I said before, is used to filter your query results, the kwargs must be your model fields (database table fields), like:
MyModel.objects.filter(id=15)
What objects is, is defined in the docs, but it is a manager that helps you get related objects.
The docs contain good examples, but you have to start from the tutorial; that is what I can advise you...