Django ORM, CharField and blank=True - django

Django's documentation is quite clear about storing empty strings as "" rather than NULL at a database level (so there is only one possible format for empty data):
Note that empty string values will always get stored as empty strings, not as NULL. Only use null=True for non-string fields such as integers, booleans and dates. For both types of fields, you will also need to set blank=True if you wish to permit empty values in forms, as the null parameter only affects database storage (see blank).
Nonetheless, after adding a new field, I've started encountering IntegrityErrors on the new field (phone_number).
null value in column "phone_number" violates not-null constraint
That model looks like this with the new field (I performed a migration via south):
class Person(models.Model):
    user = models.ForeignKey(User)
    description = models.TextField(blank=True)
    phone_number = models.CharField(blank=True)
I've since (temporarily) resolved the issue by setting null=True on phone_number, but now I have hundreds of entries with empty strings, and a single NULL value in my database. (I also tried adding default='' to the phone_number field, but I was still seeing IntegrityError issues.)
In the past I've always used MySQL, but on this project I'm using Postgres. The generated SQL insert attempt is:
'INSERT INTO "people_person" ("user_id", "description", "gender", "birthdate", "default_image_id", "zip_code", "beta_status") VALUES (%s, %s, %s, %s, %s, %s, %s) RETURNING "people_person"."id"'.
My expectation would be that Django would be inserting a blank string into the "phone_number" column, but it doesn't appear to be doing so. The other thing I might expect would be Django to include a SET DEFAULT in the CREATE TABLE statement, but it doesn't. So Postgres gets angry about the NOT NULL on that column.
Thanks!

As is usually the case with problems that are so seemingly intractable, the issue at hand was user error.
My application had two entry points (two WSGI files) but only one code base. Normally, Apache will only reload your code when the WSGI file is touched. My deploy script was only touching one of those WSGI files, which meant that people reaching my site via the other WSGI file were still running old code. Worse, the database schema had already been migrated, but the models loaded by that old code were still as they were before.
This in turn caused the IntegrityError issues. The old code didn't know about the phone_number field, so even though I had set blank=True, Django left the column out of the INSERT entirely, and the database of course took that to mean NULL.
This caused a series of difficult-to-track-down errors, including the one above.
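The failure mode can be reproduced without Django. The following is a minimal sketch using Python's built-in sqlite3 module instead of Postgres (an assumption made so that it runs anywhere); the table is a cut-down, hypothetical version of people_person. An INSERT that names the column works, while an INSERT from "stale" code that omits it is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_person ("
    " id INTEGER PRIMARY KEY,"
    " description TEXT NOT NULL,"
    " phone_number TEXT NOT NULL)"  # NOT NULL with no DEFAULT, as after the migration
)

# Up-to-date code knows about phone_number and sends an empty string.
conn.execute(
    "INSERT INTO people_person (description, phone_number) VALUES (?, ?)",
    ("up to date", ""),
)

# Stale code has never heard of phone_number, so the column is omitted
# and the database falls back to NULL, which the constraint rejects.
try:
    conn.execute("INSERT INTO people_person (description) VALUES (?)", ("stale",))
    error = None
except sqlite3.IntegrityError as exc:
    error = str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM people_person").fetchone()[0]
print(error)
```

Only the first row is stored; the second statement raises an IntegrityError that names the phone_number column.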
It's amazing how often really tough issues like these are caused by dumb minor omissions - like a deploy script I wrote 2 months ago and forgot to update.
Thanks for reading folks, I've upvoted the other answers, but I need to accept mine since it was ultimately the solution.

I discovered that if you explicitly set the field value to None, you will still get these errors. In other words, the default= value is applied as soon as you create the Python object, rather than when you save it to the database.
I guess that is reasonable but it was a bit unexpected.
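To illustrate the point, here is a plain-Python sketch of the behaviour described above. It is not Django code: PersonSketch and value_to_store are hypothetical names that only mimic a field with default='' being filled in at construction time.

```python
_MISSING = object()  # sentinel meaning "no value was passed at all"


class PersonSketch:
    """Hypothetical stand-in for a model whose phone_number has default=''."""

    def __init__(self, phone_number=_MISSING):
        # The default is used only when the argument is absent;
        # an explicit None is kept as-is.
        self.phone_number = "" if phone_number is _MISSING else phone_number

    def value_to_store(self):
        # Hypothetical helper: coerce None to "" right before writing.
        return "" if self.phone_number is None else self.phone_number


implicit = PersonSketch()
explicit_none = PersonSketch(phone_number=None)
```

Here implicit.phone_number is the empty string, while explicit_none.phone_number stays None until something like value_to_store() normalises it.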

Related

Django foreign key: auto-lookup related object when the update record has the key value

I have legacy code which had no foreign keys defined in the schema.
The raw data for the row includes the key value of the parent, naturally.
My first porting attempt to postgresql just updated the field with the raw value: I did not add foreign keys to Django's models.
Now I am trying to add foreign keys to make the schema more informative.
When I add a foreign key, django's update requires me to provide an instance of the parent object: I can no longer update by simply providing the key value. But this is onerous because now I need to include in my code knowledge of all the relations to go and fetch related objects, and have specific update calls per model. This seems crazy to me, at least starting from where I am, so I feel like I am really missing something.
Currently, the update code just pushes rows in blissful ignorance. The update code is generic for tables, which is easy when there are no relations.
Django's model data means that I can find the related object dynamically for any given model, and doing this means I can still keep very abstracted table update logic. So this is what I am thinking of doing. Or just doing raw SQL updates.
Does a solution to this already exist, even if I can't find it? I am expecting to be embarrassed.
The ValueError is raised in Django ORM code which knows exactly which model it expects and what the related field is: the missing step is to find the instance of the related object.
db.models.fields.related_descriptors.py:
In this code, which throws the exception, value is supposed to be an instance of the parent model. Instead, value is the key value. I think this tells me how I can inspect the model to deal with this in advance, but I wonder if I am reinventing the wheel.
if value is not None and not isinstance(value, self.field.remote_field.model._meta.concrete_model):
    raise ValueError(
        'Cannot assign "%r": "%s.%s" must be a "%s" instance.' % (
            value,
            instance._meta.object_name,
            self.field.name,
            self.field.remote_field.model._meta.object_name,
        )
    )
You can use the _id suffix to set the id value directly.
For a given model:
class Album(models.Model):
    artist = models.ForeignKey(Musician, on_delete=models.CASCADE)
you can set artist by id in the following manner:
Album.objects.create(artist_id=2)
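For generic update code that pushes raw rows, one possible approach is to rename foreign key entries to their _id form before passing them on. The sketch below is plain Python; fk_values_to_id_kwargs is a hypothetical helper, and the set of foreign key field names is passed in by hand here (in a real project it would have to come from the model's metadata).

```python
def fk_values_to_id_kwargs(row, fk_field_names):
    """Return a copy of row where each foreign key name gets an _id suffix."""
    kwargs = {}
    for name, value in row.items():
        if name in fk_field_names:
            kwargs[name + "_id"] = value  # raw key value, no instance lookup
        else:
            kwargs[name] = value
    return kwargs


row = {"title": "Kind of Blue", "artist": 2}
kwargs = fk_values_to_id_kwargs(row, {"artist"})
```

The resulting dict could then be used as Album.objects.create(**kwargs), assuming Album also had a title field.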

Migrate a PositiveIntegerField to a FloatField

I have an existing populated database and would like to convert a PositiveIntegerField into a FloatField. I am considering simply doing a migration:
migrations.AlterField(
    model_name='mymodel',
    name='field_to_convert',
    field=models.FloatField(
        blank=True,
        help_text='my helpful text',
        null=True),
),
Where the field is currently defined as:
field_to_convert = models.PositiveIntegerField(
    null=True,
    blank=True,
    help_text='my helpful text')
Will this require a full rewrite of the database column? How well might this conversion scale for larger databases? How might it scale if the vast majority of values were null? In what circumstances would this conversion fail? This is backed by a Postgres database, if that makes a difference.
Will this require a full rewrite of the database column?
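You won't need to rewrite anything by hand; the migration converts the values for you. I ran an experiment with PostgreSQL, MySQL, and SQLite, and the conversion from integer to float went well in every case. I also set some values to null to match your situation. Be aware, though, that PostgreSQL itself rewrites the table when a column changes from integer to double precision, because the two types are not binary coercible, so on a large table the ALTER holds its lock for the duration of that rewrite.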
If you have a value 3, it just will change to 3.0.
How might it scale if the vast majority of values were null?
Well, since you keep null=True in the configuration of your field all null values will remain null, no problem with that. If you remove null=True you might need to specify a default value.
In what circumstances would this conversion fail?
Taking an int column and converting it to float (real) should not fail. If you find a bizarre, weird and very special case where it does, that would be a very big finding.
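One way to see why the values themselves are safe: a double precision float represents every integer up to 2**53 exactly, and a 32-bit integer column (which is what I understand PositiveIntegerField to use on PostgreSQL) tops out far below that. A small sketch, with survives_float_roundtrip as a hypothetical helper name:

```python
INT32_MAX = 2_147_483_647  # upper bound of a 32-bit signed integer column


def survives_float_roundtrip(n):
    """True if converting n to a double and back gives the same integer."""
    return int(float(n)) == n


samples = [0, 1, 3, 10**9, INT32_MAX]
all_exact = all(survives_float_roundtrip(n) for n in samples)

# For comparison: integers beyond 2**53 can no longer all be represented.
big_loses_precision = not survives_float_roundtrip(2**53 + 1)
```

So precision loss would only become a concern for a 64-bit integer column holding very large values, which is not the case in the question.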
If you have doubts about the migration outcome...
... you can first take a look into migrations SQL with sqlmigrate, and of course, you could backup your database.
You can use sqlmigrate to check generated sql for your migration.
$ python manage.py sqlmigrate app_label migration_name
Keep in mind, that its output depends on the Django version and the database you have in settings. For the setup I had on hand (Django 1.11, Postgres 9.3) for your migration I got:
BEGIN;
--
-- Alter field field_to_convert on mymodel
--
ALTER TABLE "myapp_mymodel" DROP CONSTRAINT "myapp_mymodel_field_to_convert_check";
ALTER TABLE "myapp_mymodel" ALTER COLUMN "field_to_convert" TYPE double precision USING "field_to_convert"::double precision;
COMMIT;
Which looks good to me both in terms of performance and reliability. I'd say go ahead with the AlterField.
If you want to be extra safe, you can always go: rename field -> create field -> RunPython -> drop field. This will give you more control over the migration process.

Is there any way around saving models that reference each other twice?

My issue is when saving new models that need to reference each other, not just using a related_name lookup, such as this:
class Many(models.Model):
    owner = models.ForeignKey('One')
class One(models.Model):
    current = models.OneToOneField('Many')
By default, these have null=False and, please correct me if I'm wrong, using them is impossible until I change one of the relationships:
current = models.OneToOneField('Many', null=True)
The reason is that you can't assign a model instance to a relationship unless it's already saved; otherwise you get ValueError: 'Cannot assign "<...>": "..." instance isn't saved in the database.'.
But now when I create a pair of these objects I need to save twice:
many = Many()
one = One()
one.save()
many.owner = one
many.save()
one.current = many
one.save()
Is this the right way to do it, or is there another way around saving twice?
There is no way around it, you need to save one of the objects twice anyway.
This is because, at the database level, you need to save an object to get its ID. There is no way to tell a sql database "save those 2 objects and assign the ids to those fields on the other object". So if you were to do it manually, you would INSERT the first object with NULL for the FK, get its ID back, INSERT the second object with the ID of the first one, get its ID back, then UPDATE the first object to set the FK.
You would encapsulate the whole thing in a transaction.
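The manual sequence described above can be sketched with Python's built-in sqlite3 module. The table and column names (one, many, current_id, owner_id) are assumptions chosen to mirror the models in the question, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE one ("
    " id INTEGER PRIMARY KEY,"
    " current_id INTEGER UNIQUE REFERENCES many(id))"  # nullable on purpose
)
conn.execute(
    "CREATE TABLE many ("
    " id INTEGER PRIMARY KEY,"
    " owner_id INTEGER NOT NULL REFERENCES one(id))"
)

with conn:  # one transaction: commit on success, roll back on error
    # 1. INSERT the first object with NULL for the FK and get its id back.
    one_id = conn.execute("INSERT INTO one (current_id) VALUES (NULL)").lastrowid
    # 2. INSERT the second object pointing at the first.
    many_id = conn.execute(
        "INSERT INTO many (owner_id) VALUES (?)", (one_id,)
    ).lastrowid
    # 3. UPDATE the first object to set the FK.
    conn.execute("UPDATE one SET current_id = ? WHERE id = ?", (many_id, one_id))

row = conn.execute(
    "SELECT one.id, one.current_id, many.owner_id"
    " FROM one JOIN many ON many.id = one.current_id"
).fetchone()
```

The three statements correspond to the three save() calls in the question, which is why one of the objects always ends up being written twice.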
So what you're doing with the ORM is the closest you can get. You may want to add the following on top of that:
1) Use a transaction for the changes, like this:
from django.db import transaction
with transaction.atomic():
    many, one = Many(), One()
    one.save()
    many.owner = one
    many.save()
    one.current = many
    one.save(update_fields=['current']) # slight optimization here
2) Now that this is encapsulated in a transaction, you would want to remove the null=True. But you cannot, as NOT NULL constraints are, unfortunately, checked immediately rather than at commit.
[edit: it appears Oracle might support deferring the NOT NULL check, so if you're using Oracle you can try dropping the null=True and it should work.]
You'll probably want to check how your code reacts if, at a later point when reading the db, one.current.owner != one for some reason (manual editing, a bugged insert somewhere, ...).

Do Django models really need a single unique key field

Some of my models are only unique in a combination of keys. I don't want to use an auto-numbering id as the identifier as subsets of the data will be exported to other systems (such as spreadsheets), modified and then used to update the master database.
Here's an example:
class Statement(models.Model):
    supplier = models.ForeignKey(Supplier)
    total = models.DecimalField("statement total", max_digits=10, decimal_places=2)
    statement_date = models.DateField("statement date")
    ....
class Invoice(models.Model):
    supplier = models.ForeignKey(Supplier)
    amount = models.DecimalField("invoice total", max_digits=10, decimal_places=2)
    invoice_date = models.DateField("date of invoice")
    statement = models.ForeignKey(Statement, blank=True, null=True)
    ....
Invoice records are only unique for a combination of supplier, amount and invoice_date
I'm wondering if I should create a slug for Invoice based on supplier, amount and invoice_date so that it is easy to identify the correct record.
An example of the problem of having multiple related fields to identify the right record is django-csvimport which assumes there is only one related field and will not discriminate on two when building the foreign key links.
Yet the slug seems a clumsy option and needs some kind of management to rebuild the slugs after adding records in bulk.
I'm thinking this must be a common problem and maybe there's a best practice design pattern out there somewhere.
I am using PostgreSQL in case anyone has a database solution. Although I'd prefer to avoid that if possible, I can see that it might be the way to build my slug if that's the way to go, perhaps with trigger functions. That just feels a bit like hidden functionality though, and may cause a headache for setting up on a different server.
UPDATE - after reading initial replies
My application requires that data may be exported, modified remotely, and merged back into the master database after review and approval. Hidden autonumber keys don't easily survive that consistently. The relation invoices[2417] is part of statements[265] is not persistent if the statement table was emptied and reloaded from a CSV.
If I use the numeric autonumber pk, then any process that updates the database would need to refresh the related key numbers, either directly or by using multiple WITH clauses.
If I create a slug that is based on my 3 keys but easy to reproduce then I can use it as the key - albeit clumsily. I'm thinking of a slug along the lines:
u'%s %s %s' % (self.supplier,
               self.statement_date.strftime("%Y-%m-%d"),
               self.total)
This seems quite clumsy and not very DRY as I expect I may have to recreate the slug elsewhere duplicating the algorithm (maybe in an Excel formula, or an Access query)
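If the slug route is taken, keeping the algorithm in one small function at least avoids duplicating it inside the Python code base. A sketch under assumptions: statement_key is a hypothetical name, the supplier is passed as a plain string, and the total is normalised to two decimal places so that 10.5 and 10.50 produce the same key.

```python
from datetime import date
from decimal import Decimal

TWO_PLACES = Decimal("0.01")


def statement_key(supplier, statement_date, total):
    """Build a reproducible natural key from supplier, date and total."""
    normalised_total = Decimal(total).quantize(TWO_PLACES)
    return "%s %s %s" % (
        supplier,
        statement_date.strftime("%Y-%m-%d"),
        normalised_total,
    )


key_a = statement_key("Acme", date(2014, 3, 1), Decimal("10.5"))
key_b = statement_key("Acme", date(2014, 3, 1), Decimal("10.50"))
```

Both calls give the same string, which is the property an external tool would need to reproduce when rebuilding the key.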
I thought there must be a better way I'm missing but it looks like yuvi's reply means there should be, and there will be, but not yet :-(
What you're talking about is a multi-column primary key, otherwise known as a "composite" or "compound" key. Support in Django for composite keys is still in the works; you can read about it here:
Currently Django models only support a single column in this set,
denying many designs where the natural primary key of a table is
multiple columns [...] Current state is that the issue is
accepted/assigned and being worked on [...]
The link also mentions a partial implementation which is django-compositekeys. It's only partial and will cause you trouble with navigating between relationships:
support for composite keys is missing in ForeignKey and
RelatedManager. As a consequence, it isn't possible to navigate
relationships from models that have a composite primary key.
So currently it isn't entirely supported, but will be in the future. Regarding your own project, you can make of that what you will, though my own suggestion is to stick with the fully supported default of a hidden auto-incremented field that you don't even need to think about (and use unique_together to enforce the uniqueness of the described fields instead of making them your primary keys).
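At the database level, unique_together amounts to a multi-column UNIQUE constraint alongside the surrogate id. The following sketch shows the effect with Python's built-in sqlite3 module; the invoice table is a simplified, hypothetical version of the Invoice model, with amount and invoice_date stored as text for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"  # hidden surrogate key stays in place
    " supplier_id INTEGER NOT NULL,"
    " amount TEXT NOT NULL,"
    " invoice_date TEXT NOT NULL,"
    " UNIQUE (supplier_id, amount, invoice_date))"  # the natural key
)

insert = "INSERT INTO invoice (supplier_id, amount, invoice_date) VALUES (?, ?, ?)"
natural_key = (7, "120.00", "2014-03-01")
conn.execute(insert, natural_key)

# A second row with the same three values is rejected by the constraint.
try:
    conn.execute(insert, natural_key)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Records can still be located by the natural key after an export/import cycle.
found = conn.execute(
    "SELECT id FROM invoice"
    " WHERE supplier_id = ? AND amount = ? AND invoice_date = ?",
    natural_key,
).fetchone()
```

The auto-incremented id remains the primary key, while the three-column lookup is what an import process would match on.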
I hope this helps!
No.
A model needs to have one field with primary_key=True. By default this is the (hidden) AutoField which stores the object id, but you can set primary_key=True on any other field.
I've done this in cases where I was building a Django project on top of tables which had previously been created manually or through some other framework or system.
In reality, you can use whatever means you can think of for joining objects together in queries. As long as the query returns data that can be associated with the models you have, it does not really matter which field you use for joins. Just keep in mind that the solution you use should be as efficient as possible.
Alan

Django model field max_length 3x varchar size?

I just realized that, for a legacy table I'm using in a Django app, a varchar(5) field (for example) is rendered in Python as a models.CharField(max_length=15) field. This 3x size for the max length is very consistent across many other fields.
Why? or more importantly if I changed the django definition to be models.CharField(max_length=5) would I break anything?
It is probably a manual error by someone who tried to write models.
No. It doesn't break anything if you change it to 5. Not only that, you should change it to 5, so your form validation itself will take care of that length where you have that field.