Migrate a PositiveIntegerField to a FloatField - django

I have an existing populated database and would like to convert a PositiveIntegerField into a FloatField. I am considering simply doing a migration:
migrations.AlterField(
    model_name='mymodel',
    name='field_to_convert',
    field=models.FloatField(
        blank=True,
        help_text='my helpful text',
        null=True),
),
Where the field is currently defined as:
field_to_convert = models.PositiveIntegerField(
    null=True,
    blank=True,
    help_text='my helpful text')
Will this require a full rewrite of the database column? How well might this conversion scale for larger databases? How might it scale if the vast majority of values were null? In what circumstances would this conversion fail? This is backed by a Postgres database, if that makes a difference.

Will this require a full rewrite of the database column?
No, it won't. I ran an experiment with PostgreSQL, MySQL, and SQLite: the conversion from integer to float goes well in every case. I also set some values to null to match your situation.
If you have a value of 3, it will simply become 3.0.
How might it scale if the vast majority of values were null?
Well, since you keep null=True in the configuration of your field, all null values will remain null; no problem there. If you remove null=True, you might need to specify a default value.
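For example, a hypothetical variant without null=True might look like this (the 0.0 default is an assumption, not something from the question):
migrations.AlterField(
    model_name='mymodel',
    name='field_to_convert',
    field=models.FloatField(
        blank=True,
        default=0.0,  # assumed default to fill existing NULL rows
        help_text='my helpful text'),
),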
In what circumstances would this conversion fail?
Taking an int column and converting it to float (real) should not fail; if you hit a case where it does, that would be a very surprising finding.
If you have doubts about the migration outcome...
... you can first take a look at the migration's SQL with sqlmigrate, and, of course, you can back up your database.

You can use sqlmigrate to check the generated SQL for your migration.
$ python manage.py sqlmigrate app_label migration_name
Keep in mind that its output depends on the Django version and the database you have in settings. For the setup I had on hand (Django 1.11, Postgres 9.3), your migration produced:
BEGIN;
--
-- Alter field field_to_convert on mymodel
--
ALTER TABLE "myapp_mymodel" DROP CONSTRAINT "myapp_mymodel_field_to_convert_check";
ALTER TABLE "myapp_mymodel" ALTER COLUMN "field_to_convert" TYPE double precision USING "field_to_convert"::double precision;
COMMIT;
Which looks good to me both in terms of performance and reliability. I'd say go ahead with the AlterField.
If you want to be extra safe, you can always go: rename field -> create field -> run Python -> drop field. This will give you more control over the migration process. Check this answer for details.
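If you go that route, a rough sketch of the four migrations could look like this (the copy_values helper and the _old field name are my own illustrative choices, not taken from the linked answer):
# migration 1 of 4: rename the old integer column so its data is preserved
migrations.RenameField(
    model_name='mymodel',
    old_name='field_to_convert',
    new_name='field_to_convert_old'),

# migration 2 of 4: add the new float column alongside it
migrations.AddField(
    model_name='mymodel',
    name='field_to_convert',
    field=models.FloatField(blank=True, null=True, help_text='my helpful text')),

# migration 3 of 4: copy the data across with RunPython
def copy_values(apps, schema_editor):
    MyModel = apps.get_model('myapp', 'MyModel')
    for obj in MyModel.objects.exclude(field_to_convert_old__isnull=True):
        obj.field_to_convert = float(obj.field_to_convert_old)
        obj.save(update_fields=['field_to_convert'])

migrations.RunPython(copy_values, migrations.RunPython.noop),

# migration 4 of 4: drop the old column once you are happy with the converted data
migrations.RemoveField(model_name='mymodel', name='field_to_convert_old'),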

Related

Refactoring: how to remove a model?

I have a model which is causing too much complexity, and so I want to do away with it and move to a simpler way of doing things. I don't immediately want to scrap the data in this database table, though.
class PRSblock(models.Model):
    PRS = models.ForeignKey('jobs.PRS2', models.CASCADE, related_name='prs_blocks')
    # no other relational fields
So, first, migrate the related name prs_blocks to obsolete_prs_blocks, and then in the PRS model add a @property prs_blocks that asserts it is never called (to trap any bits of code which I failed to remove; sketched below).
Second, rename the model PRSblock to obsolete_PRSblock. IIRC Django makemigrations will ask whether I renamed it, and if I say yes, it will preserve the database table.
Does this sound sensible or are there any gotchas I haven't thought of?
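A minimal sketch of what that could look like (the assertion message is my own wording):
class obsolete_PRSblock(models.Model):
    PRS = models.ForeignKey('jobs.PRS2', models.CASCADE, related_name='obsolete_prs_blocks')

class PRS2(models.Model):
    # ... other PRS2 fields omitted ...

    @property
    def prs_blocks(self):
        # trap any leftover code paths that still use the old related name
        raise AssertionError('prs_blocks is obsolete; use obsolete_prs_blocks')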

Best index for a Django model when filtering on one field and ordering on another field

I use Django 2.2 linked to PostgreSQL and would like to optimise my database queries.
Given the following simplified model:
class Person(models.Model):
    name = models.CharField(max_length=100)
    age = models.IntegerField()
on which I have to do the following query, say,
Person.objects.filter(age__gt=20, age__lt=30).order_by('name')
What would be the best way to define the index in the model's Meta class so as to optimise the query?
Which of these four options would be best?
class Meta:
    indexes = [
        models.Index(fields=['age', 'name']),
        models.Index(fields=['name', 'age']),
        models.Index(fields=['name']),
        models.Index(fields=['age']),
    ]
Is it, for example, possible to prevent sorting when the query is done? Thank you.
This is really a Postgres question as much as a Django question, right?
I think there is a good chance that creating an index on your sort field will help with performance. But there are a lot of caveats, and if it's really important to you, you might want to do some testing focused on Postgres (i.e., just run some queries in psql and see what happens). Some caveats include:
it might depend on which type of index is created for you by Django
Postgres, of course, does not always use an index when running a query, but it should if you've got the right index and the right query (and if there is enough data in the table to justify loading the index)
it might matter how your SELECT is formatted by Django
I suggest you create your model and specify that you want the index. Then use Django Debug Toolbar to find out what SELECT query is really getting run. Then open a dbshell with manage.py dbshell (a.k.a. psql) and run EXPLAIN ANALYZE with that same SELECT. Assuming you can interpret the output, you will see for yourself whether your index comes into play. Paste the EXPLAIN ANALYZE output here, if you like.
According to this Postgres documentation, ORDER BY can be assisted by a b-tree index, which is the type of index Django will create for you by default.
So, why don't you try this:
class Meta:
    indexes = [models.Index(fields=['age', 'name'])]
Then go run an EXPLAIN ANALYZE in dbshell and see whether it worked.
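Since the question is on Django 2.2, you can also get the plan without leaving Python via QuerySet.explain() (available since Django 2.1; analyze=True works on PostgreSQL):
qs = Person.objects.filter(age__gt=20, age__lt=30).order_by('name')
print(qs.explain(analyze=True))  # runs EXPLAIN ANALYZE and returns the plan as text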
# You should apply indexing on age, because you are searching for 'age' column data
indexes = [
    models.Index(fields=['age']),
]

South data migration in Django after modifying the model

I have a project with existing class model as
class Disability(models.Model):
    child = models.ForeignKey(Child, verbose_name=_("Child"))
But with the recent architecture change i have to modify it as
class Disability(models.Model):
    child = models.ManyToManyField(Child, verbose_name=_("Child"))
Now, for this new change (I even have to modify the existing database for it), I guess a data migration is the best way to do it rather than doing it manually.
I referred to this online doc:
http://south.readthedocs.org/en/latest/commands.html#commands-datamigration
but it says very little about data migration and more about schema migration.
Question 1: If I do the schema migration, will this make me lose all my previous data belonging to that old model?
Question 2: Even when I try a schema migration, it asks this...
(rats)rats#Inspiron:~/profoundis/kenyakids$ ./manage.py schemamigration web --auto
? The field 'Disability.child' does not have a default specified, yet is NOT NULL.
? Since you are removing this field, you MUST specify a default
? value to use for existing rows. Would you like to:
? 1. Quit now, and add a default to the field in models.py
? 2. Specify a one-off value to use for existing columns now
? 3. Disable the backwards migration by raising an exception.
? Please select a choice: 1
Can anyone explain the concept of, and difference between, schema and data migrations, and how each can be done separately?
Schema and data migrations are not different options you can take to modify your table structure. They are completely different things. Of course, data migrations are fully described in the South docs.
Here a data migration will not help you, because you need to modify your schema. And the whole point of South and other migration systems is that they allow you to do that without losing data.
South will try to do a transaction by moving your table data to a temporary table (I could be wrong there), then restructure the table and try to add the original data back into the new structure. Like this:
old_table -> clone -> tmp_table
old_table -> restructure
tmp_table.data -> table
South will look at the field types. If there are big changes, it will ask what to do. For example, changing a text field to an int field would be very hard to convert :)
When you remove fields, you may still want to be able to convert back to the old structure, so South will need some default data to be able to recreate a table with the old structure.
Moving data is always an issue, since you may change both table structure and field type. For example, how would you manually deal with data going from a CharField(max_length=100) to a CharField(max_length=50)?
Best suggestion is to keep good backups.
Also take advantage of Django's fixtures. You can save fixtures for different data structures along with South migrations.
South will load initial_data files in the same way as syncdb, but it
loads them at the end of every successful migration process
http://south.readthedocs.org/en/latest/commands.html#initial-data-and-post-syncdb
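For reference, a South data migration for the FK-to-M2M change could look roughly like this, assuming the old ForeignKey is first renamed to child_old so the new ManyToManyField can take its name (all names here are illustrative, not from the original question):
from south.v2 import DataMigration

class Migration(DataMigration):

    def forwards(self, orm):
        # copy each old single-child FK into the new many-to-many relation
        for disability in orm['web.Disability'].objects.all():
            if disability.child_old_id is not None:
                disability.child.add(disability.child_old)

    def backwards(self, orm):
        raise RuntimeError("Cannot reverse this data migration.")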

Django query optimization

I have a model with a lot of entries (more than 12,513,262) and they are expected to increase exponentially. The problem is that, due to this large number of entries, querying takes a lot of time. Is there a way to increase performance using indices, etc.?
My query is like this:
MyModel.objects.all().order_by('-timestamp')[0:50]
and it's taking a lot of time to execute.
Do you have an index on the timestamp field?
If you use South (for database migrations; you should definitely look into it if you aren't already), you can just add db_index=True to your field and migrate. Otherwise you can run
./manage.py sqlindexes MyApp
to show the SQL statement that adds the index, which you then need to run manually, e.g. via:
./manage.py dbshell
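In model terms, the suggestion is just this (the field type here is an assumption):
class MyModel(models.Model):
    # db_index=True makes Django create a b-tree index on this column
    timestamp = models.DateTimeField(db_index=True)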

Django ORM, CharField and blank=True

Django's documentation is quite clear about storing empty strings as "" rather than NULL at a database level (so there is only one possible format for empty data):
Note that empty string values will always get stored as empty strings, not as NULL. Only use null=True for non-string fields such as integers, booleans and dates. For both types of fields, you will also need to set blank=True if you wish to permit empty values in forms, as the null parameter only affects database storage (see blank).
Nonetheless, after adding a new field, I've started encountering IntegrityErrors on the new field (phone_number).
null value in column "phone_number" violates non-null constraint
That model looks like this with the new field (I performed a migration via south):
class Person(models.Model):
    user = models.ForeignKey(User)
    description = models.TextField(blank=True)
    phone_number = models.CharField(blank=True)
I've since (temporarily) resolved the issue by setting null=True on phone_number, but now I have hundreds of entries with empty strings, and a single NULL value in my database. (I also tried adding default='' to the phone_number field, but I was still seeing IntegrityError issues.)
In the past I've always used MySQL, but on this project I'm using Postgres. The generated SQL insert attempt is:
'INSERT INTO "people_person" ("user_id", "description", "gender", "birthdate", "default_image_id", "zip_code", "beta_status") VALUES (%s, %s, %s, %s, %s, %s, %s) RETURNING "people_person"."id"'.
My expectation would be that Django would be inserting a blank string into the "phone_number" column, but it doesn't appear to be doing so. The other thing I might expect would be Django to include a SET DEFAULT in the CREATE TABLE statement, but it doesn't. So Postgres gets angry about the NOT NULL on that column.
Thanks!
As is usually the case with problems that are so seemingly intractable, the issue at hand was user error.
My application had two entry points - two WSGI files, but only one code base. Normally, Apache will only reload your code if the file is touched. My deploy script was only touching one of those WSGI files - which meant that people reaching my site via the other WSGI file were still seeing old code. Worse, the database was modified under that old code, but the models were still as they were before.
This in turn caused the IntegrityError issues. Django didn't know about the phone_number field, so even though I had set blank=True, Django made no effort to insert a blank value - and the database of course thought that meant NULL.
This caused a series of other hard-to-track-down errors, including the one above.
It's amazing how often really tough issues like these are caused by dumb minor omissions - like a deploy script I wrote 2 months ago and forgot to update.
Thanks for reading folks, I've upvoted the other answers, but I need to accept mine since it was ultimately the solution.
I discovered that if you explicitly set the field value to None you will still get these errors. In other words, the default= value is applied as soon as you create the Python object, rather than when you save it to the database.
I guess that is reasonable but it was a bit unexpected.
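In code, the behaviour described above is roughly this (max_length is an assumed value):
class Person(models.Model):
    phone_number = models.CharField(max_length=20, blank=True, default='')

p = Person()             # default='' is applied here, when the Python object is created
p.phone_number = None    # explicitly overriding it with None...
p.save()                 # ...still violates the NOT NULL constraint at the database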