I have migrated over 10,000 records from my old MySQL database to Django/SQLite. In my old MySQL schema's Song table, the artist field was not a one-to-many relation but just a MySQL varchar field. In my new Django model, I converted the artist field to a ForeignKey and used temp_artist to temporarily store the artist's name from the old database.
How do I set each Song instance's artist foreign key based on the temp_artist field? I'm assuming I should use the manager's get_or_create method, but where and how do I write the code?
My models are below:
class Artist(models.Model):
    name = models.CharField(max_length=100)

class Song(models.Model):
    artist = models.ForeignKey(Artist, blank=True, null=True, on_delete=models.CASCADE, verbose_name="Artist")
    temp_artist = models.CharField(null=True, blank=True, max_length=100)
    title = models.CharField(max_length=100, verbose_name="Title")
    duration = models.DurationField(null=True, blank=True, verbose_name="Duration")
You can write a custom management command that performs this logic for you. The docs provide good instructions on how to set it up. Your command code would look something like this:
# e.g., migrateauthors.py
from django.core.management.base import BaseCommand
from myapp import models

class Command(BaseCommand):
    help = 'Migrate authors from old schema'

    def handle(self, *args, **options):
        for song in models.Song.objects.all():
            # Reuse the artist with this name, or create it the first time it is seen
            song.artist, _ = models.Artist.objects.get_or_create(name=song.temp_artist)
            song.save()
Then simply run the management command with python manage.py migrateauthors. Once this is done and verified, you can remove the temporary field from your model.
Since you don't have a usable foreign key at the moment, you would have to drop down to raw SQL. If you were still on MySQL, you could have used the UPDATE JOIN syntax, but unfortunately SQLite does not support UPDATE JOIN.
Luckily, you have only about ten thousand rows, which makes it feasible to iterate through them and update each row individually.
raw_query = '''SELECT s.*, a.id AS fkid
    FROM myapp_song s
    INNER JOIN myapp_artist a ON s.temp_artist = a.name'''
for song in Song.objects.raw(raw_query):
    song.artist_id = song.fkid  # fkid is the extra column selected above
    song.save()
This might take a few minutes to complete because you don't have an index on temp_artist and name. Take care to replace myapp with the actual name of your app.
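To make the index remark concrete, here is a minimal sketch of the same join-then-update loop outside the ORM, using Python's built-in sqlite3 module. The table layout is an assumption that only loosely mimics what Django would generate for an app called myapp, the index names are made up, and the sample rows are invented.

```python
import sqlite3

# In-memory stand-in for the real database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE myapp_song (
        id INTEGER PRIMARY KEY, title TEXT, temp_artist TEXT, artist_id INTEGER
    );
    INSERT INTO myapp_artist (id, name) VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO myapp_song (id, title, temp_artist) VALUES
        (1, 'Song A', 'Alice'), (2, 'Song B', 'Bob'), (3, 'Song C', 'Alice');

    -- Hypothetical index names; they cover the two columns used in the join.
    CREATE INDEX idx_song_temp_artist ON myapp_song (temp_artist);
    CREATE INDEX idx_artist_name ON myapp_artist (name);
""")

# Same join as the raw query above: each song id plus the id of its artist.
pairs = conn.execute("""
    SELECT s.id, a.id AS fkid
    FROM myapp_song s
    INNER JOIN myapp_artist a ON s.temp_artist = a.name
""").fetchall()

# One UPDATE per row, all inside a single transaction.
with conn:
    conn.executemany(
        "UPDATE myapp_song SET artist_id = ? WHERE id = ?",
        [(fkid, song_id) for song_id, fkid in pairs],
    )

result = conn.execute("SELECT id, artist_id FROM myapp_song ORDER BY id").fetchall()
```

Wrapping the updates in one transaction is a design choice here: it avoids a commit per row, which is usually the slow part of row-by-row updates in SQLite.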
Edit 1:
Although SQLite doesn't have UPDATE JOIN, it does allow you to SET a value with a subquery, so this will also work:
UPDATE myapp_song SET artist_id =
    (SELECT id FROM myapp_artist WHERE name = myapp_song.temp_artist)
Type it into the SQLite console or a GUI. Make sure to replace myapp with your own app name. This will be very quick because it's a single query. All other solutions, including my alternative solution in this answer, involve 10,000 queries.
Edit 2:
If your Artist table is currently empty, you will have to populate it before doing any of this. Here is an easy query that does it:
INSERT INTO myapp_artist (name)
    SELECT DISTINCT temp_artist FROM myapp_song
Note that you should have a unique index on Artist.name.
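As a rough end-to-end check of the two SQL statements (populate the artist table, then set the foreign key with a subquery), the sketch below runs them against an in-memory database using Python's built-in sqlite3 module. The schema is a simplified assumption of what Django would create for an app named myapp, and the rows are invented. The extra WHERE clause that skips NULL names is my addition, not part of the original query.

```python
import sqlite3

# In-memory database with an assumed, simplified version of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_artist (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(100) NOT NULL UNIQUE
    );
    CREATE TABLE myapp_song (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title VARCHAR(100) NOT NULL,
        temp_artist VARCHAR(100),
        artist_id INTEGER REFERENCES myapp_artist (id)
    );
""")
conn.executemany(
    "INSERT INTO myapp_song (title, temp_artist) VALUES (?, ?)",
    [("Song A", "Alice"), ("Song B", "Bob"), ("Song C", "Alice"), ("Song D", None)],
)

# Step 1: one artist row per distinct, non-NULL name.
conn.execute("""
    INSERT INTO myapp_artist (name)
    SELECT DISTINCT temp_artist FROM myapp_song
    WHERE temp_artist IS NOT NULL
""")

# Step 2: point every song at its artist with a correlated subquery.
conn.execute("""
    UPDATE myapp_song SET artist_id =
        (SELECT id FROM myapp_artist WHERE name = myapp_song.temp_artist)
""")
conn.commit()

linked = conn.execute("""
    SELECT s.title, a.name
    FROM myapp_song s LEFT JOIN myapp_artist a ON a.id = s.artist_id
    ORDER BY s.id
""").fetchall()
artist_count = conn.execute("SELECT COUNT(*) FROM myapp_artist").fetchone()[0]
```

Songs whose temp_artist is NULL simply keep a NULL artist_id, because the subquery finds no matching artist for them.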
I've been trying to figure this one out for a while now but am confused. Every ManyToMany relationship always goes through a third table which isn't that difficult to understand. But in the event that the third table is a custom through table with additional fields how do you grab the custom field for each row?
Here's a sample table I made. How can I get all the movies a User has watched along with the additional watched field and finished field? This example assumes the user is only allowed to see the movie once whether they finish it or not so there will only be 1 record for each movie they saw.
class Movie(models.Model):
    title = models.CharField(max_length=191)

class User(models.Model):
    username = models.CharField(max_length=191)
    watched = models.ManyToManyField(Movie, through='Watch')

class Watch(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    movie = models.ForeignKey(Movie, on_delete=models.CASCADE)
    watched = models.DateTimeField()
    finished = models.BooleanField()
Penny for your thoughts my friends.
You can use:
from django.db.models import F

my_user.watched.annotate(
    watched=F('watch__watched'),
    finished=F('watch__finished')
)
This will return a QuerySet of Movies that contain as extra attributes .watched and .finished.
That being said, it might be cleaner to just access the watch_set, iterate over the Watch objects, and access the .movie object for details about the movie. You can use .select_related(..) to fetch the information about the Movies in the same database query:
for watch in my_user.watch_set.select_related('movie'):
    print(f'{watch.movie.title}: {watch.watched}, {watch.finished}')
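For readers who think in SQL, iterating over the through table with the movie joined in corresponds roughly to a single join between the through table and the movie table. The sketch below runs such a join with Python's built-in sqlite3 module. The table names (myapp_user, myapp_movie, myapp_watch) and the sample rows are assumptions for illustration, not necessarily what Django generates for your app.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE myapp_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE myapp_watch (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES myapp_user (id),
        movie_id INTEGER REFERENCES myapp_movie (id),
        watched TEXT,
        finished INTEGER,
        UNIQUE (user_id, movie_id)  -- one record per user and movie
    );
    INSERT INTO myapp_movie VALUES (1, 'Alien'), (2, 'Brazil');
    INSERT INTO myapp_user VALUES (1, 'penny');
    INSERT INTO myapp_watch (user_id, movie_id, watched, finished) VALUES
        (1, 1, '2020-01-01 20:00:00', 1),
        (1, 2, '2020-01-02 21:30:00', 0);
""")

# The extra through-table columns come back alongside the movie title.
rows = conn.execute("""
    SELECT m.title, w.watched, w.finished
    FROM myapp_watch w
    INNER JOIN myapp_movie m ON m.id = w.movie_id
    WHERE w.user_id = ?
    ORDER BY w.watched
""", (1,)).fetchall()

for title, watched, finished in rows:
    print(f"{title}: {watched}, {bool(finished)}")
```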
I have a Django project with a Post model which looks like this:
class BasicPost(models.Model):
    author = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    published = models.BooleanField(default=False)
    created_date = models.DateTimeField(auto_now_add=True)
    title = models.CharField(max_length=100, blank=False)
    body = models.TextField(max_length=999)
    media = models.ImageField(blank=True)

    def get_absolute_url(self):
        return reverse('basic_post', args=[str(self.pk)])

    def __str__(self):
        return self.title
Also, I use the default User model that ships with Django.
I want to record which posts each user has read so I can send them posts they haven't read.
My question is: what is the best way to do this? If I use a many-to-many field, should I put it on the User model and save all the posts the user has read, or should I go in the other direction, put the many-to-many field on the Post model, and save which users have read each post?
There are going to be more than 1 million posts in the Post model and about 50,000 users, and I want the most efficient filters for returning unread posts to a user.
If I should use the first option, how do I extend the User model?
Thanks!
On your first question (which way to go): I believe that ManyToManyField by default creates indices in the DB for both foreign keys. Therefore, wherever you put the relation, in User or in BasicPost, both the direct and the reverse relationship will work through an index. Django will create a pivot table for you with three columns, like (id, user_id, basic_post_id). Every access to this table will go through the index on user_id or basic_post_id, and each (user_id, basic_post_id) pair appears at most once. So it is more within your application that you'll decide whether you filter starting from the set of 1 million posts or from the set of 50,000 users.
On your second question (how to extend User): it's generally recommended to subclass User from the very beginning. If it's too late for that because your project is too far advanced, you can do this in your models.py:
from django.contrib.auth.models import User

class BasicPost(models.Model):
    # your code
    readers = models.ManyToManyField(to='auth.User', related_name="posts_already_read")

# "manually" add method to User class
def _unread_posts(user):
    # exclude() takes the user instance directly; __in would need an iterable
    return BasicPost.objects.exclude(readers=user)
User.unread_posts = _unread_posts
Haven't run this code though! Hope this helps.
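To illustrate what an "unread posts" filter has to do against the pivot table, here is a small sketch using Python's built-in sqlite3 module. The table and column names (basic_post, post_readers, basicpost_id) are assumptions made for the example, and the query is written by hand in the spirit of an exclude on the readers relation rather than copied from what the ORM emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE basic_post (id INTEGER PRIMARY KEY, title TEXT);
    -- Hypothetical pivot table: one row per (post, reader) pair.
    CREATE TABLE post_readers (
        id INTEGER PRIMARY KEY,
        basicpost_id INTEGER NOT NULL REFERENCES basic_post (id),
        user_id INTEGER NOT NULL,
        UNIQUE (basicpost_id, user_id)
    );
    CREATE INDEX post_readers_user ON post_readers (user_id);
    INSERT INTO basic_post VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
    INSERT INTO post_readers (basicpost_id, user_id) VALUES (1, 10), (3, 10), (2, 11);
""")

def unread_post_ids(conn, user_id):
    """Return the ids of posts that have no pivot row for this user."""
    rows = conn.execute("""
        SELECT p.id FROM basic_post p
        WHERE NOT EXISTS (
            SELECT 1 FROM post_readers r
            WHERE r.basicpost_id = p.id AND r.user_id = ?
        )
        ORDER BY p.id
    """, (user_id,)).fetchall()
    return [row[0] for row in rows]

print(unread_post_ids(conn, 10))
print(unread_post_ids(conn, 11))
```

Whichever model the field is declared on, the lookup ends up as an anti-join of this shape, which is why the placement matters less than the indexes on the pivot table.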
Could you have a separate ReadPost model instead of a potentially large m2m, which you could save when a user reads a post? That way you can just query the ReadPost models to get the data, instead of storing it all in the blog post.
Maybe something like this:
from django.utils import timezone
class UserReadPost(models.Model):
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE, related_name="read_posts")
    seen_at = models.DateTimeField(default=timezone.now)
    post = models.ForeignKey(BasicPost, on_delete=models.CASCADE, related_name="read_by_users")
You could add a unique_together constraint to make sure that only one UserReadPost object is created for each user and post (to make sure you don't count any twice), and use get_or_create() when creating new records.
Then finding the posts a user has read is:
posts = UserReadPost.objects.filter(user=current_user).values_list("post", flat=True)
This could also be extended relatively easily. For example, if your BasicPost objects can be edited, you could add an updated_at field to the post. Then you could compare the seen_at of the UserReadPost field to the updated_at field of the BasicPost to check if they've seen the updated version.
The downside is that you'd be creating a lot of rows in the DB for this table.
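The uniqueness idea can be tried out without Django. The sketch below uses Python's built-in sqlite3 module with a UNIQUE constraint on (user_id, post_id) and INSERT OR IGNORE as a stand-in for get_or_create(). The table name user_read_post and the helper mark_read are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_read_post (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        seen_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (user_id, post_id)  -- plays the role of unique_together
    )
""")

def mark_read(conn, user_id, post_id):
    """Record a read once; return True only if a new row was created."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO user_read_post (user_id, post_id) VALUES (?, ?)",
            (user_id, post_id),
        )
    return cur.rowcount == 1

first = mark_read(conn, 1, 42)
second = mark_read(conn, 1, 42)  # duplicate, ignored by the constraint
other = mark_read(conn, 1, 43)

# Posts this user has read, without duplicates.
read_ids = [row[0] for row in conn.execute(
    "SELECT post_id FROM user_read_post WHERE user_id = ? ORDER BY post_id", (1,)
)]
```

Letting the database enforce the constraint means two concurrent requests for the same user and post cannot both insert a row.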
If your posts are in chronological order (by created_date, for example) and users read them in that order, another option is to extend the user model with a latest_read_post_id field.
In that case:
class BasicPost(models.Model):
    # your code
    def is_read_by(self, user):
        # Posts up to and including the latest read one count as read
        return self.id <= user.latest_read_post_id
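The marker idea can be sketched in plain Python, independent of Django. The classes and the mark_read helper below are hypothetical stand-ins for the real models, and the sketch treats the post whose id equals the marker as already read.

```python
from dataclasses import dataclass

@dataclass
class User:
    username: str
    latest_read_post_id: int = 0  # 0 means "nothing read yet"

@dataclass
class Post:
    id: int
    title: str

    def is_read_by(self, user: User) -> bool:
        # Everything up to and including the marker counts as read.
        return self.id <= user.latest_read_post_id

def mark_read(user: User, post: Post) -> None:
    # Only ever move the marker forward.
    user.latest_read_post_id = max(user.latest_read_post_id, post.id)

posts = [Post(1, "a"), Post(2, "b"), Post(3, "c")]
reader = User("sam")
mark_read(reader, posts[1])
unread = [p.id for p in posts if not p.is_read_by(reader)]
```

The trade-off is visible here: marking post 2 as read implicitly marks post 1 as read too, so this only fits feeds that are consumed strictly in order. In exchange it needs one integer per user instead of one row per read.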
I'm facing a big issue with Django.
I'm trying to save an object containing ForeignKey and ManyToMany fields, but I always get this error:
ProgrammingError: column [columnName] does not exist
I've run all the migrations several times, but it doesn't work. I have no problem when I work with models that do not contain foreign keys. I have tried deleting the migrations folder. It seems my database doesn't want to update the fields. I need to force it to create these columns, but I don't have any idea how.
class Post(models.Model):
    post_id = models.CharField(max_length=100, default="")
    title = models.CharField(max_length=100, default="")
    content = models.TextField(default="")
    author = models.ForeignKey(Users, default=None, on_delete=models.CASCADE)
    comments = models.ManyToManyField(Replies)
    numberComments = models.IntegerField(default=0)
    date = models.DateTimeField(default=timezone.now)
    updated = models.DateTimeField(null=True)

    def __str__(self):
        return self.post_id
When I try to retrieve this, I get:
ProgrammingError: column numberComments does not exist
As I said before, I ran makemigrations and migrate, and I even deleted the migrations folder.
Any ideas?
To save an instance of the Post model with a foreign key, you need to pass in the related model instance.
Code example:
user = Users.objects.get(pk=1)
p = Post(
    title='Hello',
    ...
    author=user,
    date='2018-01-01'
)
p.save()
You don't need to create a post_id column; Django creates a primary key for you automatically, and you can access it using .pk or .id.
You don't need numberComments either. You can calculate it from the comments many-to-many relation, although you can also store it in the DB if you want.
Next, you cannot add a many-to-many relation on creation. Create the post first as above. Then query the comment you want to add, and add the object to the relation:
r = Replies.objects.get(pk = 1)
p.comments.add(r)
Hope it helps
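As a sketch of calculating the comment count from the relation instead of storing it, the following uses Python's built-in sqlite3 module with a hand-written join table. The table and column names (post, replies, post_comments) are assumptions for the example and will differ from what Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE replies (id INTEGER PRIMARY KEY, body TEXT);
    -- Hypothetical join table behind the comments many-to-many relation.
    CREATE TABLE post_comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES post (id),
        replies_id INTEGER NOT NULL REFERENCES replies (id),
        UNIQUE (post_id, replies_id)
    );
    INSERT INTO post VALUES (1, 'Hello'), (2, 'No comments yet');
    INSERT INTO replies VALUES (1, 'Nice'), (2, 'Thanks');
    INSERT INTO post_comments (post_id, replies_id) VALUES (1, 1), (1, 2);
""")

# LEFT JOIN keeps posts without comments; COUNT ignores the NULLs they produce.
counts = conn.execute("""
    SELECT p.title, COUNT(pc.replies_id) AS number_comments
    FROM post p
    LEFT JOIN post_comments pc ON pc.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
```

A derived count like this cannot drift out of sync with the relation, which is the main argument against a stored numberComments column.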
I need to change the type of a field in one of my Django models from CharField to ForeignKey. The fields are already populated with data, so I was wondering what is the best or right way to do this. Can I just update the field type and migrate, or are there any possible 'gotchas' to be aware of? N.B.: I just use vanilla Django management operations (makemigrations and migrate), not South.
This is likely a case where you want to do a multi-stage migration. My recommendation for this would look something like the following.
First off, let's assume this is your initial model, inside an application called discography:
from django.db import models

class Album(models.Model):
    name = models.CharField(max_length=255)
    artist = models.CharField(max_length=255)
Now, you realize that you want to use a ForeignKey for the artist instead. As mentioned, this is not a simple process; it has to be done in several steps.
Step 1, add a new field for the ForeignKey, making sure to mark it as nullable:
from django.db import models

class Album(models.Model):
    name = models.CharField(max_length=255)
    artist = models.CharField(max_length=255)
    # on_delete is required from Django 2.0 onwards; CASCADE was the old default
    artist_link = models.ForeignKey('Artist', null=True, on_delete=models.CASCADE)

class Artist(models.Model):
    name = models.CharField(max_length=255)
...and create a migration for this change.
./manage.py makemigrations discography
Step 2, populate your new field. In order to do this, you have to create an empty migration.
./manage.py makemigrations --empty --name transfer_artists discography
Once you have this empty migration, you want to add a single RunPython operation to it in order to link your records. In this case, it could look something like this:
def link_artists(apps, schema_editor):
    Album = apps.get_model('discography', 'Album')
    Artist = apps.get_model('discography', 'Artist')
    for album in Album.objects.all():
        artist, created = Artist.objects.get_or_create(name=album.artist)
        album.artist_link = artist
        album.save()
Now that your data is transferred to the new field, you could actually be done and leave everything as is, using the new field for everything. Or, if you want to do a bit of cleanup, you want to create two more migrations.
For your first migration, you will want to delete your original field, artist. For your second migration, rename the new field artist_link to artist.
This is done in multiple steps to ensure that Django recognizes the operations properly. You could create a migration manually to handle this, but I will leave that to you to figure out.
Adding on top of Joey's answer, detailed steps for Django 2.2.11.
Here are the models from my use case, which consists of a Company and an Employee model. We have to convert designation to a foreign key field. The app is called core.
class Company(CommonFields):
    name = models.CharField(max_length=255, blank=True, null=True)

class Employee(CommonFields):
    company = models.ForeignKey("Company", on_delete=models.CASCADE, blank=True, null=True)
    designation = models.CharField(max_length=100, blank=True, null=True)
Step 1
Create a foreign key designation_link in Employee and mark it as null=True
class Designation(CommonFields):
    name = models.CharField(max_length=255)
    company = models.ForeignKey("Company", on_delete=models.CASCADE, blank=True, null=True)

class Employee(CommonFields):
    company = models.ForeignKey("Company", on_delete=models.CASCADE, blank=True, null=True)
    designation = models.CharField(max_length=100, blank=True, null=True)
    designation_link = models.ForeignKey("Designation", on_delete=models.CASCADE, blank=True, null=True)
Step 2
Create an empty migration using the command:
python app_code/manage.py makemigrations --empty --name transfer_designations core
This will create the following file in the migrations directory:
# Generated by Django 2.2.11 on 2020-04-02 05:56
from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('core', '0006_auto_20200402_1119'),
    ]

    operations = [
    ]
Step 3
Populate the empty migration with a function that loops over all Employees, creates a Designation and links it to the Employee.
In my use case each Designation is also linked to a Company, which means that Designation may contain two rows for "managers", one for company A and another for company B.
Final migration would look something like this:
# core/migrations/0007_transfer_designations.py
# Generated by Django 2.2.11 on 2020-04-02 05:56
from django.db import migrations

def link_designation(apps, schema_editor):
    Employee = apps.get_model('core', 'Employee')
    Designation = apps.get_model('core', 'Designation')
    for emp in Employee.objects.all():
        if emp.designation is not None and emp.company is not None:
            desig, created = Designation.objects.get_or_create(name=emp.designation, company=emp.company)
            emp.designation_link = desig
            emp.save()

class Migration(migrations.Migration):

    dependencies = [
        ('core', '0006_auto_20200402_1119'),
    ]

    operations = [
        migrations.RunPython(link_designation),
    ]
Step 4
Finally run this migration using:
python app_code/manage.py migrate core 0007
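The per-company de-duplication from step 3 can be sketched in plain Python, without a database. The function link_designations and the dictionary layout below are hypothetical, chosen only to show why keying on both name and company yields two "manager" rows when two companies use that title.

```python
from itertools import count

def link_designations(employees):
    """Map each (designation, company) pair to one designation id."""
    next_id = count(1)
    designations = {}  # (name, company) -> designation id
    links = {}         # employee name -> designation id
    for emp in employees:
        name, company = emp["designation"], emp["company"]
        if name is None or company is None:
            continue  # same guard as in the migration: skip incomplete rows
        key = (name, company)
        if key not in designations:  # the "create" half of get_or_create
            designations[key] = next(next_id)
        links[emp["name"]] = designations[key]  # the "get" half
    return designations, links

employees = [
    {"name": "ann", "designation": "manager", "company": "A"},
    {"name": "bob", "designation": "manager", "company": "B"},
    {"name": "cy", "designation": "manager", "company": "A"},
    {"name": "dee", "designation": None, "company": "A"},
]
designations, links = link_designations(employees)
```

Employees with a missing designation or company are left unlinked, which is why the new foreign key has to stay nullable until the data is cleaned up.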
This is a continuation of the great answer by Joey.
How to rename the new field to the original name?
If the field has data, it probably means that you are using it elsewhere in your project. That solution therefore leaves you with a differently named field, and you have to either refactor the project to use the new field or delete the old field and rename the new one.
Be aware that this process will not save you from refactoring code. If you were using a CharField with choices, for example, you were accessing its display value with get_FOO_display(), where FOO is the field name.
If you delete the field from the model to generate one migration, and then rename the other field to generate another, you'll see Django complain, because you cannot delete a field that is still used elsewhere in the project.
Just create an empty migration as Joey explained, and put this in operations:
operations = [
    migrations.RemoveField(
        model_name='model_name',  # the model's name, not the app's
        name='old_field_name',
    ),
    migrations.RenameField(
        model_name='model_name',
        old_name='old_field_name_link',
        new_name='old_field_name',
    ),
]
Then run migrate and the changes will be made in your database, but obviously not in your model. Now it's time to delete the old field from the model and rename the new ForeignKey field to the original name.
I don't think that doing this is particularly hacky, but still, only do this kind of thing if you fully understand what you are messing with.
I've googled on and on, and I just don't seem to get it.
How do I recreate simple join queries in django?
in models.py (Fylker is county, Dagensrepresentanter is persons)
class Fylker(models.Model):
    id = models.CharField(max_length=6, primary_key=True)
    navn = models.CharField(max_length=300)

    def __unicode__(self):
        return self.navn

    class Meta:
        db_table = u'fylker'

class Dagensrepresentanter(models.Model):
    id = models.CharField(max_length=33, primary_key=True)
    etternavn = models.CharField(max_length=300, blank=True)
    fornavn = models.CharField(max_length=300, blank=True)
    fylke = models.ForeignKey(Fylker, db_column='id')

    def __unicode__(self):
        return u'%s %s' % (self.fornavn, self.etternavn)

    class Meta:
        ordering = ['etternavn']  # sette default ordering
        db_table = u'dagensrepresentanter'
Since the models are auto-created by django, I have added the ForeignKey and tried to connect it to the county. The id fields are inherited from the db I'm trying to integrate into this django project.
By querying
Dagensrepresentanter.objects.all()
I get all the people, but without their county.
By querying
Dagensrepresentanter.objects.all().select_related()
I get a join on Dagensrepresentanter.id and Fylker.id, but I want that join to be on fylke, i.e.
SELECT * FROM dagensrepresentanter d , fylker f WHERE d.fylke = f.id
This way I'd get the county name (Fylke navn) in the same resultset as all the persons.
Additional request:
I've read over the django docs and quite a few questions here at stackoverflow, but I can't seem to get my head around this ORM thing. It's the queries that hurt. Do you have any good resources (blogposts with experiences/explanations, etc.) for people accustomed to think of databases as an SQL-thing, that needs to start thinking in django ORM terms?
Your legacy database may not have foreign key constraints (for example, if it is using MyISAM then foreign keys aren't even supported).
You have two choices:
Add foreign key constraints to your tables (this would involve upgrading to InnoDB if you are on MyISAM). Then run ./manage.py inspectdb again and the relationships should appear.
Use the tables as is (i.e., with no explicit relationships between them) and compose queries manually (e.g., Mytable.objects.get(other_table_id=23)) either at the object level or through writing your own SQL queries. Either way, you lose much of the benefit of Django's ORM query language.
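For the second choice, writing the join by hand, a minimal sketch with Python's built-in sqlite3 module might look like this. It assumes the legacy dagensrepresentanter table has a plain fylke column holding the county id, as the SQL in the question suggests. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fylker (id VARCHAR(6) PRIMARY KEY, navn VARCHAR(300));
    CREATE TABLE dagensrepresentanter (
        id VARCHAR(33) PRIMARY KEY,
        etternavn VARCHAR(300),
        fornavn VARCHAR(300),
        fylke VARCHAR(6)  -- plain column, no foreign key constraint (assumed)
    );
    INSERT INTO fylker VALUES ('OS', 'Oslo'), ('TR', 'Troms');
    INSERT INTO dagensrepresentanter VALUES
        ('P1', 'Hansen', 'Kari', 'OS'),
        ('P2', 'Berg', 'Ola', 'TR');
""")

# Explicit join on the county column, so each person comes back with the county name.
rows = conn.execute("""
    SELECT d.fornavn, d.etternavn, f.navn
    FROM dagensrepresentanter d
    INNER JOIN fylker f ON d.fylke = f.id
    ORDER BY d.etternavn
""").fetchall()
```

The same statement could be passed to the ORM's raw query interface, but the result rows would then not be navigable as related objects, which is the loss described above.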