Prefetching extra fields in a ManyToMany table - django

I am working with Django on a database that has additional fields on intermediate models. Since it's a big database, I try to optimize the way the data is loaded. But I have a problem with the extra fields of the association table.
Let's take this example from Django's documentation :
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=128)
    def __str__(self):
        return self.name
class Group(models.Model):
    name = models.CharField(max_length=128)
    members = models.ManyToManyField(Person, through='Membership')
    def __str__(self):
        return self.name
class Membership(models.Model):
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    group = models.ForeignKey(Group, on_delete=models.CASCADE)
    date_joined = models.DateField()
    invite_reason = models.CharField(max_length=64)
I would like to retrieve, from each entity of the Group class, all the entities of the Person class and all the fields invite_reason or date_joined.
Retrieving the persons is fast with the QuerySet.prefetch_related method, which prevents the deluge of database queries caused by accessing related objects.
groups = Group.objects.prefetch_related('members')
However, I did not find a way to retrieve the extra fields invite_reason and date_joined with a constant number of queries.
I tried prefetching membership_set or a related name in my variable groups but my code doesn't go faster.
# NOT WORKING
groups = Group.objects.prefetch_related('members', 'membership_set')
I also tried using a Prefetch object with a queryset parameter using select_related but it didn't work. Everything I've tried to load all the Membership data into groups at initialization has failed and I end up having a very long runtime retrieving the extra fields from the table.
# TAKES A WHILE BECAUSE NOTHING IS PREFETCHED
for group in groups:
    invite_reason_list = group.membership_set.values_list('invite_reason', flat=True)
    date_joined_list = group.membership_set.values_list('date_joined', flat=True)
How do I stop the deluge of database queries that is caused by accessing related objects?

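Prefetched results are only used when you read the relation through related_name.all(). Calling values_list() (or filter(), and so on) on the related manager builds a new queryset and hits the database again, which is why the prefetch seemed to have no effect. You can get the data like this: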
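# Keep the 'group' foreign key in only(): Django needs it to attach each
# Membership to its Group, and a deferred FK would cost one query per row.
prefetch_membership_set = models.Prefetch(
    'membership_set',
    queryset=Membership.objects.only('group', 'date_joined', 'invite_reason'),
)
groups = Group.objects.prefetch_related(prefetch_membership_set)
for group in groups:
    invite_reason_list = []
    date_joined_list = []
    # .all() reads from the prefetch cache instead of querying again
    for membership in group.membership_set.all():
        invite_reason_list.append(membership.invite_reason)
        date_joined_list.append(membership.date_joined)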

Related

multiple joins on django queryset

For the below sample schema
# schema sample
# (N and D are referenced by name because they are defined further down)
class A(models.Model):
    n = models.ForeignKey('N', on_delete=models.CASCADE)
    d = models.ForeignKey('D', on_delete=models.PROTECT)
class N(models.Model):
    id = models.AutoField(primary_key=True, editable=False)
    d = models.ForeignKey('D', on_delete=models.PROTECT)
class D(models.Model):
    dsid = models.CharField(max_length=255, primary_key=True)
class P(models.Model):
    id = models.AutoField(primary_key=True, editable=False)
    name = models.CharField(max_length=255)
    n = models.ForeignKey(N, on_delete=models.CASCADE)
# raw query for the result I want
# SELECT P.name
# FROM P, N, A
# WHERE (P.n_id = N.id
# AND A.n_id = N.id
# AND A.d_id = 'MY_DSID'
# AND P.name = 'MY_NAME')
What am I trying to achieve?
Well, I'm trying to write a single queryset that does the same as the raw query above. So far I have managed it with two querysets, using the result of the first to build the second and get the final DB records. However, that's 2 hits to the DB, and I want to optimize it by doing everything in one DB hit.
What would the queryset for this kind of raw query be? Or is there a better way to do it?
Above code is here https://dpaste.org/DZg2
You can achieve it using the related_name attribute together with functions like select_related and prefetch_related.
The query below assumes that the related name for each model is the lowercase model name plus _items (for example, related_name="p_items" on P.n), but it is better to give the models proper names and then provide meaningful related names. The related name is how you access a model from the other side of the relation.
This way, select_related loads A, N and D with joins in one query, and prefetch_related loads the related P rows in one additional query:
A.objects.all().select_related("n", "d", "n__d").prefetch_related("n__p_items")
I edited the code in the pasted site, however, it will expire soon.

Listing all related objects and allow paging on related objects

Can someone give me the best approach with an example for the following...
On a page I load the 'Group' object by ID. I also want to list all contacts that belong to that group (with paging).
Because of the paging issue I was thinking of just running a second database query with...
In my view...
group = get_object_or_404(Group, pk=id)
contacts = Contact.objects.filter(group=group)
But this seems wasteful: I'm already getting the Group, so why hit the database twice?
See my model.
class GroupManager(models.Manager):
    def for_user(self, user):
        return self.get_query_set().filter(user=user,)
class Group(models.Model):
    name = models.CharField(max_length=60)
    modified = models.DateTimeField(null=True, auto_now=True,)
    #FK
    user = models.ForeignKey(User, related_name="user")
    objects = GroupManager()
    def get_absolute_url(self):
        return reverse('contacts.views.group', args=[str(self.id)])
class Contact(models.Model):
    first_name = models.CharField(max_length=60)
    last_name = models.CharField(max_length=60)
    #FK
    group = models.ForeignKey(Group)
This is what select_related is designed for:
Returns a QuerySet that will automatically “follow” foreign-key
relationships, selecting that additional related-object data when it
executes its query. This is a performance booster which results in
(sometimes much) larger queries but means later use of foreign-key
relationships won’t require database queries.
In your case it would be:
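Group.objects.select_related().get(pk=id)
Now forward FK lookups on the group, such as group.user, won't hit the database again. Note that select_related only follows forward foreign keys and one-to-one relations; the contacts are a reverse relation, so listing them (for example through group.contact_set.all()) still takes its own query.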
The next step would be to cache the results using the cache API so that you don't hit the database every time the next "page" is requested. This would be useful if your data isn't time sensitive.

How to build SQL query with two left joins using django ORM

I need to build a MySQL query, and I want to try the Django ORM first and use raw SQL only as a last resort.
I found documentation on a single JOIN, or JOINs between two tables, but there are no examples, or at least no simple (beginner-friendly) explanation, of JOINs between three tables.
Content of models.py is
from django.db import models
# Create your models here.
class Threads(models.Model):
    name = models.CharField(max_length=100)
    author = models.CharField(max_length=100)
    date = models.DateTimeField("date published")
    slug = models.SlugField()
    def __unicode__(self):
        return self.name
class Posts(models.Model):
    name = models.CharField(max_length=100)
    text = models.TextField()
    author = models.CharField(max_length=100)
    date = models.DateTimeField("date published")
    slug = models.SlugField()
    def __unicode__(self):
        return self.name
class Relations(models.Model):
    thread = models.ForeignKey(Threads, related_name="%(app_label)s_%(class)s_related")
    post = models.ForeignKey(Posts, related_name="%(app_label)s_%(class)s_related")
and this is SQL query in raw that I am trying to build
SELECT forum_threads.id AS t_id, forum_threads.name AS t_name, forum_threads.slug AS t_slug, forum_posts.*
FROM forum_threads
LEFT JOIN forum_relations ON forum_threads.id=forum_relations.thread_id
LEFT JOIN forum_posts ON forum_relations.post_id=forum_posts.id
WHERE forum_threads.slug="<slug_name>"
GROUP BY forum_threads.id
"forum" is my app name
Now I don't know whether I need to tweak or change my models and, if so, how. Note that I can change my models freely; there is no important data in them yet.
EDIT
Thank you for all your answers!
OK, I played a bit with various examples until I managed to produce something. I got it like this:
thread = Threads.objects.filter(slug = slug)
posts = Posts.objects.filter(forum_relations_related__thread = thread[0].id)
The first query retrieves the thread (and its id) from the slug, and the second one returns all posts related to that thread id.
I'll try and play around with a M2M part since I have at least one working example.
Why not just use an M2M relation? You can use through if need be.
You could then get a thread by slug
thread = Threads.objects.get(slug=slug_name)
then you can access the posts related to a thread via
thread.posts_set.all()

Changed Django's primary key field, now items don't appear in the admin

I imported my (PHP) old site's database tables into Django. By default it created a bunch of primary key fields within the model (since most of them were called things like news_id instead of id).
I just renamed all the primary keys to id and removed the fields from the model. The problem then came specifically with my News model. New stuff that I add doesn't appear in the admin. When I remove the following line from my ModelAdmin, they show up:
list_display = ['headline_text', 'news_category', 'date_posted', 'is_sticky']
Specifically, it's the news_category field that causes problems. If I remove it from that list then I see my new objects. Now, when I edit those items directly (hacking the URL with the item ID) they have a valid category, likewise in the database. Here's the model definitions:
class NewsCategory(models.Model):
    def __unicode__(self):
        return self.cat_name
    #news_category_id = models.IntegerField(primary_key=True, editable=False)
    cat_name = models.CharField('Category name', max_length=75)
    cat_link = models.SlugField('Category name URL slug', max_length=75, blank=True, help_text='Used in URLs, eg spb.com/news/this-is-the-url-slug/ - generated automatically by default')
    class Meta:
        db_table = u'news_categories'
        ordering = ["cat_name"]
        verbose_name_plural = "News categories"
class News(models.Model):
    def __unicode__(self):
        return self.headline_text
    #news_id = models.IntegerField(primary_key=True, editable=False)
    news_category = models.ForeignKey('NewsCategory')
    writer = models.ForeignKey(Writer) # todo - automate
    headline_text = models.CharField(max_length=75)
    headline_link = models.SlugField('Headline URL slug', max_length=75, blank=True, help_text='Used in URLs, eg spb.com/news/this-is-the-url-slug/ - generated automatically by default')
    body = models.TextField()
    extra = models.TextField(blank=True)
    date_posted = models.DateTimeField(auto_now_add=True)
    is_sticky = models.BooleanField('Is this story featured on the homepage?', blank=True)
    tags = TaggableManager(blank=True)
    class Meta:
        db_table = u'news'
        verbose_name_plural = "News"
You can see where I've commented out the autogenerated primary key fields.
It seems like somehow Django thinks my new items don't have news_category_ids, but they definitely do. I tried editing an existing piece of news and changing the category and it worked as normal. If I run a search for one of the new items, it doesn't show up, but the bottom of the search says "1 News found", so something is going on.
Any tips gratefully received.
EDIT: here's my ModelAdmin too:
class NewsCategoryAdmin(admin.ModelAdmin):
    prepopulated_fields = {"cat_link": ("cat_name",)}
    list_display = ['cat_name', '_cat_count']
    def _cat_count(self, obj):
        return obj.news_set.count()
    _cat_count.short_description = "Number of news stories"
class NewsImageInline(admin.TabularInline):
    model = NewsImage
    extra = 1
class NewsAdmin(admin.ModelAdmin):
    prepopulated_fields = {"headline_link": ("headline_text",)}
    list_display = ['headline_text', 'news_category', 'date_posted', 'is_sticky'] #breaking line
    list_filter = ['news_category', 'date_posted', 'is_sticky']
    search_fields = ['headline_text']
    inlines = [NewsImageInline]
I think the answer you are looking for lies in the SQL schema that you altered and not in the Django models.
It probably has something to do with null or blank values in news_category_id, or with news that belongs to a category that doesn't exist in news_category. Things I'd check:
You have renamed the primary key on the News category from news_category_id to id. Does the foreign key on News also map to news_category_id and not to anything else?
Are all the values captured in news.news_category also present in news_category.id?
Also, as an aside, I don't see any reason to rename the primary keys to id. Just marking the existing columns with primary_key=True works fine. Django provides a convenient alias, pk, to access a model's primary key, irrespective of what the field is actually called.

Should I use a separate table instead of many to many field in Django

I needed to assign one or more categories to a list of submissions. I initially used a table with two foreign keys to accomplish this, until I realized Django has a many-to-many field. However, following the documentation, I haven't been able to duplicate what I did with the original table.
My question is: is there a benefit to using a many-to-many field instead of manually creating a relationship table? If it is better, are there any examples of submitting and retrieving many-to-many fields with Django?
From the Django docs on Many-to-Many relationships:
When you're only dealing with simple many-to-many relationships such
as mixing and matching pizzas and toppings, a standard ManyToManyField
is all you need. However, sometimes you may need to associate data
with the relationship between two models.
In short: if you have a simple relationship, a many-to-many field is better (it creates and manages the extra table for you). If you need to store extra details about the relationship, create your own model with foreign keys. So it really depends on the situation.
Update :- Examples as requested:
From the docs:
class Person(models.Model):
    name = models.CharField(max_length=128)
    def __unicode__(self):
        return self.name
class Group(models.Model):
    name = models.CharField(max_length=128)
    members = models.ManyToManyField(Person, through='Membership')
    def __unicode__(self):
        return self.name
class Membership(models.Model):
    person = models.ForeignKey(Person)
    group = models.ForeignKey(Group)
    date_joined = models.DateField()
    invite_reason = models.CharField(max_length=64)
You can see through this example that membership details (date_joined and invite_reason) are kept in addition to the many-to-many relationship.
However on a simplified example from the docs:
class Topping(models.Model):
    ingredient = models.CharField(max_length=128)
class Pizza(models.Model):
    name = models.CharField(max_length=128)
    toppings = models.ManyToManyField(Topping)
There seems no need for any extra data and hence no extra model.
Update 2 :-
An example of how to remove the relationship.
In the first example I gave, you have the extra model Membership, so you just delete the relationship and its details like a normal model instance.
for membership in Membership.objects.filter(person__pk=1):
    membership.delete()
Voilà! Easy as pie.
For the second example you need to use .remove() (or .clear() to remove all):
apple = Topping.objects.get(pk=4)
super_pizza = Pizza.objects.get(pk=12)
super_pizza.toppings.remove(apple)
# remove() updates the database immediately, so no save() call is needed
And that one is done too!