Django - Counting ManyToMany Relationships

In my model, I have many Things that can have many Labels, and this relationship is created by user-submitted Descriptions via a form. I cannot figure out how to count how many of each Label each Thing has.
In models.py, I have:
class Label(models.Model):
    name = models.CharField(max_length=100)
class Thing(models.Model):
    name = models.CharField(max_length=100)
class Description(models.Model):
    thingname = models.ForeignKey(Thing, on_delete=models.CASCADE)
    labels = models.ManyToManyField(Label, blank=True)
If we say our current Thing is a cat, and ten people have submitted a Description for the cat, how can we make our template output an aggregate count of each related Label for the Thing?
For example:
Cat
10 fluffy
6 fuzzy
4 cute
2 dangerous
1 loud
I've tried a few things with filters and annotations like
counts = Label.objects.filter(description_form = pk).annotate(num_notes=Count('name'))
but I think there's something obvious I'm missing either in my views.py or in my template.

You can use this to retrieve the information. The foreign key in the question is named thingname, so the lookup is thingname__name, and prefetch_related is not needed because values() returns dictionaries rather than model instances:
from django.db.models import Count
Description.objects.values("labels__name", "thingname__name").annotate(num_notes=Count("labels__name"))
This generates SQL equivalent to:
SELECT "core_label"."name",
       "core_thing"."name",
       Count("core_label"."name") AS "num_notes"
FROM "core_description"
LEFT OUTER JOIN "core_description_labels"
    ON ( "core_description"."id" =
         "core_description_labels"."description_id" )
LEFT OUTER JOIN "core_label"
    ON ( "core_description_labels"."label_id" =
         "core_label"."id" )
INNER JOIN "core_thing"
    ON ( "core_description"."thingname_id" = "core_thing"."id" )
GROUP BY "core_label"."name",
         "core_thing"."name"
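To see what the GROUP BY in the SQL above computes, here is a minimal sketch that runs the same kind of query with Python's built-in sqlite3 module instead of the ORM. The table contents are made-up sample data, and the table and column names are assumptions modelled on the question's models, not something Django is guaranteed to generate.

```python
import sqlite3

# In-memory database with hypothetical tables mirroring the question's models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core_label (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE core_thing (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE core_description (id INTEGER PRIMARY KEY, thingname_id INTEGER);
    CREATE TABLE core_description_labels (description_id INTEGER, label_id INTEGER);
    INSERT INTO core_label VALUES (1, 'fluffy'), (2, 'cute');
    INSERT INTO core_thing VALUES (1, 'Cat');
    -- three descriptions of the cat
    INSERT INTO core_description VALUES (1, 1), (2, 1), (3, 1);
    -- description 1 has both labels, descriptions 2 and 3 only 'fluffy'
    INSERT INTO core_description_labels VALUES (1, 1), (1, 2), (2, 1), (3, 1);
""")

# One row per (thing, label) pair with the number of descriptions using it.
rows = conn.execute("""
    SELECT core_thing.name, core_label.name, COUNT(core_label.name) AS num_notes
    FROM core_description
    LEFT OUTER JOIN core_description_labels
        ON core_description.id = core_description_labels.description_id
    LEFT OUTER JOIN core_label
        ON core_description_labels.label_id = core_label.id
    INNER JOIN core_thing
        ON core_description.thingname_id = core_thing.id
    GROUP BY core_label.name, core_thing.name
    ORDER BY num_notes DESC
""").fetchall()

for thing_name, label_name, num_notes in rows:
    print(thing_name, num_notes, label_name)
```

In a view, the equivalent ORM result could be passed to the template as a list of dictionaries and looped over in the same way.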

Related

Django Query - Get list that isn't in FK of another model

I am working on a Django web app that manages payroll based on completed reports, from which payroll is then generated. There are three models, as follows (I've tried to limit them to the data needed for the question).
class PayRecord(models.Model):
    rate = models.FloatField()
    user = models.ForeignKey(User)
class Payroll(models.Model):
    company = models.ForeignKey(Company)
    name = models.CharField()
class PayrollItem(models.Model):
    payroll = models.ForeignKey(Payroll)
    record = models.OneToOneField(PayRecord, unique=True)
What is the most efficient way to get all the PayRecords that aren't also in PayrollItem, so I can select them to create a payroll item?
There are 100k records, and my initial attempt takes minutes. Attempt tried below (this is far from feasible).
records_completed_in_payrolls = [
p.report.id for p in PayrollItem.objects.select_related(
'record',
'payroll'
)
]
Because you have the related field record in PayrollItem, you can reach into that model while you filter PayRecord. Using the __isnull lookup should give you what you want.
PayRecord.objects.filter(payrollitem__isnull=True)
This translates to a SQL statement like:
SELECT payroll_payrecord.id,
payroll_payrecord.rate,
payroll_payrecord.user_id
FROM payroll_payrecord
LEFT OUTER JOIN payroll_payrollitem
ON payroll_payrecord.id = payroll_payrollitem.record_id
WHERE payroll_payrollitem.id IS NULL
Depending on your intentions, you may want to chain on a .select_related (https://docs.djangoproject.com/en/3.1/ref/models/querysets/#select-related)
PayRecord.objects.filter(payrollitem__isnull=True).select_related('user')
which translates to something like:
SELECT payroll_payrecord.id,
payroll_payrecord.rate,
payroll_payrecord.user_id,
payroll_user.id,
payroll_user.name
FROM payroll_payrecord
LEFT OUTER JOIN payroll_payrollitem
ON (payroll_payrecord.id = payroll_payrollitem.record_id)
INNER JOIN payroll_user
ON (payroll_payrecord.user_id = payroll_user.id)
WHERE payroll_payrollitem.id IS NULL
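The LEFT OUTER JOIN ... IS NULL pattern in the SQL above is an anti-join: it keeps only the rows that have no match on the other side. A minimal sketch of that idea, using Python's built-in sqlite3 with made-up sample rows (the simplified table layout is an assumption, not the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payroll_payrecord (id INTEGER PRIMARY KEY, rate REAL);
    CREATE TABLE payroll_payrollitem (
        id INTEGER PRIMARY KEY,
        record_id INTEGER UNIQUE REFERENCES payroll_payrecord(id)
    );
    INSERT INTO payroll_payrecord VALUES (1, 10.0), (2, 12.5), (3, 15.0);
    -- only pay record 2 has been put on a payroll so far
    INSERT INTO payroll_payrollitem VALUES (1, 2);
""")

# Records without a matching payroll item get NULLs from the outer join,
# so the IS NULL test keeps exactly those records.
unpaid_ids = [row[0] for row in conn.execute("""
    SELECT payroll_payrecord.id
    FROM payroll_payrecord
    LEFT OUTER JOIN payroll_payrollitem
        ON payroll_payrecord.id = payroll_payrollitem.record_id
    WHERE payroll_payrollitem.id IS NULL
    ORDER BY payroll_payrecord.id
""")]
print(unpaid_ids)
```

Because the filtering happens in the database, no PayrollItem rows need to be loaded into Python, which is what made the list-comprehension attempt slow.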

Are queries using related_name more performant in Django?

Let's say I have the following models set up:
class Shop(models.Model):
    ...
class Product(models.Model):
    shop = models.ForeignKey(Shop, related_name='products')
Now let's say we want to query all the products from the shop with label 'demo' whose prices are at most $100. There are two ways to do this:
shop = Shop.objects.get(label='demo')
products = shop.products.filter(price__lte=100)
Or
shop = Shop.objects.get(label='demo')
products = Product.objects.filter(shop=shop, price__lte=100)
Is there a difference between these two queries? The first one is using the related_name property. I know foreign keys are indexed, so searching using them should be faster, but is this applicable in our first situation?
Short answer: this will result in equivalent queries.
We can do the test by printing the queries:
>>> print(shop.products.filter(price__lte=100).query)
SELECT "app_product"."id", "app_product"."shop_id", "app_product"."price" FROM "app_product" WHERE ("app_product"."shop_id" = 1 AND "app_product"."price" <= 100)
>>> print(Product.objects.filter(shop=shop, price__lte=100).query)
SELECT "app_product"."id", "app_product"."shop_id", "app_product"."price" FROM "app_product" WHERE ("app_product"."price" <= 100 AND "app_product"."shop_id" = 1)
Apart from the swapped conditions in the WHERE clause, the two queries are equal, and the order of the conditions usually makes no difference on the database side.
If, however, you are not interested in the Shop object itself, you can filter with:
products = Product.objects.filter(shop__label='demo', price__lte=100)
This will make a JOIN at the database level, and will thus retrieve the data in a single query instead of two:
SELECT "app_product"."id", "app_product"."shop_id", "app_product"."price"
FROM "app_product"
INNER JOIN "app_shop" ON "app_product"."shop_id" = "app_shop"."id"
WHERE "app_product"."price" <= 100 AND "app_shop"."label" = demo

How to left outer join with extra condition in Django

I have these three models:
class Track(models.Model):
    title = models.TextField()
    artist = models.TextField()
class Tag(models.Model):
    name = models.CharField(max_length=50)
class TrackHasTag(models.Model):
    track = models.ForeignKey('Track', on_delete=models.CASCADE)
    tag = models.ForeignKey('Tag', on_delete=models.PROTECT)
And I want to retrieve all Tracks that are not tagged with a specific tag. This gets me what I want: Track.objects.exclude(trackhastag__tag_id='1').only('id') but it's very slow when the tables grow. This is what I get when printing .query of the queryset:
SELECT "track"."id"
FROM "track"
WHERE NOT ( "track"."id" IN (SELECT U1."track_id" AS Col1
FROM "trackhastag" U1
WHERE U1."tag_id" = 1) )
I would like Django to send this query instead:
SELECT "track"."id"
FROM "track"
LEFT OUTER JOIN "trackhastag"
ON "track"."id" = "trackhastag"."track_id"
AND "trackhastag"."tag_id" = 1
WHERE "trackhastag"."id" IS NULL;
But haven't found a way to do so. Using a Raw Query is not really an option as I have to filter the resulting queryset very often.
The cleanest workaround I have found is to create a view in the database and a model TrackHasTagFoo with managed = False that I use to query like: Track.objects.filter(trackhastagfoo__isnull=True). I don't think this is an elegant or sustainable solution, as it involves adding raw SQL to my migrations to maintain said view.
This is just one example of a situation where we need to do this kind of left join with an extra condition, but the truth is that we are facing this problem in more parts of our application.
Thanks a lot!
As mentioned in Django #29555, you can use FilteredRelation for this purpose since Django 2.0.
from django.db.models import FilteredRelation, Q
Track.objects.annotate(
    has_tag=FilteredRelation(
        'trackhastag', condition=Q(trackhastag__tag=1)
    ),
).filter(
    has_tag__isnull=True,
)
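To illustrate why the extra condition belongs in the ON clause, here is a small sketch using Python's built-in sqlite3 with made-up sample data (the unprefixed table names are an assumption for brevity). It compares the join the question asks for with the NOT IN subquery, and with a variant that moves the tag condition into WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE trackhastag (id INTEGER PRIMARY KEY, track_id INTEGER, tag_id INTEGER);
    INSERT INTO track VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- track 1 has tags 1 and 2, track 2 has tag 2 only, track 3 is untagged
    INSERT INTO trackhastag VALUES (1, 1, 1), (2, 1, 2), (3, 2, 2);
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Extra condition inside ON: only tag-1 rows can match, so tracks
# without tag 1 come back with NULLs and pass the IS NULL test.
join_ids = ids("""
    SELECT track.id FROM track
    LEFT OUTER JOIN trackhastag
        ON track.id = trackhastag.track_id AND trackhastag.tag_id = 1
    WHERE trackhastag.id IS NULL
    ORDER BY track.id
""")

# The NOT IN form that exclude() generated in the question.
subquery_ids = ids("""
    SELECT track.id FROM track
    WHERE NOT (track.id IN (SELECT track_id FROM trackhastag WHERE tag_id = 1))
    ORDER BY track.id
""")

# Condition moved to WHERE: track 1 slips through via its tag-2 row.
where_ids = ids("""
    SELECT track.id FROM track
    LEFT OUTER JOIN trackhastag ON track.id = trackhastag.track_id
    WHERE trackhastag.tag_id <> 1 OR trackhastag.id IS NULL
    ORDER BY track.id
""")

print(join_ids, subquery_ids, where_ids)
```

With this data the first two queries agree, while the third also returns a track that is tagged with tag 1, so filtering in WHERE after an unconditional join is not a drop-in replacement.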
What about queryset extras? They do not break the ORM, and the result can be further filtered (unlike a raw SQL query).
from django.db.models import Q
Track.objects.filter(
    # workaround to force a left outer join
    Q(trackhastag__isnull=True) | Q(trackhastag__isnull=False)
).extra(
    # where parameters are “AND”ed to any other search criteria,
    # thus we need to account for NULL
    where=[
        '"app_trackhastag"."id" <> %s or "app_trackhastag"."id" is NULL'
    ],
    params=[1],
)
produces this somewhat convoluted query:
SELECT "app_track"."id", "app_track"."title", "app_track"."artist"
FROM "app_track"
LEFT OUTER JOIN "app_trackhastag"
ON ("app_track"."id" = "app_trackhastag"."track_id")
WHERE (
("app_trackhastag"."id" IS NULL OR "app_trackhastag"."id" IS NOT NULL) AND
("app_trackhastag"."id" <> 1 or "app_trackhastag"."id" is NULL)
)
Rationale
Step 1
One straightforward way to get a left outer join with a queryset is the following:
Track.objects.filter(trackhastag__isnull=True)
which gives:
SELECT "app_track"."id", "app_track"."title", "app_track"."artist"
FROM "app_track"
LEFT OUTER JOIN "app_trackhastag"
ON ("app_track"."id" = "app_trackhastag"."track_id")
WHERE "app_trackhastag"."id" IS NULL
Step 2
Realize that once step 1 is done (we have a left outer join), we can leverage the queryset's extra:
Track.objects.filter(
    trackhastag__isnull=True
).extra(
    where=['"app_trackhastag"."id" <> %s'],
    params=[1],
)
which gives:
SELECT "app_track"."id", "app_track"."title", "app_track"."artist"
FROM "app_track"
LEFT OUTER JOIN "app_trackhastag"
ON ("app_track"."id" = "app_trackhastag"."track_id")
WHERE (
"app_trackhastag"."id" IS NULL AND
("app_trackhastag"."id" <> 1)
)
Step 3
Work around the limitation of extra (all where parameters are “AND”ed to any other search criteria) to come up with the final solution above.
Using filter() is better than exclude() here. As the SQL in the question shows, exclude() across this relation produces a NOT (... IN (subquery)) condition, while Track.objects.filter(trackhastagfoo__isnull=True), as you said, produces a LEFT OUTER JOIN with an IS NULL check.
Suggestion: you are implementing a many-to-many relation by hand. As Mohammad said, why not use a ManyToManyField? It is easier to use.
Maybe this answers your question: Django Left Outer Join
Enric, why didn't you use a many-to-many relation?
class Track(models.Model):
    title = models.TextField()
    artist = models.TextField()
    # 'Tag' is given as a string because the class is defined below
    tags = models.ManyToManyField('Tag')
class Tag(models.Model):
    name = models.CharField(max_length=50)
And for your question
Track.objects.filter(~Q(tags__id=1))

multiple Django annotate Count over reverse relation of a foreign key with an exclude returns a strange result (18)

The strangest thing: either I'm missing something basic, or maybe this is a Django bug.
For example:
class Author(Model):
    name = CharField()
class Parent(Model):
    name = CharField()
class Subscription(Model):
    parent = ForeignKey(Parent, related_name='subscriptions')
class Book(Model):
    name = CharField()
    good_book = BooleanField()
    author = ForeignKey(Author, related_name='books')
class AggregatePerson(Model):
    author = OneToOneField(Author, related_name='+')
    parent = OneToOneField(Parent, related_name='+')
when I try:
AggregatePerson.objects.annotate(counter=Count('author__books')).order_by('counter')
everything works correctly: both the ordering and the fields counter and existing_subs show the correct numbers. But if I add the following:
AggregatePerson.objects.annotate(existing_subs=Count('parent__subscriptions')).exclude(existing_subs=0).annotate(counter=Count('author__books')).order_by('counter')
then the counter and existing_subs fields both become 18.
Why 18? And what am I doing wrong?
Thanks for the help!
EDIT clarification after further research:
18 is the number of parent__subscriptions. The code breaks even without the exclude; for some reason counter also gets the value of existing_subs.
I found the answer to this issue.
Tl;dr:
You need to add distinct=True inside the Count like this:
AggregatePerson.objects.annotate(counter=Count('author__books', distinct=True))
Longer version:
Adding a Count annotation over a multi-valued relation adds a LEFT OUTER JOIN behind the scenes. With two such annotations, both joins end up in the same query, so every book row is repeated once for each subscription row and vice versa. A plain COUNT then counts these duplicated rows, which is why the annotations come out inflated. Passing distinct=True makes each Count count only the distinct related rows.
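The row duplication can be reproduced outside Django. The sketch below uses Python's built-in sqlite3 and a deliberately simplified, hypothetical schema in which books and subscriptions both point directly at one person table (the real models go through Author and Parent), with made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE book (id INTEGER PRIMARY KEY, person_id INTEGER);
    CREATE TABLE subscription (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person VALUES (1);
    -- one person with 2 books and 3 subscriptions
    INSERT INTO book VALUES (1, 1), (2, 1);
    INSERT INTO subscription VALUES (1, 1), (2, 1), (3, 1);
""")

# Two joins in one query yield 2 * 3 = 6 rows for the person,
# so both plain counts see 6 non-NULL values.
plain = conn.execute("""
    SELECT COUNT(book.id), COUNT(subscription.id)
    FROM person
    LEFT OUTER JOIN book ON book.person_id = person.id
    LEFT OUTER JOIN subscription ON subscription.person_id = person.id
    GROUP BY person.id
""").fetchone()

# COUNT(DISTINCT ...) ignores the repeated rows.
distinct = conn.execute("""
    SELECT COUNT(DISTINCT book.id), COUNT(DISTINCT subscription.id)
    FROM person
    LEFT OUTER JOIN book ON book.person_id = person.id
    LEFT OUTER JOIN subscription ON subscription.person_id = person.id
    GROUP BY person.id
""").fetchone()

print(plain, distinct)
```

In the ORM, distinct=True would need to be passed to every Count that shares the query, not only to one of them.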
(repeating essentials of my reply in another forum)
This looks like a Django bug. Possible workarounds:
1) Add the two annotations in one annotate() call:
...annotate(existing_subs=Count('parent__subscriptions'),counter=Count('author__books'))...
2) Replace the annotation for existing_subs and exclude(existing_subs=0) with exclude(parent__subscriptions=None).

Django ORM - select_related and order_by with foreign keys

I have a simple music schema: Artist, Release, Track, and Song. The first 3 are all logical constructs while the fourth (Song) is a specific instance of an (Artist, Release, Track) as an mp3, wav, ogg, whatever.
I am having trouble generating an ordered list of the Songs in the database. The catch is that both Track and Release have an Artist. While Song.Track.Artist is always the performer name, Song.Track.Release.Artist may either be a performer name or "Various Artists" for compilations. I want to be able to sort by one or the other, and I can't figure out the correct way to make this work.
Here's my schema:
class Artist(models.Model):
    name = models.CharField(max_length=512)
class Release(models.Model):
    name = models.CharField(max_length=512)
    artist = models.ForeignKey(Artist)
class Track(models.Model):
    name = models.CharField(max_length=512)
    track_number = models.IntegerField('Position of the track on its release')
    length = models.IntegerField('Length of the song in seconds')
    artist = models.ForeignKey(Artist)
    release = models.ForeignKey(Release)
class Song(models.Model):
    bitrate = models.IntegerField('Bitrate of the song in kbps')
    location = models.CharField('Permanent storage location of the file', max_length=1024)
    owner = models.ForeignKey(User)
    track = models.ForeignKey(Track)
My query should be fairly simple; filter for all songs owned by a specific user, and then sort them by either Song.Track.Artist.name or Song.Track.Release.Artist.name. Here's my code inside a view, which is sorting by Song.Track.Artist.name:
songs = Song.objects.filter(owner=request.user).select_related('track__artist', 'track__release', 'track__release__artist').order_by('player_artist.name')
I can't get order_by to work unless I use tblname.colname. I took a look at the underlying query object's as_sql method, which indicates that when the inner join is made to get Song.Track.Release.Artist the temporary name T6 is used for the Artist table since an inner join was already done on this same table to get Song.Track.Artist:
>>> songs = Song.objects.filter(owner=request.user).select_related('track__artist', 'track__release', 'track__release__artist').order_by('T6.name')
>>> print songs.query.as_sql()
('SELECT "player_song"."id", "player_song"."bitrate", "player_song"."location",
"player_song"."owner_id", "player_song"."track_id", "player_track"."id",
"player_track"."name", "player_track"."track_number", "player_track"."length",
"player_track"."artist_id", "player_track"."release_id", "player_artist"."id",
"player_artist"."name", "player_release"."id", "player_release"."name",
"player_release"."artist_id", T6."id", T6."name" FROM "player_song" INNER JOIN
"player_track" ON ("player_song"."track_id" = "player_track"."id") INNER JOIN
"player_artist" ON ("player_track"."artist_id" = "player_artist"."id") INNER JOIN
"player_release" ON ("player_track"."release_id" = "player_release"."id") INNER JOIN
"player_artist" T6 ON ("player_release"."artist_id" = T6."id") WHERE
"player_song"."owner_id" = %s ORDER BY T6.name ASC', (1,))
When I put this as the table name in order_by it does work (see example output above), but this seems entirely non-portable. Surely there's a better way to do this! What am I missing?
I'm afraid I really can't understand what your question is.
A couple of corrections: select_related has nothing to do with ordering (it doesn't change the queryset at all, just follows joins to get related objects and cache them); and to order by a field in a related model you use the double-underscore notation, not dotted. For example:
Song.objects.filter(owner=request.user).order_by('track__artist__name')
But in your example, you use 'player_artist', which doesn't seem to be a field anywhere in your model. And I don't understand your reference to portability.
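For the other sort order in the question, the same double-underscore notation should apply, i.e. order_by('track__release__artist__name'), which leaves the choice of join alias (such as T6) to the ORM. What the two orderings mean at the SQL level can be sketched with Python's built-in sqlite3. The data is made up and the aliases track_artist and release_artist are hypothetical names chosen for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player_artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE player_release (id INTEGER PRIMARY KEY, name TEXT, artist_id INTEGER);
    CREATE TABLE player_track (
        id INTEGER PRIMARY KEY, name TEXT, artist_id INTEGER, release_id INTEGER
    );
    INSERT INTO player_artist VALUES (1, 'Abba'), (2, 'Beck'), (3, 'Various Artists');
    INSERT INTO player_release VALUES (1, 'Compilation', 3), (2, 'Solo Album', 2);
    -- 'Song One' is by Abba on a compilation, 'Song Two' by Beck on his own album
    INSERT INTO player_track VALUES (1, 'Song One', 1, 1), (2, 'Song Two', 2, 2);
""")

# The artist table is joined twice, once per relation, under two aliases.
QUERY = """
    SELECT player_track.name
    FROM player_track
    INNER JOIN player_artist AS track_artist
        ON player_track.artist_id = track_artist.id
    INNER JOIN player_release
        ON player_track.release_id = player_release.id
    INNER JOIN player_artist AS release_artist
        ON player_release.artist_id = release_artist.id
    ORDER BY {column}
"""

by_track_artist = [r[0] for r in conn.execute(QUERY.format(column="track_artist.name"))]
by_release_artist = [r[0] for r in conn.execute(QUERY.format(column="release_artist.name"))]
print(by_track_artist, by_release_artist)
```

Hard-coding an alias like T6 in order_by ties the code to how one particular query happened to be built, whereas a field lookup describes the relation path and stays valid if the joins change.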