Django: how to annotate a ManyToMany field with a count

Consider the following:
class ModelA(models.Model):
    pass
class ModelB(models.Model):
    model_as = models.ManyToManyField(ModelA, db_index=True, related_name="model_bs", blank=True)
model_bs = model_a_object.model_bs.all()
for model_b in model_bs:
    print(model_b.model_as.count())
3
2
So far so good. But I want to create an ordered list of model_bs depending on the count of model_as. My understanding is that simply this should do the trick:
model_bs = model_a_object.model_bs.all().annotate(count=Count("model_as")).order_by('count')
For some reason that doesn't work. When I print to check, the annotated count is wrong!!
for model_b in model_bs:
    print(model_b.model_as.count(), model_b.count)
3 1
2 1
What did I do wrong? Here is the sql output for print(model_bs.query):
SELECT "mobile_modelb"."id",
"mobile_modelb"."modelb",
COUNT(DISTINCT "mobile_modelb_model_as"."modela_id") AS "count"
FROM "mobile_modelb"
INNER JOIN "mobile_modelb_model_as" ON ("mobile_modelb"."id" = "mobile_modelb_model_as"."modelb_id")
WHERE "mobile_modelb_model_as"."modela_id" = 7
GROUP BY "mobile_modelb"."id"
ORDER BY "count" ASC

Try adding distinct=True to the Count.
model_bs = model_a_object.model_bs.all().annotate(
    count=Count("model_as", distinct=True)
).order_by('count')
EDIT:
The problem is that, because you're using model_a_object.model_bs.all(), Django generates this WHERE clause, which restricts the joined rows before they are counted:
WHERE "mobile_modelb_model_as"."modela_id" = 7
The solution may be to do:
ModelB.objects.annotate(
    count=Count("model_as")
).filter(model_as=model_a_object)
If that doesn't work, try using values() before filter() to define the GROUP BY correctly.
ModelB.objects.annotate(
    count=Count("model_as")
).values('id', 'count').filter(model_as=model_a_object)
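The effect of that WHERE clause can be reproduced outside Django. The following is a minimal sketch using Python's built-in sqlite3 module, with hypothetical table names (modelb, modelb_model_as) and made-up rows standing in for the real schema. The second query is hand-written SQL that filters through a separate alias of the join table; it illustrates the idea and is not necessarily the exact SQL Django emits.

```python
import sqlite3

# Hypothetical stand-ins for the tables Django would create.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modelb (id INTEGER PRIMARY KEY);
    CREATE TABLE modelb_model_as (modelb_id INTEGER, modela_id INTEGER);
    INSERT INTO modelb (id) VALUES (1), (2);
    -- ModelB 1 is linked to three ModelA rows, ModelB 2 to two.
    INSERT INTO modelb_model_as VALUES (1, 7), (1, 8), (1, 9), (2, 7), (2, 8);
""")

# Same shape as the query in the question: the WHERE clause filters the
# join rows before grouping, so only modela_id = 7 is left to count.
filtered_first = conn.execute("""
    SELECT b.id, COUNT(DISTINCT j.modela_id) AS n
    FROM modelb b
    JOIN modelb_model_as j ON b.id = j.modelb_id
    WHERE j.modela_id = 7
    GROUP BY b.id
    ORDER BY n, b.id
""").fetchall()

# Filtering through a second alias (f) leaves the counted alias (j)
# untouched, so every related ModelA row is counted.
counted_first = conn.execute("""
    SELECT b.id, COUNT(DISTINCT j.modela_id) AS n
    FROM modelb b
    JOIN modelb_model_as j ON b.id = j.modelb_id
    JOIN modelb_model_as f ON b.id = f.modelb_id
    WHERE f.modela_id = 7
    GROUP BY b.id
    ORDER BY n, b.id
""").fetchall()

print(filtered_first)  # every count collapses to 1
print(counted_first)   # counts reflect all related rows
```

With this sample data the first query reports a count of 1 for both rows, matching the symptom in the question, while the second reports 2 and 3.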

Related

Django - Counting ManyToMany Relationships

In my model, I have many Things that can have many Labels, and this relationship is made by user-submitted Descriptions via a form. I cannot figure out how to count how many times each Label has been applied to each Thing.
In models.py, I have:
class Label(models.Model):
    name = models.CharField(max_length=100)
class Thing(models.Model):
    name = models.CharField(max_length=100)
class Description(models.Model):
    thingname = models.ForeignKey(Thing, on_delete=models.CASCADE)
    labels = models.ManyToManyField(Label, blank=True)
If we say our current Thing is a cat, and ten people have submitted a Description for the cat, how can we make our template output an aggregate count of each related Label for the Thing?
For example:
Cat
10 fluffy
6 fuzzy
4 cute
2 dangerous
1 loud
I've tried a few things with filters and annotations like
counts = Label.objects.filter(description_form = pk).annotate(num_notes=Count('name'))
but I think there's something obvious I'm missing either in my views.py or in my template.
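You can use this to retrieve this information (the lookup assumes the foreign key is named thing_name, as in the SQL below; with the model as posted it would be thingname__name):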
Description.objects.prefetch_related("labels").values("labels__name", "thing_name__name").annotate(num_notes=models.Count("labels__name"))
This is equivalent to:
SELECT "core_label"."name",
"core_thing"."name",
Count("core_label"."name") AS "num_notes"
FROM "core_description"
LEFT OUTER JOIN "core_description_labels"
ON ( "core_description"."id" =
"core_description_labels"."description_id" )
LEFT OUTER JOIN "core_label"
ON ( "core_description_labels"."label_id" =
"core_label"."id" )
INNER JOIN "core_thing"
ON ( "core_description"."thing_name_id" = "core_thing"."id" )
GROUP BY "core_label"."name",
"core_thing"."name"

Queryset for ManyToMany relation ordered by number of match from a list

I have two models, Entity and Tag, with a ManyToMany relation:
class Tag(models.Model):
    name = models.CharField(max_length=128)
class Entity(models.Model):
    name = models.CharField(max_length=128)
    tags = models.ManyToManyField(Tag)
In a view I have a list of Tag ids and would like to get the top 10 Entity objects that have at least one Tag from the list, ordered by the number of Tags they have in common with my list.
I am currently doing the "at least one" part through 'tags__in' and adding a '.distinct()' at the end of my queryset. Is that the proper way?
I can see that there is a 'Count' in annotate that can take some arguments. Is there a way to count only the tags whose id is in my list?
If not, do I need to go through an SQL query like this (from the documentation)? I fear I am going to lose quite some performance.
Blog.objects.extra(
    select={
        'entry_count': 'SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id'
    },
)
If I understood your problem correctly, I think this query should do what you want (I'm using Django 3.0):
from django.db.models import Count, Q
Entity.objects.annotate(
    tag_count=Count("tags", filter=Q(tags__id__in=tag_ids))
).filter(tag_count__gt=0).order_by("-tag_count")[:10]
Generated SQL (I'm using PostgreSQL):
SELECT "people_entity"."id",
"people_entity"."name",
COUNT("people_entity_tags"."tag_id") FILTER (WHERE "people_entity_tags"."tag_id" IN (1, 4, 5, 8)) AS "tag_count"
FROM "people_entity"
LEFT OUTER JOIN "people_entity_tags"
ON ("people_entity"."id" = "people_entity_tags"."entity_id")
GROUP BY "people_entity"."id"
HAVING COUNT("people_entity_tags"."tag_id") FILTER (WHERE ("people_entity_tags"."tag_id" IN (1, 4, 5, 8))) > 0
ORDER BY "tag_count" DESC
LIMIT 10
small edit: added Q in imports
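The filtered count can be tried out without a Django project. Below is a minimal sketch with Python's built-in sqlite3 module; the table names (entity, entity_tags) and the sample rows are assumptions made for illustration. Because the FILTER clause is only available in newer SQLite versions, the sketch uses COUNT(CASE WHEN ... END), which counts the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entity_tags (entity_id INTEGER, tag_id INTEGER);
    INSERT INTO entity VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO entity_tags VALUES
        (1, 1), (1, 2), (1, 4),
        (2, 2), (2, 3),
        (3, 1), (3, 4), (3, 5);
""")

tag_ids = [1, 4, 5]  # the list of Tag ids coming from the view
placeholders = ", ".join("?" for _ in tag_ids)

# COUNT ignores NULLs, so only rows whose tag is in the list are counted.
matches = f"COUNT(CASE WHEN et.tag_id IN ({placeholders}) THEN 1 END)"

query = f"""
    SELECT e.id, {matches} AS tag_count
    FROM entity e
    LEFT JOIN entity_tags et ON e.id = et.entity_id
    GROUP BY e.id
    HAVING {matches} > 0
    ORDER BY tag_count DESC
    LIMIT 10
"""
# The expression appears twice, so the parameters are passed twice.
top = conn.execute(query, tag_ids + tag_ids).fetchall()
print(top)
```

Entity 2 shares no tag with the list and is dropped by the HAVING clause; the remaining entities come back ordered by the number of shared tags.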

How to left outer join with extra condition in Django

I have these three models:
class Track(models.Model):
    title = models.TextField()
    artist = models.TextField()
class Tag(models.Model):
    name = models.CharField(max_length=50)
class TrackHasTag(models.Model):
    track = models.ForeignKey('Track', on_delete=models.CASCADE)
    tag = models.ForeignKey('Tag', on_delete=models.PROTECT)
And I want to retrieve all Tracks that are not tagged with a specific tag. This gets me what I want: Track.objects.exclude(trackhastag__tag_id='1').only('id') but it's very slow when the tables grow. This is what I get when printing .query of the queryset:
SELECT "track"."id"
FROM "track"
WHERE NOT ( "track"."id" IN (SELECT U1."track_id" AS Col1
FROM "trackhastag" U1
WHERE U1."tag_id" = 1) )
I would like Django to send this query instead:
SELECT "track"."id"
FROM "track"
LEFT OUTER JOIN "trackhastag"
ON "track"."id" = "trackhastag"."track_id"
AND "trackhastag"."tag_id" = 1
WHERE "trackhastag"."id" IS NULL;
But haven't found a way to do so. Using a Raw Query is not really an option as I have to filter the resulting queryset very often.
The cleanest workaround I have found is to create a view in the database and a model TrackHasTagFoo with managed = False that I use to query like: Track.objects.filter(trackhastagfoo__isnull=True). I don't think this is an elegant or sustainable solution, as it involves adding raw SQL to my migrations to maintain said view.
This is just one example of a situation where we need to do this kind of left join with an extra condition, but the truth is that we are facing this problem in more parts of our application.
Thanks a lot!
As mentioned in Django #29555 you can use FilteredRelation for this purpose since Django 2.0.
from django.db.models import FilteredRelation, Q
Track.objects.annotate(
    has_tag=FilteredRelation(
        'trackhastag', condition=Q(trackhastag__tag=1)
    ),
).filter(
    has_tag__isnull=True,
)
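To see that the left outer join with the extra ON condition selects the same tracks as the NOT IN subquery, both statements from the question can be run side by side. This is a minimal sketch using Python's built-in sqlite3 module; the sample rows are made up, and the table names follow the SQL shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE trackhastag (
        id INTEGER PRIMARY KEY, track_id INTEGER, tag_id INTEGER
    );
    INSERT INTO track VALUES (1, 'one'), (2, 'two'), (3, 'three');
    -- Track 1 has tags 1 and 2, track 2 has tag 2, track 3 is untagged.
    INSERT INTO trackhastag (track_id, tag_id) VALUES (1, 1), (1, 2), (2, 2);
""")

# What exclude(trackhastag__tag_id=1) produced in the question.
not_in = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM track t
    WHERE NOT (t.id IN (SELECT u.track_id FROM trackhastag u WHERE u.tag_id = 1))
    ORDER BY t.id
""")]

# The desired form: only tag 1 rows take part in the join, and tracks
# without such a row come back with NULL on the right-hand side.
left_join = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM track t
    LEFT OUTER JOIN trackhastag tht
        ON t.id = tht.track_id AND tht.tag_id = 1
    WHERE tht.id IS NULL
    ORDER BY t.id
""")]

print(not_in, left_join)
```

Both queries return tracks 2 and 3 for this data. Putting the tag condition in the ON clause rather than in WHERE is what keeps track 2, which has a different tag, in the result.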
What about queryset extras? They do not break ORM and can be further filtered (vs RawSQL)
from django.db.models import Q
Track.objects.filter(
    # work around to force left outer join
    Q(trackhastag__isnull=True) | Q(trackhastag__isnull=False)
).extra(
    # where parameters are “AND”ed to any other search criteria
    # thus we need to account for NULL
    where=[
        '"app_trackhastag"."id" <> %s or "app_trackhastag"."id" is NULL'
    ],
    params=[1],
)
produces this somewhat convoluted query:
SELECT "app_track"."id", "app_track"."title", "app_track"."artist"
FROM "app_track"
LEFT OUTER JOIN "app_trackhastag"
ON ("app_track"."id" = "app_trackhastag"."track_id")
WHERE (
("app_trackhastag"."id" IS NULL OR "app_trackhastag"."id" IS NOT NULL) AND
("app_trackhastag"."id" <> 1 or "app_trackhastag"."id" is NULL)
)
Rationale
Step 1
One straightforward way to get a left outer join with a queryset is the following:
Track.objects.filter(trackhastag__isnull=True)
which gives:
SELECT "app_track"."id", "app_track"."title", "app_track"."artist"
FROM "app_track"
LEFT OUTER JOIN "app_trackhastag"
ON ("app_track"."id" = "app_trackhastag"."track_id")
WHERE "app_trackhastag"."id" IS NULL
Step 2
Realize that once step 1 is done (we have a left outer join), we can leverage queryset's extra:
Track.objects.filter(
    trackhastag__isnull=True
).extra(
    where=['"app_trackhastag"."id" <> %s'],
    params=[1],
)
which gives:
SELECT "app_track"."id", "app_track"."title", "app_track"."artist"
FROM "app_track"
LEFT OUTER JOIN "app_trackhastag"
ON ("app_track"."id" = "app_trackhastag"."track_id")
WHERE (
"app_trackhastag"."id" IS NULL AND
("app_trackhastag"."id" <> 1)
)
Step 3
Work around the limitation of extra() (all where parameters are “AND”ed to any other search criteria) to arrive at the final solution above.
Using filter() is better than exclude() here: as your .query output shows, exclude() is turned into a NOT IN subquery, whereas filter() selects only the rows you want directly. As you said, Track.objects.filter(trackhastagfoo__isnull=True) is better than the exclude() version.
Suggestion: you are manually building a many-to-many relation. As Mohammad said, why not use a ManyToManyField? It is easier to use.
Maybe this answers your question: Django Left Outer Join
Enric, why didn't you use a many-to-many relation?
class Track(models.Model):
    title = models.TextField()
    artist = models.TextField()
    # 'Tag' is passed as a string because the class is defined below
    tags = models.ManyToManyField('Tag')
class Tag(models.Model):
    name = models.CharField(max_length=50)
And for your question
Track.objects.filter(~Q(tags__id=1))

django annotate question

I have the following model:
class Pick(models.Model):
    league = models.ForeignKey(League)
    user = models.ForeignKey(User)
    team = models.ForeignKey(Team)
    week = models.IntegerField()
    result = models.IntegerField(default=3, help_text='loss=0, win=1, tie=2, not started=3, in progress=4')
I'm trying to generate a standings table based on the results, but I'm unsure how to get it done in a single query. For each user in a particular league, I want a count of the results equal to 1 (win), 0 (loss) and 2 (tie). The only thing I can think of is to do 3 separate queries where I filter the results and then annotate like so:
Pick.objects.filter(league=2, result=1).annotate(wins=Count('result'))
Pick.objects.filter(league=2, result=0).annotate(losses=Count('result'))
Pick.objects.filter(league=2, result=2).annotate(ties=Count('result'))
Is there a more efficient way to achieve this?
Thanks!
The trick to this is to use the values method to select just the fields you want to group on, then annotate, so that you get one count per result value:
Pick.objects.filter(league=2).values('result').annotate(count=Count('result'))
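To get wins, losses and ties per user in a single pass, conditional aggregation is one option. The sketch below uses Python's built-in sqlite3 module with a hypothetical pick table and made-up rows. In Django 2.0 and later, a similar shape can probably be expressed with Count('result', filter=Q(result=1)) after values('user'), though that is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pick (
        id INTEGER PRIMARY KEY, league_id INTEGER,
        user_id INTEGER, result INTEGER
    );
    -- result: loss=0, win=1, tie=2, not started=3
    INSERT INTO pick (league_id, user_id, result) VALUES
        (2, 1, 1), (2, 1, 1), (2, 1, 0), (2, 1, 2),
        (2, 2, 0), (2, 2, 0), (2, 2, 1), (2, 2, 3),
        (5, 1, 1);
""")

# One row per user; each SUM adds 1 only for rows with the given result.
standings = conn.execute("""
    SELECT user_id,
           SUM(CASE WHEN result = 1 THEN 1 ELSE 0 END) AS wins,
           SUM(CASE WHEN result = 0 THEN 1 ELSE 0 END) AS losses,
           SUM(CASE WHEN result = 2 THEN 1 ELSE 0 END) AS ties
    FROM pick
    WHERE league_id = ?
    GROUP BY user_id
    ORDER BY wins DESC, user_id
""", (2,)).fetchall()

for user_id, wins, losses, ties in standings:
    print(user_id, wins, losses, ties)
```

Rows from other leagues and results that are neither win, loss nor tie do not contribute to any of the three columns.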

chain filter and exclude on django model with field lookups that span relationships

I have the following models:
class Order_type(models.Model):
    description = models.CharField()
class Order(models.Model):
    type = models.ForeignKey(Order_type)
    order_date = models.DateField(default=datetime.date.today)
    status = models.CharField()
    processed_time = models.TimeField()
I want a list of the order types that have orders meeting these criteria: (order_date <= today AND processed_time is empty AND status is not blank)
I tried:
qs = Order_type.objects.filter(
    order__order_date__lte=datetime.date.today(),
    order__processed_time__isnull=True,
).exclude(order__status='')
This works for the original list of orders:
orders_qs = Order.objects.filter(order_date__lte=datetime.date.today(), processed_time__isnull=True)
orders_qs = orders_qs.exclude(status='')
But qs isn't the right queryset. I think it's actually returning a more narrowed filter (since no records are present), but I'm not sure what. According to this (django reference), because I'm referencing a related model I think the exclude works on the original queryset (not the one from the filter), but I don't get exactly how.
OK, I just thought of this, which I think works, but feels sloppy (Is there a better way?):
qs = Order_type.objects.filter(order__id__in=[o.id for o in orders_qs])
What's happening is that the exclude() query is messing things up for you. Basically, it's excluding any Order_type that has at least one Order without a status, which is almost certainly not what you want to happen.
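The difference can be illustrated with plain SQL. This is a minimal sketch using Python's built-in sqlite3 module; the table names (order_type, orders) and rows are made up, and the first query is a hand-written approximation of what filter(...).exclude(order__status='') means across a relation, not Django's exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_type (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, type_id INTEGER,
        order_date TEXT, processed_time TEXT, status TEXT
    );
    INSERT INTO order_type VALUES (1, 'A'), (2, 'B');
    -- Type 1 has one matching order and one order with an empty status.
    INSERT INTO orders (type_id, order_date, processed_time, status) VALUES
        (1, '2010-07-01', NULL, 'new'),
        (1, '2010-07-02', '10:00', ''),
        (2, '2010-07-03', NULL, 'new');
""")
today = "2010-07-18"

# exclude() across the relation: drop every type that has ANY order
# with an empty status, whichever order matched the filter.
chained = [row[0] for row in conn.execute("""
    SELECT DISTINCT t.id
    FROM order_type t
    JOIN orders o ON t.id = o.type_id
    WHERE o.order_date <= ? AND o.processed_time IS NULL
      AND t.id NOT IN (SELECT type_id FROM orders WHERE status = '')
    ORDER BY t.id
""", (today,))]

# All three conditions applied to the same joined order row.
single_join = [row[0] for row in conn.execute("""
    SELECT DISTINCT t.id
    FROM order_type t
    JOIN orders o ON t.id = o.type_id
    WHERE o.order_date <= ? AND o.processed_time IS NULL
      AND o.status > ''
    ORDER BY t.id
""", (today,))]

print(chained, single_join)
```

Type 1 has an order that satisfies all three criteria, yet the first query drops it because of its unrelated empty-status order; the second query keeps it.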
The simplest solution in your case is to use order__status__gt='' in your filter() arguments. However, you will also need to append distinct() to the end of your query, because otherwise you'd get a QuerySet with multiple instances of the same Order_type if it has more than one Order that matches the query. This should work:
qs = Order_type.objects.filter(
    order__order_date__lte=datetime.date.today(),
    order__processed_time__isnull=True,
    order__status__gt='',
).distinct()
On a side note, in the qs query you gave at the end of the question, you don't have to say order__id__in=[o.id for o in orders_qs], you can simply use order__in=orders_qs (you still also need the distinct()). So this will also work:
qs = Order_type.objects.filter(
    order__in=Order.objects.filter(
        order_date__lte=datetime.date.today(),
        processed_time__isnull=True,
    ).exclude(status='')
).distinct()
Addendum (edit):
Here's the actual SQL that Django issues for the above querysets:
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
LEFT OUTER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE ("testapp_order"."order_date" <= E'2010-07-18'
AND "testapp_order"."processed_time" IS NULL
AND "testapp_order"."status" > E'' );
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
INNER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE "testapp_order"."id" IN
(SELECT U0."id" FROM "testapp_order" U0
WHERE (U0."order_date" <= E'2010-07-18'
AND U0."processed_time" IS NULL
AND NOT (U0."status" = E'' )));
EXPLAIN reveals that the second query is ever so slightly more expensive (cost of 28.99 versus 28.64 with a very small dataset).