Count of queryset where foreign key occurs exactly n times

If I have a Django model with a foreign key, e.g.:
class Article(models.Model):
    headline = models.CharField(max_length=100)
    pub_date = models.DateField()
    reporter = models.ForeignKey(Reporter, on_delete=models.CASCADE)
is there a way to count the reporters that have exactly n articles on a specific date? For example, how many reporters have published exactly 2 articles "today"?
from django.utils import timezone
date = timezone.now().date()
articles_on_date = Article.objects.filter(pub_date=date)
# now what can I do?
Edit:
Currently I can only figure out how to do it very inefficiently, by looping and hitting the database far too many times.

Using conditional expressions:
from django.db import models
Reporter.objects.annotate(
    num_of_articles=models.Count(
        models.Case(models.When(article__pub_date=date, then=1), output_field=models.IntegerField())
    )
).filter(num_of_articles=2).count()

Try this:
from django.db.models import Count
Article.objects.filter(pub_date=date).values('reporter').annotate(article_count=Count('id')).filter(article_count=2)
This returns a queryset of dictionaries like the following:
[{'reporter': 1, 'article_count': 2}]
Here 1 is the id of the Reporter instance. Append .count() to get the number of such reporters.
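The grouping that both answers rely on can be sketched in plain SQL. The following is a minimal illustration using the standard-library sqlite3 module rather than the ORM; the table layout, the sample rows, and the helper name reporters_with_n_articles are assumptions made up for this example, not what Django generates verbatim.

```python
import sqlite3

# In-memory database with a hypothetical "article" table mirroring the model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        pub_date TEXT,
        reporter_id INTEGER
    );
    INSERT INTO article (pub_date, reporter_id) VALUES
        ('2024-01-01', 1), ('2024-01-01', 1),
        ('2024-01-01', 2),
        ('2024-01-02', 2),
        ('2024-01-01', 3), ('2024-01-01', 3);
""")


def reporters_with_n_articles(conn, date, n):
    """Return (reporter_id, article_count) rows for reporters with exactly n articles on date."""
    return conn.execute(
        """
        SELECT reporter_id, COUNT(id) AS article_count
        FROM article
        WHERE pub_date = ?          -- restrict to the date first
        GROUP BY reporter_id        -- one group per reporter
        HAVING COUNT(id) = ?        -- keep groups of exactly n articles
        ORDER BY reporter_id
        """,
        (date, n),
    ).fetchall()


rows = reporters_with_n_articles(conn, "2024-01-01", 2)
print(rows)       # [(1, 2), (3, 2)]
print(len(rows))  # 2 reporters published exactly 2 articles that day
```

The length of the result is the count the question asks for; in the ORM that corresponds to calling .count() on the filtered queryset.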

Related

How to apply an arbitrary filter on a specific chained prefetch_related() within Django?

I'm trying to optimize the queries fired by an API. I have four models, namely User, Content, Rating, and UserRating, with some relations to each other. I want the API to return all existing contents alongside their rating count as well as the score given to each by a specific user.
I used to use Content.objects.all() as the queryset, but I realized that with a large amount of data, tons of queries are fired. So I tried to reduce the number of queries using select_related() and prefetch_related(). However, I'm still left with an extra search in Python that I hope to remove with a controlled prefetch_related(): applying a filter to just one specific prefetch within a nested prefetch and select.
Here are my models:
from django.db import models
from django.conf import settings
class Content(models.Model):
    title = models.CharField(max_length=50)
class Rating(models.Model):
    count = models.PositiveBigIntegerField(default=0)
    content = models.OneToOneField(Content, on_delete=models.CASCADE)
class UserRating(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, blank=True, null=True, on_delete=models.CASCADE
    )
    score = models.PositiveSmallIntegerField()
    rating = models.ForeignKey(
        Rating, related_name="user_ratings", on_delete=models.CASCADE
    )
    class Meta:
        unique_together = ["user", "rating"]
Here's what I've done so far:
contents = (
    Content.objects.select_related("rating")
    .prefetch_related("rating__user_ratings")
    .prefetch_related("rating__user_ratings__user")
)
for c in contents:  # serializer like
    user_rating = c.rating.user_ratings.all()
    for u in user_rating:  # how to remove this dummy search?
        if u.user_id == 1:
            print(u.score)
Queries:
(1) SELECT "bitpin_content"."id", "bitpin_content"."title", "bitpin_rating"."id", "bitpin_rating"."count", "bitpin_rating"."content_id" FROM "bitpin_content" LEFT OUTER JOIN "bitpin_rating" ON ("bitpin_content"."id" = "bitpin_rating"."content_id"); args=(); alias=default
(2) SELECT "bitpin_userrating"."id", "bitpin_userrating"."user_id", "bitpin_userrating"."score", "bitpin_userrating"."rating_id" FROM "bitpin_userrating" WHERE "bitpin_userrating"."rating_id" IN (1, 2); args=(1, 2); alias=default
(3) SELECT "users_user"."id", "users_user"."password", "users_user"."last_login", "users_user"."is_superuser", "users_user"."first_name", "users_user"."last_name", "users_user"."email", "users_user"."is_staff", "users_user"."is_active", "users_user"."date_joined", "users_user"."user_name" FROM "users_user" WHERE "users_user"."id" IN (1, 4); args=(1, 4); alias=default
As you can see from the queries above, only three queries are fired instead of the many that ran before. However, I guess I can remove the Python search (the second for loop) by filtering the last query, i.e. "users_user"."id" IN (1,) instead. Following this post and my own attempts, I couldn't apply .filter(rating__user_ratings__user_id=1) to the third query. I also couldn't fit my problem to the Prefetch(..., queryset=...) instance given in this answer.

I think you are looking for a Prefetch object:
https://docs.djangoproject.com/en/4.0/ref/models/querysets/#prefetch-objects
Try this:
from django.db.models import Prefetch
contents = Content.objects.select_related("rating").prefetch_related(
    Prefetch(
        "rating__user_ratings",
        queryset=UserRating.objects.filter(user__id=1),
        to_attr="user_rating_number_1",
    )
)
for c in contents:  # serializer like
    print(c.rating.user_rating_number_1[0].score)
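One caveat with the loop above: with to_attr the prefetched result is stored as a plain list, and that list is empty for any content the user has not rated, so indexing it with [0] would raise IndexError. A minimal sketch of a guard follows; it uses SimpleNamespace stand-ins instead of real model instances, and the helper name first_score is hypothetical.

```python
from types import SimpleNamespace


def first_score(rating, attr="user_rating_number_1"):
    """Return the score of the first prefetched UserRating, or None if there is none."""
    prefetched = getattr(rating, attr, [])  # to_attr stores a plain list
    return prefetched[0].score if prefetched else None


# Stand-ins for Rating instances after the filtered prefetch has run.
rated = SimpleNamespace(user_rating_number_1=[SimpleNamespace(score=4)])
unrated = SimpleNamespace(user_rating_number_1=[])

print(first_score(rated))    # 4
print(first_score(unrated))  # None
```

In a serializer this would let contents without a rating from that user render as null instead of failing.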

Django: cannot annotate using prefetch calculated attribute

The goal is to sum and annotate the working times for each employee over a given time range.
Models:
class Employee(models.Model):
    first_name = models.CharField(max_length=64)
class WorkTime(models.Model):
    employee = models.ForeignKey(Employee, on_delete=models.CASCADE, related_name="work_times")
    work_start = models.DateTimeField()
    work_end = models.DateTimeField()
    work_delta = models.IntegerField(default=0)
    def save(self, *args, **kwargs):
        self.work_delta = (self.work_end - self.work_start).seconds
        super().save(*args, **kwargs)
Getting the work times for each employee in a given date range:
queryset = Employee.objects.prefetch_related(
    Prefetch(
        'work_times',
        queryset=WorkTime.objects.filter(work_start__date__range=("2021-03-01", "2021-03-15"))
        .order_by("work_start"),
        to_attr="filtered_work_times"
    )).all()
Trying to annotate the sum of work_delta onto each employee:
queryset.annotate(work_sum=Sum("filtered_work_times__work_delta"))
This causes a FieldError:
Cannot resolve keyword 'filtered_work_times' into field. Choices are: first_name, id, work_times
How would one proceed from here? Using Django 3.1 btw.

You should use filtering on annotations.
I haven't tried, but I think the following code might help you:
from django.db.models import Sum, Q
Employee.objects.annotate(
    work_sum=Sum(
        'work_times__work_delta',
        filter=Q(work_times__work_start__date__range=["2021-03-01", "2021-03-15"])
    )
)

You cannot use prefetch_related values in the query because prefetching is done separately: Django first fetches the current objects and then makes further queries to fetch the related objects, so the attribute you refer to is not part of the query you are trying to add it to.
Instead, add a filter keyword argument to your aggregation function:
import datetime
from django.db.models import Q, Sum
start_date = datetime.date(2021, 3, 1)
end_date = datetime.date(2021, 3, 15)
result = queryset.annotate(work_sum=Sum("work_times__work_delta", filter=Q(work_times__work_start__date__range=(start_date, end_date))))
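The effect of the filter argument can be sketched in plain SQL as a conditional aggregate: rows outside the date range stay in the join but contribute nothing to the sum. The example below is an illustration with the standard-library sqlite3 module; the table names, columns, and sample rows are assumptions for this sketch, and the exact SQL Django emits may differ by backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE worktime (
        id INTEGER PRIMARY KEY,
        employee_id INTEGER REFERENCES employee(id),
        work_start TEXT,
        work_delta INTEGER
    );
    INSERT INTO employee (id, first_name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO worktime (employee_id, work_start, work_delta) VALUES
        (1, '2021-03-02 08:00:00', 3600),
        (1, '2021-03-10 08:00:00', 1800),
        (1, '2021-04-01 08:00:00', 7200),
        (2, '2021-02-01 08:00:00', 900);
""")

# Sum work_delta per employee, counting only rows whose start date is in range.
# Rows outside the range yield NULL from the CASE and are ignored by SUM.
work_sums = conn.execute(
    """
    SELECT e.first_name,
           SUM(CASE WHEN date(w.work_start) BETWEEN ? AND ?
                    THEN w.work_delta END) AS work_sum
    FROM employee e
    LEFT JOIN worktime w ON w.employee_id = e.id
    GROUP BY e.id
    ORDER BY e.id
    """,
    ("2021-03-01", "2021-03-15"),
).fetchall()

print(work_sums)  # [('Ann', 5400), ('Bob', None)]
```

Note that an employee with no work time in the range still appears, with a sum of None rather than 0.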

How can I filter a Django queryset by the latest of a related model?

Imagine I have the following 2 models in a contrived example:
class User(models.Model):
    name = models.CharField()
class Login(models.Model):
    user = models.ForeignKey(User, related_name='logins')
    success = models.BooleanField()
    datetime = models.DateTimeField()
    class Meta:
        get_latest_by = 'datetime'
How can I get a queryset of Users that contains only users whose last login was not successful?
I know the following does not work, but it illustrates what I want to get:
User.objects.filter(login__latest__success=False)
I'm guessing I can do it with Q objects, and/or Case When, and/or some other form of annotation and filtering, but I can't suss it out.

We can use a Subquery here:
from django.db.models import OuterRef, Subquery
latest_login = Subquery(Login.objects.filter(
    user=OuterRef('pk')
).order_by('-datetime').values('success')[:1])
User.objects.annotate(
    latest_login=latest_login
).filter(latest_login=False)
This will generate a query that looks like:
SELECT auth_user.*, (
    SELECT U0.success
    FROM login U0
    WHERE U0.user_id = auth_user.id
    ORDER BY U0.datetime DESC
    LIMIT 1
) AS latest_login
FROM auth_user
WHERE (
    SELECT U0.success
    FROM login U0
    WHERE U0.user_id = auth_user.id
    ORDER BY U0.datetime DESC
    LIMIT 1
) = False
So the outcome of the Subquery is the success of the latest Login object, and if that is False, we add the related User to the QuerySet.
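To see the correlated subquery at work without a Django project, here is a small sketch using the standard-library sqlite3 module. The table names (app_user, login) and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE login (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES app_user(id),
        success INTEGER,
        datetime TEXT
    );
    INSERT INTO app_user (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO login (user_id, success, datetime) VALUES
        (1, 0, '2024-01-01 10:00:00'),
        (1, 1, '2024-01-02 10:00:00'),
        (2, 1, '2024-01-01 10:00:00'),
        (2, 0, '2024-01-03 10:00:00');
""")

# For each user, the subquery picks the success flag of the newest login.
# alice: latest login succeeded, so she is excluded.
# bob: latest login failed, so he is included.
# carol: no logins, the subquery yields NULL, and NULL = 0 is not true.
rows = conn.execute("""
    SELECT u.name
    FROM app_user u
    WHERE (
        SELECT l.success
        FROM login l
        WHERE l.user_id = u.id
        ORDER BY l.datetime DESC
        LIMIT 1
    ) = 0
    ORDER BY u.id
""").fetchall()

print(rows)  # [('bob',)]
```

Users who have never logged in drop out of the result, because the comparison against NULL is never true.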

You can first annotate the max dates, and then filter based on success and the max date using F expressions:
from django.db.models import F, Max
User.objects.annotate(max_date=Max('logins__datetime'))\
    .filter(logins__datetime=F('max_date'), logins__success=False)

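To check the boolean, filter on success=False, and to get the most recent row, use latest().
Since both success and get_latest_by live on Login, the query has to start from that model; note that it returns the most recent failed Login, not a queryset of Users:
Login.objects.filter(success=False).latest()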

Getting distinct objects of a queryset from a reverse relation in Django

class Customer(models.Model):
    name = models.CharField(max_length=189)
class Message(models.Model):
    message = models.TextField()
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE, related_name="messages")
    created_at = models.DateTimeField(auto_now_add=True)
I want to get a queryset of distinct Customers ordered by Message.created_at. My database is MySQL.
I have tried the following:
qs = Customer.objects.all().order_by("-messages__created_at").distinct()
m = Message.objects.all().values("customer").distinct().order_by("-created_at")
m = Message.objects.all().order_by("-created_at").values("customer").distinct()
In the end, I used a set to accomplish this, but I think I might be missing something. My current solution:
customers = set(Interaction.objects.all().values_list("customer").distinct())
customer_list = list()
for c in customers:
    customer_list.append(c[0])
EDIT
Is it possible to get a list of customers ordered by their last message time, where the queryset also contains the last message value as another field?

Based on your comment, you want to order the customers by their latest message. We can do so by annotating the Customers and then sorting on the annotation:
from django.db.models import Max
Customer.objects.annotate(
    last_message=Max('messages__created_at')
).order_by("-last_message")
A potential problem is what to do with Customers that have written no message at all. In that case the last_message attribute will be NULL (None in Python). We can control where these land with nulls_first or nulls_last on an F-expression passed to .order_by(). For example:
from django.db.models import F, Max
Customer.objects.annotate(
    last_message=Max('messages__created_at')
).order_by(F('last_message').desc(nulls_last=True))
A nice bonus is that the Customer objects of this queryset will have an extra attribute: .last_message holds the time at which the customer last wrote a message.
You can also decide to filter them out, for example with:
from django.db.models import Max
Customer.objects.filter(
    messages__isnull=False,
).annotate(
    last_message=Max('messages__created_at')
).order_by('-last_message')
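The annotate-then-order pattern amounts to a GROUP BY with MAX() and an ordering that pushes NULLs to the end. Below is a minimal sketch with the standard-library sqlite3 module; the table names and sample rows are assumptions, and the "IS NULL" sort key is just one portable way to get a nulls-last ordering, not necessarily the SQL Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        message TEXT,
        created_at TEXT
    );
    INSERT INTO customer (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO message (customer_id, message, created_at) VALUES
        (1, 'hi',    '2024-01-01 09:00:00'),
        (1, 'again', '2024-01-05 09:00:00'),
        (2, 'hello', '2024-01-03 09:00:00');
""")

# One row per customer with the time of the latest message.
# The LEFT JOIN keeps customers without messages (last_message is NULL),
# and the first sort key (0 for non-NULL, 1 for NULL) places them last.
ordered = conn.execute("""
    SELECT c.name, MAX(m.created_at) AS last_message
    FROM customer c
    LEFT JOIN message m ON m.customer_id = c.id
    GROUP BY c.id
    ORDER BY MAX(m.created_at) IS NULL, MAX(m.created_at) DESC
""").fetchall()

for name, last_message in ordered:
    print(name, last_message)
```

Each customer appears exactly once, so no DISTINCT is needed, which is the property the question was after.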

Django queryset - Adding HAVING constraint

I have been using Django for a couple of years now but I am struggling today with adding a HAVING constraint to a GROUP BY.
My queryset is the following:
crm_models.Contact.objects\
    .filter(dealercontact__dealer__pk__in=(265,),
            dealercontact__activity='gardening',
            date_data_collected__gte=datetime.date(2012,10,1),
            date_data_collected__lt=datetime.date(2013,10,1))\
    .annotate(nb_rels=Count('dealercontact'))
which gives me the following MySQL query:
SELECT *
FROM `contact`
LEFT OUTER JOIN `dealer_contact` ON (`contact`.`id_contact` = `dealer_contact`.`id_contact`)
WHERE (`dealer_contact`.`active` = True
    AND `dealer_contact`.`activity` = 'gardening'
    AND `contact`.`date_data_collected` >= '2012-10-01'
    AND `contact`.`date_data_collected` < '2013-10-01'
    AND `dealer_contact`.`id_dealer` IN (265))
GROUP BY `contact`.`id_contact`
ORDER BY NULL;
I would get exactly what I need with this HAVING constraint:
HAVING SUM(IF(`dealer_contact`.`type`='customer', 1, 0)) = 0
How can I get this fixed with a Django Queryset? I need a queryset in this instance.
Here I am using annotate only in order to get the GROUP BY on contact.id_contact.
Edit: My goal is to get the Contacts who have no "customer" relation in dealercontact but have "ref" relation(s) (according to the WHERE clause of course).
Models
class Contact(models.Model):
    id_contact = models.AutoField(primary_key=True)
    title = models.CharField(max_length=255L, blank=True, choices=choices_custom_sort(TITLE_CHOICES))
    last_name = models.CharField(max_length=255L, blank=True)
    first_name = models.CharField(max_length=255L, blank=True)
    [...]
    date_data_collected = models.DateField(null=True, db_index=True)
class Dealer(models.Model):
    id_dealer = models.AutoField(primary_key=True)
    address1 = models.CharField(max_length=45L, blank=True)
    [...]
class DealerContact(Auditable):
    id_dealer_contact = models.AutoField(primary_key=True)
    contact = models.ForeignKey(Contact, db_column='id_contact')
    dealer = models.ForeignKey(Dealer, db_column='id_dealer')
    activity = models.CharField(max_length=32, choices=choices_custom_sort(ACTIVITIES), db_index=True)
    type = models.CharField(max_length=32, choices=choices_custom_sort(DEALER_CONTACT_TYPE), db_index=True)

I figured this out by adding two binary fields to DealerContact: is_ref and is_customer.
If type='ref', then is_ref=1 and is_customer=0.
If type='customer', then is_ref=0 and is_customer=1.
Thus, I am now able to use annotate(nb_customers=Sum('dealercontact__is_customer')) and then filter(nb_customers=0).
The final queryset is:
Contact.objects.filter(dealercontact__dealer__pk__in=(265,),
                       dealercontact__activity='gardening',
                       date_data_collected__gte=datetime.date(2012,10,1),
                       date_data_collected__lt=datetime.date(2013,10,1))\
    .annotate(nb_customers=Sum('dealercontact__is_customer'))\
    .filter(nb_customers=0)
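For reference, the HAVING constraint from the question can be tried out directly in SQL. The sketch below uses the standard-library sqlite3 module and writes the condition with CASE WHEN instead of MySQL's IF(); the trimmed-down tables and sample rows are assumptions for illustration. On newer Django versions the same idea may be expressible with conditional aggregation, for example Count('dealercontact', filter=Q(dealercontact__type='customer')) followed by a filter on the annotation, without extra columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id_contact INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE dealer_contact (
        id_dealer_contact INTEGER PRIMARY KEY,
        id_contact INTEGER REFERENCES contact(id_contact),
        id_dealer INTEGER,
        type TEXT
    );
    INSERT INTO contact VALUES
        (1, 'Ref only'), (2, 'Ref and customer'), (3, 'Customer only');
    INSERT INTO dealer_contact (id_contact, id_dealer, type) VALUES
        (1, 265, 'ref'),
        (2, 265, 'ref'),
        (2, 265, 'customer'),
        (3, 265, 'customer');
""")

# Group the joined rows per contact and keep only groups in which
# no row has type 'customer' (the conditional sum is 0).
rows = conn.execute("""
    SELECT c.id_contact, c.last_name
    FROM contact c
    JOIN dealer_contact dc ON dc.id_contact = c.id_contact
    WHERE dc.id_dealer IN (265)
    GROUP BY c.id_contact
    HAVING SUM(CASE WHEN dc.type = 'customer' THEN 1 ELSE 0 END) = 0
    ORDER BY c.id_contact
""").fetchall()

print(rows)  # [(1, 'Ref only')]
```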

Actually, there is a way to add your own custom HAVING and GROUP BY clauses if you need them.
Use my example with caution: if the Django ORM code or paths change in future Django versions, you will have to update your code too.
Imagine you have Book and Edition models, where each book can have multiple editions, and you want to select the first US edition date within a Book queryset.
Adding custom HAVING and GROUP BY clauses in Django 1.5+:
from django.db.models import Min
from django.db.models.sql.where import ExtraWhere, AND
qs = Book.objects.all()
# Standard annotate
qs = qs.annotate(first_edition_date=Min("edition__date"))
# Custom HAVING clause, to limit annotation by US country only
qs.query.having.add(ExtraWhere(['"app_edition"."country"=%s'], ["US"]), AND)
# Custom GROUP BY clause will be needed too
qs.query.group_by.append(("app_edition", "country"))
ExtraWhere can contain not just fields, but any raw sql conditions and functions too.

Are you avoiding a raw query just because you want ORM objects? Contact.objects.raw() generates model instances, much like filter() does. Refer to https://docs.djangoproject.com/en/dev/topics/db/sql/ for more help.

My goal is to get the Contacts who have no "customer" relation in
dealercontact but have "ref" relation(s) (according to the WHERE
clause of course).
This simple query fulfills this requirement:
Contact.objects.filter(dealercontact__type="ref").exclude(dealercontact__type="customer")
Is this enough, or do you need it to do something more?
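The filter-then-exclude query behaves like a set difference: contacts with at least one "ref" relation, minus contacts with at least one "customer" relation. A minimal pure-Python sketch of that logic, using made-up (contact_id, type) pairs as stand-ins for DealerContact rows:

```python
# Hypothetical (contact_id, type) pairs standing in for DealerContact rows.
relations = [
    (1, "ref"),
    (2, "ref"),
    (2, "customer"),
    (3, "customer"),
]

# filter(dealercontact__type="ref"): contacts with at least one "ref" relation.
with_ref = {contact_id for contact_id, rel_type in relations if rel_type == "ref"}

# exclude(dealercontact__type="customer"): drop contacts with any "customer" relation.
with_customer = {contact_id for contact_id, rel_type in relations if rel_type == "customer"}

result = sorted(with_ref - with_customer)
print(result)  # [1]
```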
UPDATE: if your requirement is
Contacts that have a "ref" relation, but do not have a "customer"
relation with the same dealer
you can do this:
from django.db.models import Q
Contact.objects.filter(Q(dealercontact__type="ref") & ~Q(dealercontact__type="customer"))
