How to aggregate over a single queryset in Django? - django

Short description: given a queryset myQueryset, how do I select max("myfield") without actually retrieving all rows and doing max in python?
The best I can think of is max([r["myfield"] for r in myQueryset.values("myfield")]), which isn't very good if there are millions of rows.
Long description: Say I have two models in my Django app, City and Country. City has a foreign key field to Country:
class Country(models.Model):
    name = models.CharField(max_length=256)
class City(models.Model):
    name = models.CharField(max_length=256)
    population = models.IntegerField()
    country = models.ForeignKey(Country, related_name='cities')
This means that a Country instance has .cities available. Let's say I now want to write a method for Country called highest_city_population that returns the population of the largest city. Coming from a LINQ background, my natural instinct is to try myCountry.cities.max('population') or something like that, but this isn't possible.

Use Aggregation (new in Django 1.1). You use it like this:
>>> from django.db.models import Max
>>> City.objects.all().aggregate(Max('population'))
{'population__max': 28025000}
To get the highest population of a City for each Country, I think you could do something like this:
>>> from django.db.models import Max
>>> Country.objects.annotate(highest_city_population=Max('cities__population'))
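For the single-country case the question asks about, the related manager also supports aggregation, so something like my_country.cities.aggregate(Max('population')) should compute the maximum in the database rather than in Python. As a rough, ORM-free illustration of the kind of query this pushes down, here is a sketch using the standard-library sqlite3 module. The table name city, the column country_id and the sample rows are assumptions made for the demo, and this is not necessarily the exact SQL Django emits.

```python
import sqlite3

# In-memory database with a hypothetical schema mirroring the City model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER, country_id INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [("A", 100, 1), ("B", 250, 1), ("C", 900, 2)],
)


def highest_city_population(conn, country_id):
    """Return the largest city population for one country, or None if it has no cities."""
    row = conn.execute(
        "SELECT MAX(population) FROM city WHERE country_id = ?", (country_id,)
    ).fetchone()
    # MAX over zero rows yields a single row containing NULL, i.e. None in Python.
    return row[0]


print(highest_city_population(conn, 1))
```

Only one row travels back from the database, no matter how many cities the country has.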

Related

How to order by a value calculated based on some other model in Django?

For example:
Model Company has fields: company_name and stock_price
Model Products has fields: product_price and company_name
I want to do something like: Company.objects.order_by( stock_price / [divided by] Products.objects.filter(company_name = Company__companyName).aggregate(Sum('product_price')).get('product_price__sum'))
Essentially, what I want to do is divide the stock price of company X by the aggregate of product_price of the products of company X and use the resulting calculation to order all the objects of Company. Where company_name is the foreign key.
I just want to know if doing such a thing is possible in Django. Thanks!
You can sort the companies by:
from django.db.models import F, Sum
Company.objects.order_by(
    (F('stock_price') / Sum('products__product_price')).asc()
)
or, prior to Django 2.2, you first annotate and then .order_by(…):
from django.db.models import F, Sum
Company.objects.annotate(
    order=F('stock_price') / Sum('products__product_price')
).order_by('order')
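Both forms ask the database to sum the product prices per company and sort on the resulting ratio. A minimal sketch of that idea in plain SQL, run through the standard-library sqlite3 module, might look like this. The table names, the company_id column and the sample data are assumptions for illustration, not the schema or SQL Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, company_name TEXT, stock_price REAL);
CREATE TABLE product (id INTEGER PRIMARY KEY, company_id INTEGER, product_price REAL);
INSERT INTO company VALUES (1, 'X', 100.0), (2, 'Y', 30.0);
INSERT INTO product VALUES (1, 1, 10.0), (2, 1, 10.0), (3, 2, 15.0);
""")

# One group per company; the ratio is computed and sorted inside the database.
rows = conn.execute("""
    SELECT c.company_name, c.stock_price / SUM(p.product_price) AS ratio
    FROM company c
    JOIN product p ON p.company_id = c.id
    GROUP BY c.id
    ORDER BY ratio ASC
""").fetchall()

print(rows)
```

Note that the inner join drops companies without products; whether that is wanted depends on the use case.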

How do I construct an order_by for a specific record in a ManyToOne field?

I'm trying to sort (order) by statistical data stored in a ManyToOne relationship. Suppose I have the following code:
class Product(models.Model):
    info = ...
    data = models.IntegerField(default=0)
class Customer(models.Model):
    info = ...
    purchases = models.ManyToManyField(Product, related_name='customers', blank=True)
class ProductStats(models.Model):
    ALL = 0
    YOUNG = 1
    OLD = 2
    TYPE = ((ALL, 'All'), (YOUNG, 'Young'), (OLD, 'Old'),)
    stats_type = models.SmallIntegerField(choices=TYPE)
    product = models.ForeignKey(Product, related_name='stats', on_delete=models.CASCADE)
    data = models.FloatField(default=0.0)
Then I would like to sort the products by their stats for the ALL demographic (assume every product has a stats connected to it for ALL). This might look something like the following:
products = Product.objects.all().order_by('stats__data for stats__stats_type=0')
Currently the only solutions I can think of are either to create a new stats class just for ALL and link it to Product with a OneToOneField, or to add a OneToOneField on Product pointing to the ALL stats in ProductStats.
Thank you for your help.
How about using multiple fields in order_by:
Product.objects.all().order_by('stats__data', 'stats__stats_type')
# orders by stats data first, then by stats_type (0, 1, 2) as a tie-breaker
Or if you want to get data for only stats_type 0:
Product.objects.filter(stats__stats_type=0).order_by('stats__data')
You can annotate the value of the relevant demographic and order by that:
from django.db.models import F
Product.objects.all().filter(stats__stats_type=0).annotate(data_for_all=F('stats__data')).order_by('data_for_all')
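The filter on stats_type is what pins the ordering to a single stats row per product. As a rough sketch of the join this corresponds to, here is the same idea in plain SQL via the standard-library sqlite3 module. The table names product and productstats and the sample rows are assumptions for the demo.

```python
import sqlite3

ALL = 0  # mirrors ProductStats.ALL from the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, info TEXT);
CREATE TABLE productstats (
    id INTEGER PRIMARY KEY, product_id INTEGER, stats_type INTEGER, data REAL
);
INSERT INTO product VALUES (1, 'p1'), (2, 'p2');
INSERT INTO productstats VALUES
    (1, 1, 0, 5.0), (2, 1, 1, 0.1),
    (3, 2, 0, 2.0), (4, 2, 2, 9.0);
""")

# Restricting the join to the ALL rows leaves one stats row per product,
# so the sort key is unambiguous.
rows = conn.execute("""
    SELECT p.info, s.data
    FROM product p
    JOIN productstats s ON s.product_id = p.id
    WHERE s.stats_type = ?
    ORDER BY s.data
""", (ALL,)).fetchall()

print(rows)
```

Without the WHERE clause each product would appear once per stats row, which is why ordering on the bare relation gives duplicates.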

Getting distinct objects of a queryset from a reverse relation in Django

class Customer(models.Model):
    name = models.CharField(max_length=189)
class Message(models.Model):
    message = models.TextField()
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE, related_name="messages")
    created_at = models.DateTimeField(auto_now_add=True)
I want to get a queryset of distinct Customers ordered by Message.created_at. My database is MySQL.
I have tried the following.
qs = Customer.objects.all().order_by("-messages__created_at").distinct()
m = Message.objects.all().values("customer").distinct().order_by("-created_at")
m = Message.objects.all().order_by("-created_at").values("customer").distinct()
In the end, I used a set to accomplish this, but I think I might be missing something. My current solution:
customers = set(Interaction.objects.all().values_list("customer").distinct())
customer_list = list()
for c in customers:
    customer_list.append(c[0])
EDIT
Is it possible to get a list of customers ordered by their last message time, where the queryset also contains the last message value as another field?
Based on your comment, you want to order the customers by their latest message. We can do so by annotating the Customers and then sorting on the annotation:
from django.db.models import Max
Customer.objects.annotate(
    last_message=Max('messages__created_at')
).order_by("-last_message")
A potential problem is what to do with Customers that have written no message at all. In that case the last_message annotation will be NULL in the database (None in Python). We can control where those rows are placed with nulls_first or nulls_last on the F-expression passed to .order_by. For example:
from django.db.models import F, Max
Customer.objects.annotate(
    last_message=Max('messages__created_at')
).order_by(F('last_message').desc(nulls_last=True))
A nice bonus is that the Customer objects of this queryset will have an extra attribute: .last_message holds the time at which the customer last wrote a message.
You can also decide to filter out the customers without messages, for example with:
from django.db.models import Max
Customer.objects.filter(
    messages__isnull=False,
).annotate(
    last_message=Max('messages__created_at')
).order_by('-last_message')
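To make the "latest message per customer, customers without messages last" behaviour concrete, here is a small sketch in plain SQL using the standard-library sqlite3 module. Table names, columns and sample rows are assumptions for the demo, and the NULL handling is spelled out by hand rather than copied from what Django generates for any particular backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE message (id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT);
INSERT INTO customer VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO message VALUES
    (1, 1, '2024-01-01 10:00:00'),
    (2, 1, '2024-01-03 10:00:00'),
    (3, 2, '2024-01-02 10:00:00');
""")

# LEFT JOIN keeps customers with no messages; their MAX() is NULL.
# Sorting on "IS NULL" first pushes those rows to the end.
rows = conn.execute("""
    SELECT c.name, MAX(m.created_at) AS last_message
    FROM customer c
    LEFT JOIN message m ON m.customer_id = c.id
    GROUP BY c.id
    ORDER BY MAX(m.created_at) IS NULL, MAX(m.created_at) DESC
""").fetchall()

print(rows)
```

Because of the GROUP BY, each customer appears exactly once, so no DISTINCT is needed.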

Count of queryset where foreign key occurs exactly n times

If I have a Django model with a foreign key, e.g.:
class Article(models.Model):
    headline = models.CharField(max_length=100)
    pub_date = models.DateField()
    reporter = models.ForeignKey(Reporter, on_delete=models.CASCADE)
is there a way for me to get a count of the number of reporters that have exactly n articles on a specific date? For example, how many reporters have published exactly 2 articles "today"?
from django.utils import timezone
date = timezone.now().date()
articles_on_date = Article.objects.filter(pub_date=date)
# now what can I do?
Edit:
Currently I can only figure out how to do it very inefficiently, by looping and hitting the database way too many times.
Using conditional expressions:
from django.db import models
Reporter.objects.annotate(
    num_of_articles=models.Count(
        models.Case(models.When(article__pub_date=date, then=1), output_field=models.IntegerField())
    )
).filter(num_of_articles=2).count()
Try this,
from django.db.models import Count
Article.objects.filter(pub_date=date).values('reporter').annotate(article_count=Count('id')).filter(article_count=2)
This would return a queryset of dictionaries like the following:
[{'reporter': 1, 'article_count': 2}]
Here 1 is the id of the reporter instance.
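Both answers boil down to grouping the day's articles by reporter, keeping the groups of size n, and counting those groups (for the second answer, appending .count() should give the number rather than the rows). A minimal sketch of that shape in plain SQL via the standard-library sqlite3 module follows; the table layout, the function name reporters_with_n_articles and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, reporter_id INTEGER, pub_date TEXT);
INSERT INTO article VALUES
    (1, 1, '2024-05-01'), (2, 1, '2024-05-01'),
    (3, 2, '2024-05-01'),
    (4, 3, '2024-05-01'), (5, 3, '2024-05-01'), (6, 3, '2024-04-30');
""")


def reporters_with_n_articles(conn, date, n):
    """Count reporters having exactly n articles published on the given date."""
    row = conn.execute("""
        SELECT COUNT(*) FROM (
            SELECT reporter_id
            FROM article
            WHERE pub_date = ?
            GROUP BY reporter_id
            HAVING COUNT(*) = ?
        )
    """, (date, n)).fetchone()
    return row[0]


print(reporters_with_n_articles(conn, "2024-05-01", 2))
```

This is a single round trip to the database regardless of how many reporters exist.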

Django: Annotate based on an annotation

Let's say I'm using Django to manage a database about athletes:
class Player(models.Model):
    name = models.CharField()
    weight = models.DecimalField()
    team = models.ForeignKey('Team')
class Team(models.Model):
    name = models.CharField()
    sport = models.ForeignKey('Sport')
class Sport(models.Model):
    name = models.CharField()
Let's say I wanted to compute the average weight of the players on each team. I think I'd do:
Team.objects.annotate(avg_weight=Avg('player__weight'))
But now say that I want to compute the variance of team weights within each sport. Is there a way to do that using the Django ORM? How about using the extra() method on a QuerySet? Any advice is much appreciated.
You can use a query like this:
from django.db import models
from django.db.models import Avg, Count, ExpressionWrapper, F, OuterRef, Subquery
class SumSubquery(Subquery):
    template = "(SELECT SUM(`%(field)s`) From (%(subquery)s _sum))"
    output_field = models.FloatField()
    def as_sql(self, compiler, connection, template=None, **extra_context):
        connection.ops.check_expression_support(self)
        template = template or self.template  # fall back to the class template when none is passed
        template_params = {**self.extra, **extra_context}
        template_params['subquery'], sql_params = self.queryset.query.get_compiler(connection=connection).as_sql()
        template_params["field"] = list(self.queryset.query.annotation_select_mask)[0]
        sql = template % template_params
        return sql, sql_params
Team.objects.all().values("sport__name").annotate(
    variance=SumSubquery(
        Player.objects.filter(team__sport_id=OuterRef("sport_id")).annotate(
            sum_pow=ExpressionWrapper((Avg("team__players__weight") - F("weight"))**2, output_field=models.FloatField())
        ).values("sum_pow")
    ) / (Count("players", output_field=models.FloatField()) - 1)
)
and add a related_name to the model like this:
class Player(models.Model):
    name = models.CharField()
    weight = models.DecimalField()
    team = models.ForeignKey('Team', related_name="players")
I'm going to assume (perhaps incorrectly) that you mean by 'variance' the difference between maximum and minimum weights. If so, you can generate more than one aggregate with a single query, like so:
from django.db.models import Avg, Max, Min
Team.objects.aggregate(Avg('player__weight'), Max('player__weight'), Min('player__weight'))
This is taken from the Django docs on generating aggregates over a queryset.
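If "variance" is meant in the statistical sense, one simple alternative is a two-step approach: let the database compute the average weight per team, then compute the variance of those averages per sport in Python. The sketch below does this with the standard-library sqlite3 and statistics modules. The schema is an assumption (sport is stored as a plain text column rather than a foreign key to keep the demo short), and it uses the population variance; statistics.variance would give the sample variance instead.

```python
import sqlite3
from collections import defaultdict
from statistics import pvariance

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT, sport TEXT);
CREATE TABLE player (id INTEGER PRIMARY KEY, team_id INTEGER, weight REAL);
INSERT INTO team VALUES (1, 't1', 'rugby'), (2, 't2', 'rugby'), (3, 't3', 'golf');
INSERT INTO player VALUES (1, 1, 90.0), (2, 1, 110.0), (3, 2, 80.0), (4, 3, 70.0);
""")

# Step 1: one row per team with its average player weight (done in the database).
team_averages = conn.execute("""
    SELECT t.sport, AVG(p.weight)
    FROM team t
    JOIN player p ON p.team_id = t.id
    GROUP BY t.id
""").fetchall()

# Step 2: group the team averages by sport and take their variance in Python.
averages_by_sport = defaultdict(list)
for sport, avg_weight in team_averages:
    averages_by_sport[sport].append(avg_weight)

variance_by_sport = {
    sport: pvariance(averages) for sport, averages in averages_by_sport.items()
}

print(variance_by_sport)
```

Only one row per team is fetched, so this stays cheap unless there are a very large number of teams.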