Reusing subqueries for ordering in Django ORM - django

I run a dog salon where dogs get haircuts on an infrequent basis. In order to encourage owners back I would like to send out vouchers for their next visit. The voucher will be based on whether a dog has had a haircut within the last 2 months to 2 years. Beyond 2 years ago we can assume that the customer has been lost and less than 2 months ago is too close to their previous haircut. We will first target owners that have recently visited.
My underlying database is PostgreSQL.
import uuid
from datetime import timedelta
from django.db import models
from django.db.models import Max, OuterRef, Subquery
from django.utils import timezone

# Dogs have one owner, owners can have many dogs, dogs can have many haircuts
class Owner(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=255)

class Dog(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    owner = models.ForeignKey(Owner, on_delete=models.CASCADE, related_name="dogs")
    name = models.CharField(max_length=255)

class Haircut(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    dog = models.ForeignKey(Dog, on_delete=models.CASCADE, related_name="haircuts")
    at = models.DateField()

today = timezone.now().date()
# timedelta has no years/months arguments, so the window is approximated in days
start = today - timedelta(days=2 * 365)
end = today - timedelta(days=60)
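Since timedelta only accepts fixed-length units (days, weeks and smaller), the two window boundaries need either a day-count approximation or calendar arithmetic. The following is a minimal sketch of the calendar approach using only the standard library; the helper name months_ago is hypothetical and not part of the original question.

```python
import calendar
from datetime import date


def months_ago(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    The day of month is clamped to the last day of the target month,
    so 31 March minus one month gives the last day of February.
    """
    # Count months from year 0 so the subtraction can cross year boundaries.
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


today = date(2024, 3, 31)      # fixed date so the example is reproducible
start = months_ago(today, 24)  # two years ago
end = months_ago(today, 2)     # two months ago
```

In the question's code, today would come from timezone.now().date() rather than a fixed date.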
It strikes me that the problem can be broken down into two queries. The first aggregates each owner's dogs down to the most recent haircut that falls between 2 months and 2 years ago.
dog_aggregate = Haircut.objects.filter(at__range=(start, end)).values("dog__owner").annotate(last_haircut=Max("at"))
And then joins the result of that to the owners table.
owners_by_shaggiest_dog_1 = Owner.objects # what's the rest of this?
Resulting in SQL similar to:
select
    owner.id,
    owner.name
from
    (
        select
            dog.owner_id,
            max(haircut.at) last_haircut
        from haircut
        left join dog on haircut.dog_id = dog.id
        where
            haircut.at
                between current_date - interval '2' year
                and current_date - interval '2' month
        group by
            dog.owner_id
    ) dog_aggregate
    left join owner on dog_aggregate.owner_id = owner.id
order by
    dog_aggregate.last_haircut asc,
    owner.name;
Through some playing around I have managed to get the correct result with:
haircut_annotation = Subquery(
Haircut.objects
.filter(dog__owner=OuterRef("pk"), at__range=(start, end))
.order_by("-at")
.values("at")[:1]
)
owners_by_shaggiest_dog_2 = (
Owner.objects
.annotate(last_haircut=haircut_annotation)
.order_by("-last_haircut", "name")
)
However, the resulting SQL seems inefficient as a new query is performed for every row:
select
    owner.id,
    owner.name,
    (
        select haircut.at
        from haircut
        inner join dog on haircut.dog_id = dog.id
        where haircut.at
            between current_date - interval '2' year
            and current_date - interval '2' month
            and dog.owner_id = (owner.id)
        order by
            haircut.at desc
        limit 1
    ) last_haircut
from
    owner
order by
    last_haircut desc,
    owner.name;
P.S. I don't actually run a dog salon so I can't give you a voucher. Sorry!

If I understood it correctly, you can write a query like:
from django.db.models import Max
Owner.objects.filter(
dogs__haircuts__at__range=(start, end)
).annotate(
last_haircut=Max('dogs__haircuts__at')
).order_by('last_haircut', 'name')
The last haircut should be the maximum here, since later dates compare as larger.
Note, however, that neither your query nor this one excludes owners whose dogs have had a haircut more recently than the window. Such haircuts are simply not taken into account when last_haircut is calculated.
If you want to exclude such owners, you should build a query like:
from django.db.models import Max
Owner.objects.exclude(
dogs__haircuts__at__gt=end
).filter(
dogs__haircuts__at__range=(start, end)
).annotate(
last_haircut=Max('dogs__haircuts__at')
).order_by('last_haircut', 'name')
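The difference between the two queries in this answer can be illustrated without a database. The sketch below mimics the intended semantics on plain tuples; the function name, the sample owners and the fixed dates are all made up for illustration, and it does not reproduce how Django builds the SQL.

```python
from collections import defaultdict
from datetime import date


def last_haircut_in_window(haircuts, start, end, exclude_recent=False):
    """Map owner -> latest haircut date inside [start, end].

    `haircuts` is an iterable of (owner, haircut_date) pairs.
    With exclude_recent=True, owners with any haircut after `end` are dropped.
    """
    by_owner = defaultdict(list)
    for owner, at in haircuts:
        by_owner[owner].append(at)

    result = {}
    for owner, dates in by_owner.items():
        if exclude_recent and any(at > end for at in dates):
            continue
        in_window = [at for at in dates if start <= at <= end]
        if in_window:
            result[owner] = max(in_window)
    return result


haircuts = [
    ("Alice", date(2023, 6, 1)),
    ("Alice", date(2024, 3, 1)),  # more recent than the window
    ("Bob", date(2023, 1, 10)),
    ("Bob", date(2023, 9, 5)),
]
start, end = date(2022, 3, 15), date(2024, 1, 15)

with_recent = last_haircut_in_window(haircuts, start, end)
without_recent = last_haircut_in_window(haircuts, start, end, exclude_recent=True)
```

Alice appears in with_recent because her March 2024 haircut is ignored, and disappears from without_recent because that same haircut falls after the window.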

Related

django query left join, sum and group by

I have a model:
class Product(models.Model):
    name = models.CharField(max_length=100)

class Sales(models.Model):
    product_id = models.ForeignKey(Product, on_delete=models.CASCADE, related_name='products')
    date = models.DateTimeField(null=True)
    price = models.FloatField()
How do I return data as the following sql query (annotate sales with product name, group by product, day and month, and calculate sum of sales):
select p.name
, extract(day from date) as day
, extract(month from date) as month
, sum(s.price)
from timetracker.main_sales s
left join timetracker.main_product p on p.id = s.product_id_id
group by month, day, p.name;
Thanks,
If only the ORM were as simple as SQL... I spent several hours trying to figure it out...
PS. Why do I get "Raw query must include the primary key" when executing Sales.objects.raw(sql) with the SQL query above?
You can annotate with:
from django.db.models import Sum
from django.db.models.functions import ExtractDay, ExtractMonth
Product.objects.values(
    'name',
    month=ExtractMonth('products__date'),
    day=ExtractDay('products__date'),
).annotate(
    total_price=Sum('products__price')
).order_by('name', 'month', 'day')
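As a rough picture of what this values().annotate() combination computes, here is the same grouping done in plain Python. The sample rows are hypothetical, and the sketch only mirrors the shape of the result, not the SQL that Django generates.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (product name, sale timestamp, price).
sales = [
    ("Widget", datetime(2024, 3, 1, 9, 0), 10.0),
    ("Widget", datetime(2024, 3, 1, 17, 30), 5.0),
    ("Widget", datetime(2024, 3, 2, 12, 0), 7.5),
    ("Gadget", datetime(2024, 3, 1, 8, 15), 20.0),
]

# Group by (name, month, day) and sum the prices.
totals = defaultdict(float)
for name, sold_at, price in sales:
    totals[(name, sold_at.month, sold_at.day)] += price

# One dictionary per group, ordered by name, month, day.
rows = [
    {"name": name, "month": month, "day": day, "total_price": total}
    for (name, month, day), total in sorted(totals.items())
]
```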
Note: Normally one does not add an …_id suffix to a ForeignKey field, since Django will automatically add a "twin" field with an …_id suffix. Therefore it should be product, instead of product_id.
Note: The related_name=… parameter is the name of the relation in reverse, so from the Product model to the Sales model in this case. Therefore it usually does not make much sense to give it the same name as the forward relation. You might want to consider renaming the products relation to sales.

query to django model to find best company sale in the month

I have two Django models: Company, and MonthlyReport, which holds a company's monthly report.
I want to find the companies whose sales in the current month are more than 20% higher than their sales in the previous month.
class Company(models.Model):
    name = models.CharField(max_length=50)

class MonthlyReport(models.Model):
    company = models.ForeignKey(Company, on_delete=models.CASCADE)
    sale = models.IntegerField()
    date = models.DateField()
How can I find the companies whose sales are more than 20% above the previous month's?
You can certainly do it using the ORM. You will need to combine Max (or Sum, depending on your use case) with a Q() filter, annotate the ratio of the two months onto the queryset, and then filter on that annotation.
You could do it in a single piece of code, but I have split it out because getting the dates and the query expressions are quite long. I have also put the increase value in a separate variable, rather than hardcoding it.
from datetime import datetime, timedelta
from django.db.models import FloatField, Max, Q
from django.db.models.functions import Cast

SALES_INCREASE = 1.2

# Get the start dates of this month and last month
this_month = datetime.now().date().replace(day=1)
last_month = (this_month - timedelta(days=15)).replace(day=1)

# Get the maximum sale this month
amount_this_month = Max(
    'monthlyreport__sale',
    filter=Q(monthlyreport__date__gte=this_month),
)

# Get the maximum sale last month, but before this month
amount_last_month = Max(
    'monthlyreport__sale',
    filter=Q(monthlyreport__date__gte=last_month, monthlyreport__date__lt=this_month),
)

# sale is an IntegerField, so cast to float before dividing; otherwise
# databases such as PostgreSQL perform integer division
Company.objects.annotate(
    percentage_increase=Cast(amount_this_month, FloatField()) / Cast(amount_last_month, FloatField())
).filter(percentage_increase__gte=SALES_INCREASE)
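The month boundaries used above rely on a small trick: stepping back 15 days from the first of a month always lands somewhere in the previous month. A minimal sketch that isolates this step so it can be checked with fixed dates (the function name month_starts is hypothetical):

```python
from datetime import date, timedelta


def month_starts(today: date):
    """Return (first day of this month, first day of the previous month)."""
    this_month = today.replace(day=1)
    # Stepping back 15 days from the 1st always lands in the previous month,
    # including across a year boundary.
    last_month = (this_month - timedelta(days=15)).replace(day=1)
    return this_month, last_month


# Fixed dates keep the example reproducible.
in_january = month_starts(date(2024, 1, 20))
in_march = month_starts(date(2024, 3, 31))
```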
There is probably a way to do this using the ORM, but I would just do it in Python.
First, add a related name to MonthlyReport:
class Company(models.Model):
    name = models.CharField(max_length=50)

class MonthlyReport(models.Model):
    company = models.ForeignKey(Company, related_name="monthly_reports", on_delete=models.CASCADE)
    sale = models.IntegerField()
    date = models.DateField()
Then:
best_companies = []
for company in Company.objects.all():
    # Newest first: index 0 is the current report, index 1 the previous one
    latest_reports = list(company.monthly_reports.order_by("-date")[:2])
    if len(latest_reports) < 2:
        continue  # not enough history to compare
    current_report, previous_report = latest_reports
    if previous_report.sale and current_report.sale / previous_report.sale > 1.2:
        best_companies.append(company)

How to join on multiple column with a groupby in Django/Postgres

I have the following tables that I need to join on date and currency:
class Transaction(models.Model):
    description = models.CharField(max_length=100)
    date = models.DateField()
    amount = models.FloatField()
    currency = models.ForeignKey(Currency, on_delete=models.PROTECT)

class ExchangeRate(models.Model):
    currency = models.ForeignKey(Currency, on_delete=models.PROTECT)
    rate = models.FloatField()
    date = models.DateField()
I need to join on both the date and currency columns, multiply the rate and the amount to give me the 'converted_amount'. I then need to group all the transactions by calendar month and sum up the 'converted_amount'.
Is this possible using the Django ORM or would I need to use SQL directly? If so, how do I go about doing this in Postgres?
Assuming that the dates in the "Exchange rates" table are independent of the dates in the Transactions table, so that each transaction uses the exchange rate with the latest date on or before the transaction date, you can try this in Postgres:
SELECT t.Currency
     , date_trunc('month', t.Date) AS period_of_time
     , sum(t.amount * er.Rate) AS sum_by_currency_by_period_of_time
FROM Transactions AS t
CROSS JOIN LATERAL
   ( SELECT er.Rate
       FROM "Exchange rates" AS er
      WHERE er.Currency = t.Currency
        AND er.Date <= t.Date
      ORDER BY er.Date DESC
      LIMIT 1
   ) AS er
GROUP BY t.Currency, date_trunc('month', t.Date)
Assuming that your Currency model has a symbol column (change to your needs) you can achieve this with the following Django statements:
from your.models import Transaction, ExchangeRate
from django.db.models.functions import ExtractMonth
from django.db.models import Sum, F, Subquery, OuterRef
rates = ExchangeRate.objects.filter(
currency=OuterRef("currency"), date__lt=OuterRef("date")
).order_by("-date")
Transaction.objects.annotate(
month=ExtractMonth("date"),
rate=Subquery(rates.values("rate")[:1]),
conversion=F("amount") * F("rate"),
).values("currency__symbol", "month").annotate(sum=Sum("conversion")).order_by(
"currency", "month"
)
This will result in a list like:
{'currency__symbol': '$', 'month': 2, 'sum': 105.0},...
The subquery annotates each transaction with the most recent exchange rate dated before the transaction. Make sure that every transaction has such an exchange rate (one whose date is prior to the transaction date); otherwise the annotated rate is NULL.
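The lookup that both answers perform ("for each transaction, take the latest rate that is not newer than the transaction, then sum per month") can be sketched in plain Python. This is only an illustration on tuples, with a hypothetical function name and sample data; it uses the "on or before" rule of the SQL answer, whereas the ORM answer uses a strictly earlier date.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def monthly_converted_totals(transactions, rates):
    """Sum amount * rate per (currency, year, month).

    transactions: iterable of (currency, date, amount)
    rates: iterable of (currency, date, rate)
    Each transaction uses the latest rate dated on or before it;
    transactions without such a rate are skipped.
    """
    # Per currency, keep the rates sorted by date so bisect can search them.
    by_currency = defaultdict(list)
    for currency, day, rate in rates:
        by_currency[currency].append((day, rate))
    for entries in by_currency.values():
        entries.sort()

    totals = defaultdict(float)
    for currency, day, amount in transactions:
        entries = by_currency.get(currency, [])
        days = [d for d, _ in entries]
        index = bisect_right(days, day)  # number of rates dated <= day
        if index == 0:
            continue  # no rate available yet for this transaction
        rate = entries[index - 1][1]
        totals[(currency, day.year, day.month)] += amount * rate
    return dict(totals)


rates = [("EUR", date(2024, 1, 1), 1.5), ("EUR", date(2024, 1, 20), 2.0)]
transactions = [
    ("EUR", date(2024, 1, 10), 100.0),  # uses the 1 January rate
    ("EUR", date(2024, 1, 25), 50.0),   # uses the 20 January rate
    ("EUR", date(2023, 12, 31), 10.0),  # no earlier rate, skipped
]
totals = monthly_converted_totals(transactions, rates)
```

Grouping by (year, month) rather than by month number alone keeps January 2023 and January 2024 apart.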

How to filter on a foreign key that is grouped?

Model:
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=100)

class Result(models.Model):
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    outcome = models.IntegerField()
    time = models.DateTimeField()
Sql:
select * from person as p
inner join (
    select person_id, max(time) as max_time, outcome from result
    group by person_id
) r on p.id = r.person_id
where r.outcome in (2, 3)
I want to get all person records where the last result outcome was either a 2 or a 3. I added the raw SQL above to explain further.
I looked at using a subquery to filter person records that have a matching result id
sub_query = Result.objects.values("person_id").annotate(max_time=Max("time"))
however using values strips out the other fields.
Ideally I'd be able to do this in one person queryset but I don't think that is the case.
The query below keeps only the results whose outcome is 2 or 3, sorts them by time in descending order (latest first), and returns the person for each result:
from django.db.models import Q
results = Result.objects.filter(Q(outcome=3) | Q(outcome=2)).order_by('-time').values('person')
As a person may have multiple result records and I only want to check the last one, a subquery was the only way I could find to do this:
from django.db.models import Max, OuterRef, Subquery
last_result = Subquery(
    Result.objects.filter(person_id=OuterRef("pk")).order_by("-time").values("outcome")[:1]
)
people = Person.objects.all().annotate(max_time=Max("result__time"), current_result=last_result).filter(current_result__in=[2, 3])
First I create a subquery that returns the outcome of the last result record. Then I add this as a field in the people query so that I can filter on it for only results with 2 or 3.
This way it will only return person records where the current result is a 2 or 3.
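The distinction that matters here is the order of the two steps: pick each person's latest result first, and only then test its outcome. Filtering on the outcome first would also match people who had a 2 or 3 at some earlier point. A minimal plain-Python sketch of the intended logic, with a hypothetical function name and sample data:

```python
from datetime import datetime


def people_with_latest_outcome(results, wanted=(2, 3)):
    """Return the ids of people whose most recent result has a wanted outcome.

    results: iterable of (person_id, outcome, time) tuples.
    """
    latest = {}  # person_id -> (time, outcome) of the newest result seen
    for person_id, outcome, time in results:
        if person_id not in latest or time > latest[person_id][0]:
            latest[person_id] = (time, outcome)
    return sorted(pid for pid, (_, outcome) in latest.items() if outcome in wanted)


results = [
    (1, 2, datetime(2024, 1, 1)),  # person 1 had a 2 earlier...
    (1, 5, datetime(2024, 2, 1)),  # ...but the latest outcome is 5
    (2, 1, datetime(2024, 1, 1)),
    (2, 3, datetime(2024, 2, 1)),  # person 2's latest outcome is 3
]
matching = people_with_latest_outcome(results)
```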

django 2: Filter models by every day of current month

I have a simple model like this:
class CurrentMonthRegisterPage(models.Manager):
    """This manager class filter with current month."""
    current_date = datetime.now().date()

    def get_queryset(self):
        return super(CurrentMonthRegisterPage, self).get_queryset().filter(
            detail_hour__month=self.current_date.month, detail_hour__year=self.current_date.year)

class RegisterPage(models.Model):
    OTHERS = '4'
    EXTRA_TIME = '3'
    EARLIER = '2'
    ON_TIME = '1'
    LATE = '0'
    ABSENT = '-'
    STATUS_LIST = (
        (LATE, _("Late")),
        (ON_TIME, _("On time")),
        (EARLIER, _("Earlier")),
        (EXTRA_TIME, _("Extra time")),
        (OTHERS, _("ND")),
        (ABSENT, _("Absent")),
    )
    detail_hour = models.DateTimeField(_('Date and hour'), auto_now_add=True)
    details_mouvement = models.TextField(_("Déscription"), blank=True)
    state = models.CharField(_("Statut"), max_length=1, choices=STATUS_LIST, default=ABSENT)
    objects = RecentManager()
    c_month = CurrentMonthRegisterPage()
Now I want to get the count of every state for every day of the current month.
Example:
The current month is March.
How do I get the number of entries with state == LATE for every day of March?
I want to get something like this:
queryset = [{'late':[1,1,3,5,....31], 'other_state': [1,2,...], ...}]
Any help would be appreciated.
You need a query with fields for day and state, then you do a count (implicitly grouping by day and state):
from django.db.models import Count
from django.db.models.functions import Trunc
queryset = (RegisterPage.c_month
.annotate(day=Trunc('detail_hour', 'day'))
.values('day', 'state')
.annotate(count=Count('day'))
.order_by('day', 'state')
)
I've added an ordering clause to remove any existing ordering (that would thwart the desired grouping) and to sort the results.
The results only include days and states that are actually present in the data. If you want to include missing days or states with a count of 0, you may want to do that in Python code.
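One possible way to do that post-processing in Python is sketched below. It assumes the rows look like the dictionaries produced by the values()/annotate() query (keys day, state and count); the function name and the sample rows are hypothetical.

```python
import calendar
from datetime import date


def counts_per_day(rows, year, month, states):
    """Build {state: [count for day 1, day 2, ...]} with zeros for missing days.

    rows: iterable of dicts with 'day' (a date or datetime), 'state' and
    'count' keys, shaped like the output of the grouped query.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    table = {state: [0] * days_in_month for state in states}
    for row in rows:
        if row["state"] in table:
            # Day 1 goes to index 0, day 2 to index 1, and so on.
            table[row["state"]][row["day"].day - 1] = row["count"]
    return table


rows = [
    {"day": date(2024, 3, 1), "state": "0", "count": 2},
    {"day": date(2024, 3, 3), "state": "1", "count": 4},
]
table = counts_per_day(rows, 2024, 3, states=["0", "1", "-"])
```

Each list has one entry per day of the month, so a state that never occurs (here "-") still gets a full list of zeros.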