django query left join, sum and group by - django

I have a model:
class Product(models.Model):
    name = models.CharField(max_length=100)
class Sales(models.Model):
    product_id = models.ForeignKey(Product, on_delete=models.CASCADE, related_name='products')
    date = models.DateTimeField(null=True)
    price = models.FloatField()
How do I return the same data as the following SQL query (annotate sales with the product name, group by product, day and month, and calculate the sum of sales)?
select p.name
, extract(day from date) as day
, extract(month from date) as month
, sum(s.price)
from timetracker.main_sales s
left join timetracker.main_product p on p.id = s.product_id_id
group by month, day, p.name;
Thanks,
If only the ORM were as simple as SQL... I spent several hours trying to figure it out...
PS. Why do I get "Raw query must include the primary key" when executing Sales.objects.raw(sql) with the SQL query above?

You can annotate with:
from django.db.models import Sum
from django.db.models.functions import ExtractDay, ExtractMonth
Product.objects.values(
    'name',
    month=ExtractMonth('products__date'),
    day=ExtractDay('products__date'),
).annotate(
    total_price=Sum('products__price')
).order_by('name', 'month', 'day')
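To check what a grouped query like this should return, the SQL from the question can be run against a small in-memory SQLite database. This is only a sketch: the table names and sample rows are made up for illustration, and SQLite's strftime stands in for extract(... from date).

```python
import sqlite3

# Hypothetical in-memory tables mirroring the Product and Sales models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id),
        date TEXT,
        price REAL
    );
    INSERT INTO product VALUES (1, 'apple'), (2, 'pear');
    INSERT INTO sales (product_id, date, price) VALUES
        (1, '2024-03-05 09:00:00', 2.0),
        (1, '2024-03-05 17:30:00', 3.0),
        (1, '2024-04-05 12:00:00', 4.0),
        (2, '2024-03-06 08:15:00', 1.5);
""")

# One row per (name, month, day), with the prices summed inside each group.
rows = conn.execute("""
    SELECT p.name,
           CAST(strftime('%m', s.date) AS INTEGER) AS month,
           CAST(strftime('%d', s.date) AS INTEGER) AS day,
           SUM(s.price) AS total_price
    FROM sales s
    LEFT JOIN product p ON p.id = s.product_id
    GROUP BY p.name, month, day
    ORDER BY p.name, month, day
""").fetchall()

for row in rows:
    print(row)
```

The two apple sales on 5 March collapse into one row, while the sale on 5 April stays separate because the month is part of the grouping key.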
Note: Normally one does not add an …_id suffix to a ForeignKey field, since Django
automatically adds a "twin" field with an …_id suffix. The field should therefore
be named product instead of product_id.
Note: The related_name=… parameter
is the name of the relation in reverse, so from the Product model to the Sales
model in this case. It therefore (often) makes little sense to give it the
same name as the forward relation. You might want to rename the products relation to sales.

Related

query to django model to find best company sale in the month

I have two Django models, Company and MonthlyReport (the monthly report of a company).
I want to find out which companies' sales in the current month are more than 20% higher than in the previous month.
class Company(models.Model):
    name = models.CharField(max_length=50)
class MonthlyReport(models.Model):
    company = models.ForeignKey(Company, on_delete=models.CASCADE)
    sale = models.IntegerField()
    date = models.DateField()
How can I find the companies whose sales grew by more than 20% over the previous month?
You can certainly do it using the ORM. You will need to combine Max (or Sum, depending on your use case) with a Q() filter, and annotate the ratio between the two months onto the queryset before filtering on it.
You could do it in a single statement, but I have split it up because the date calculations and the query expressions are quite long. I have also put the required increase in a separate variable rather than hardcoding it.
from datetime import datetime, timedelta
from django.db.models import Max, Q
SALES_INCREASE = 1.2
# Get the start dates of this month and last month
this_month = datetime.now().date().replace(day=1)
last_month = (this_month - timedelta(days=15)).replace(day=1)
# Get the maximum sale this month
amount_this_month = Max('monthlyreport__sale',
                        filter=Q(monthlyreport__date__gte=this_month))
# Get the maximum sale from the start of last month up to the start of this month
amount_last_month = Max('monthlyreport__sale',
                        filter=Q(monthlyreport__date__gte=last_month) &
                               Q(monthlyreport__date__lt=this_month))
Company.objects.annotate(
    percentage_increase=amount_this_month / amount_last_month
).filter(percentage_increase__gte=SALES_INCREASE)
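The month-boundary arithmetic used above is easy to check in isolation. The sketch below wraps it in a helper (month_starts is a name invented here, not part of the answer) that takes the date as an argument so edge cases such as January can be tried.

```python
from datetime import date, timedelta


def month_starts(today):
    """Return (first day of this month, first day of the previous month)."""
    this_month = today.replace(day=1)
    # Stepping back 15 days from the 1st always lands inside the previous
    # month, and replace(day=1) then snaps to that month's first day.
    last_month = (this_month - timedelta(days=15)).replace(day=1)
    return this_month, last_month


print(month_starts(date(2024, 1, 20)))
print(month_starts(date(2024, 3, 31)))
```

For a date in January the previous month correctly rolls back into December of the year before.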
There is probably a way to do this using the ORM, but I would just do it in Python.
First add a related name to MonthlyReport:
class Company(models.Model):
    name = models.CharField(max_length=50)
class MonthlyReport(models.Model):
    company = models.ForeignKey(Company, related_name="monthly_reports", on_delete=models.CASCADE)
    sale = models.IntegerField()
    date = models.DateField()
Then
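best_companies = []
companies = Company.objects.all()
for company in companies:
    # Newest first, so index 0 is the current report and index 1 the previous one
    two_last_monthly_reports = list(company.monthly_reports.order_by("-date")[:2])
    if len(two_last_monthly_reports) < 2:
        continue  # not enough history to compare
    current_report, previous_report = two_last_monthly_reports
    # Skip a zero previous sale to avoid dividing by zero
    if previous_report.sale and current_report.sale / previous_report.sale > 1.2:
        best_companies.append(company)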
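The comparison inside the loop can be tried without a database. The following sketch uses plain (date, sale) tuples in place of MonthlyReport rows; the helper name grew_by and the sample figures are assumptions made for illustration.

```python
from datetime import date


def grew_by(reports, factor=1.2):
    """Return True if the newest sale exceeds the one before it by `factor`.

    `reports` is a list of (date, sale) tuples for one company, in any order.
    """
    latest = sorted(reports, reverse=True)[:2]  # newest report first
    if len(latest) < 2:
        return False  # nothing to compare against
    (_, current), (_, previous) = latest
    if previous <= 0:
        return False  # avoid dividing by zero
    return current / previous > factor


print(grew_by([(date(2024, 1, 31), 100), (date(2024, 2, 29), 130)]))
print(grew_by([(date(2024, 1, 31), 100), (date(2024, 2, 29), 110)]))
```

Sorting newest-first is what makes the first element the current month; sorting ascending and taking the first two would compare the two oldest reports instead.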

How to join on multiple column with a groupby in Django/Postgres

I have the following tables that I need to join on date and currency:
class Transaction(models.Model):
    description = models.CharField(max_length=100)
    date = models.DateField()
    amount = models.FloatField()
    currency = models.ForeignKey(Currency, on_delete=models.PROTECT)
class ExchangeRate(models.Model):
    currency = models.ForeignKey(Currency, on_delete=models.PROTECT)
    rate = models.FloatField()
    date = models.DateField()
I need to join on both the date and currency columns, multiply the rate and the amount to give me the 'converted_amount'. I then need to group all the transactions by calendar month and sum up the 'converted_amount'.
Is this possible using the Django ORM or would I need to use SQL directly? If so, how do I go about doing this in Postgres?
Assuming that the dates in the "Exchange rates" table are independent of the dates in the Transactions table, so that the rate applied to each transaction is the one with the latest "Exchange rates".Date that is less than or equal to Transactions.Date, you can try this in Postgres:
SELECT t.Currency
     , date_trunc('month', t.Date) AS period_of_time
     , sum(t.amount * er.Rate) AS sum_by_currency_by_period_of_time
FROM Transactions AS t
CROSS JOIN LATERAL
     ( SELECT er.Rate
       FROM "Exchange rates" AS er
       WHERE er.Currency = t.Currency
         AND er.Date <= t.Date
       ORDER BY er.Date DESC
       LIMIT 1
     ) AS er
GROUP BY t.Currency, date_trunc('month', t.Date)
Assuming that your Currency model has a symbol column (change to your needs) you can achieve this with the following Django statements:
from your.models import Transaction, ExchangeRate
from django.db.models.functions import ExtractMonth
from django.db.models import Sum, F, Subquery, OuterRef
# Latest rate for the same currency on or before the transaction date
rates = ExchangeRate.objects.filter(
    currency=OuterRef("currency"), date__lte=OuterRef("date")
).order_by("-date")
Transaction.objects.annotate(
    month=ExtractMonth("date"),
    rate=Subquery(rates.values("rate")[:1]),
    conversion=F("amount") * F("rate"),
).values("currency__symbol", "month").annotate(sum=Sum("conversion")).order_by(
    "currency__symbol", "month"
)
This will result in a list like:
{'currency__symbol': '$', 'month': 2, 'sum': 105.0},...
The subquery annotates each transaction with the most recent exchange rate dated on or before the transaction date. Make sure that every transaction has such a rate; otherwise its rate is NULL and the transaction drops out of the sum. Note also that ExtractMonth returns only the month number, so the same month of different years ends up in one group; TruncMonth keeps the year if you need true calendar months.
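The "latest rate on or before the transaction date" lookup can be checked with a correlated subquery in an in-memory SQLite database. This is a sketch under assumptions: the table names (txn, exchange_rate), the plain-text currency column, and the sample rows are invented here, and strftime('%Y-%m', ...) stands in for Postgres's date_trunc.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exchange_rate (currency TEXT, date TEXT, rate REAL);
    CREATE TABLE txn (currency TEXT, date TEXT, amount REAL);
    INSERT INTO exchange_rate VALUES
        ('EUR', '2024-01-01', 1.25),
        ('EUR', '2024-01-20', 1.5),
        ('EUR', '2024-02-10', 2.0);
    INSERT INTO txn VALUES
        ('EUR', '2024-01-05', 100.0),  -- uses the 2024-01-01 rate
        ('EUR', '2024-01-20', 50.0),   -- same-day rate counts (<=)
        ('EUR', '2024-02-15', 10.0);   -- uses the 2024-02-10 rate
""")

# For every transaction, pick the newest rate that is not later than the
# transaction, multiply, then sum per currency and calendar month.
rows = conn.execute("""
    SELECT t.currency,
           strftime('%Y-%m', t.date) AS month,
           SUM(t.amount * (
               SELECT er.rate
               FROM exchange_rate er
               WHERE er.currency = t.currency AND er.date <= t.date
               ORDER BY er.date DESC
               LIMIT 1
           )) AS converted_total
    FROM txn t
    GROUP BY t.currency, month
    ORDER BY t.currency, month
""").fetchall()

for row in rows:
    print(row)
```

Grouping on year and month together keeps January 2024 apart from January of any other year.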

Django / PostGreSQL: Create queryset grouped by 'date' when each row has a different timezone

Let's say I have two models:
from django.db import models
class Company(models.Model):
    name = models.TextField()
    timezone = models.TextField()
class Sale(models.Model):
    amount = models.IntegerField()
    company = models.ForeignKey('Company', on_delete=models.CASCADE)
    time = models.DateTimeField()
I want to create a queryset grouped by date and company, where date refers to the calendar date of the sale at the timezone specified on the Company object.
This query:
result = Sale.objects.values(
    'company', 'time__date'
).annotate(
    models.Sum('amount')
)
The result is in a format that works for me. However, the sales are grouped by UTC day. I want them grouped by day in the timezone stored on the Company objects.
What is the cleanest, quickest way to do this?
I know I could dump the entire set of values into Python, like this:
result = Sale.objects.values(
    'amount', 'company__timezone', 'time'
).order_by(
    'company__timezone'
)
for r in result:
    # values() yields dicts, so use key access rather than attributes
    r['date'] = r['time'].astimezone(pytz.timezone(r['company__timezone'])).date()
and then groupby, but is there a better way?
The solution is to use the TruncDate function, and pass the timezone string as an argument.
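from django.db.models.functions import TruncDate
from django.db.models import F, Sum
...
local_time_daily_sales = Sale.objects.annotate(
    # Django documents tzinfo as a tzinfo object; passing an F() expression
    # per row may not work on every Django version, so verify it on yours
    date=TruncDate('time', tzinfo=F('company__timezone'))
).values(
    'company', 'date'
).annotate(Sum('amount'))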
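As a reference for what the grouped result should look like, the same per-company local-date grouping can be written in plain Python. This sketch makes two assumptions for the sake of a self-contained example: fixed UTC offsets stand in for the named timezones stored on Company, and the company names and sales are invented.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets replace named zones such as "Asia/Tokyo".
company_tz = {
    "Tokyo Ltd": timezone(timedelta(hours=9)),
    "NYC Inc": timezone(timedelta(hours=-5)),
}

# (company, sale time in UTC, amount)
sales = [
    ("Tokyo Ltd", datetime(2024, 1, 1, 16, 0, tzinfo=timezone.utc), 10),
    ("Tokyo Ltd", datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc), 5),
    ("NYC Inc", datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc), 7),
    ("NYC Inc", datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc), 2),
]

totals = defaultdict(int)
for company, sold_at, amount in sales:
    # Convert to the company's own zone before taking the calendar date.
    local_date = sold_at.astimezone(company_tz[company]).date()
    totals[(company, local_date)] += amount

for key in sorted(totals):
    print(key, totals[key])
```

Both Tokyo sales happen on 1 January in UTC but on 2 January locally, which is exactly the difference the question is about.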

Reusing subqueries for ordering in Django ORM

I run a dog salon where dogs get haircuts on an infrequent basis. In order to encourage owners back I would like to send out vouchers for their next visit. The voucher will be based on whether a dog has had a haircut within the last 2 months to 2 years. Beyond 2 years ago we can assume that the customer has been lost and less than 2 months ago is too close to their previous haircut. We will first target owners that have recently visited.
My underlying database is PostgreSQL.
import uuid
from datetime import timedelta
from django.db import models
from django.db.models import Max, OuterRef, Subquery
from django.utils import timezone
# Dogs have one owner, owners can have many dogs, dogs can have many haircuts
class Owner(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=255)
class Dog(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    owner = models.ForeignKey(Owner, on_delete=models.CASCADE, related_name="dogs")
    name = models.CharField(max_length=255)
class Haircut(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    dog = models.ForeignKey(Dog, on_delete=models.CASCADE, related_name="haircuts")
    at = models.DateField()
today = timezone.now().date()
# timedelta has no years or months arguments, so approximate both in days
start = today - timedelta(days=2 * 365)
end = today - timedelta(days=60)
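Since timedelta only accepts weeks, days, and smaller units, calendar-accurate "2 months ago" and "2 years ago" boundaries need a little arithmetic. The helper below is a standard-library sketch; the name months_ago is invented here, and clamping to the last day of a shorter month is one possible convention, not the only one.

```python
import calendar
from datetime import date


def months_ago(day, months):
    """Return the date `months` calendar months before `day`.

    If the target month is shorter, the day is clamped to its last day.
    """
    index = day.year * 12 + (day.month - 1) - months  # months since year 0
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return day.replace(year=year, month=month, day=min(day.day, last_day))


today = date(2024, 4, 30)          # fixed date so the output is reproducible
start = months_ago(today, 24)      # two years back
end = months_ago(today, 2)         # two months back
print(start, end)
```

Two months before 30 April 2024 is clamped to 29 February 2024 because February has no 30th day.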
It strikes me that the problem can be broken down into two queries. The first aggregates each owner's dogs down to the most recent haircut that falls within the last 2 months to 2 years.
dog_aggregate = Haircut.objects.annotate(Max("at")).filter(at__range=(start, end))
And then joins the result of that to the owners table.
owners_by_shaggiest_dog_1 = Owner.objects # what's the rest of this?
Resulting in SQL similar to:
select
    owner.id,
    owner.name
from
    (
        select
            dog.owner_id,
            max(haircut.at) last_haircut
        from haircut
        left join dog on haircut.dog_id = dog.id
        where
            haircut.at
                between current_date - interval '2' year
                and current_date - interval '2' month
        group by
            dog.owner_id
    ) dog_aggregate
left join owner on dog_aggregate.owner_id = owner.id
order by
    dog_aggregate.last_haircut asc,
    owner.name;
Through some playing around I have managed to get the correct result with:
haircut_annotation = Subquery(
    Haircut.objects
    .filter(dog__owner=OuterRef("pk"), at__range=(start, end))
    .order_by("-at")
    .values("at")[:1]
)
owners_by_shaggiest_dog_2 = (
    Owner.objects
    .annotate(last_haircut=haircut_annotation)
    .order_by("-last_haircut", "name")
)
However, the resulting SQL seems inefficient as a new query is performed for every row:
select
    owner.id,
    owner.name,
    (
        select haircut.at
        from haircut
        inner join dog on haircut.dog_id = dog.id
        where haircut.at
            between current_date - interval '2' year
            and current_date - interval '2' month
            and dog.owner_id = (owner.id)
        order by
            haircut.at desc
        limit 1
    ) last_haircut
from
    owner
order by
    last_haircut asc,
    owner.name;
P.S. I don't actually run a dog salon so I can't give you a voucher. Sorry!
If I understood it correctly, you can make a query like:
from django.db.models import Max
Owner.objects.filter(
    dogs__haircuts__at__range=(start, end)
).annotate(
    last_haircut=Max('dogs__haircuts__at')
).order_by('last_haircut', 'name')
The last haircut is the maximum here, since later dates compare as larger.
Note, however, that neither your query nor this one excludes owners whose dogs have had a haircut more recently than two months ago. Those newer haircuts are simply not taken into account when last_haircut is calculated.
If you want to exclude such owners, you should build a query like:
from django.db.models import Max
Owner.objects.exclude(
    dogs__haircuts__at__gt=end
).filter(
    dogs__haircuts__at__range=(start, end)
).annotate(
    last_haircut=Max('dogs__haircuts__at')
).order_by('last_haircut', 'name')
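The intended result of the exclude-then-filter query can be checked against a small SQLite database. This is a sketch with invented table contents and fixed start and end dates so that the output is reproducible; the SQL is hand-written to mirror the intent of the queryset, not the SQL Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dog (id INTEGER PRIMARY KEY, owner_id INTEGER, name TEXT);
    CREATE TABLE haircut (id INTEGER PRIMARY KEY, dog_id INTEGER, at TEXT);
    INSERT INTO owner VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Dee');
    INSERT INTO dog VALUES
        (1, 1, 'Rex'), (2, 2, 'Fido'), (3, 3, 'Max'), (4, 4, 'Odie');
    INSERT INTO haircut (dog_id, at) VALUES
        (1, '2022-08-01'), (1, '2023-05-01'),  -- Ann: both in range
        (2, '2023-01-10'),                     -- Bob: in range
        (3, '2023-06-01'), (3, '2024-05-15'),  -- Cy: one cut after `end`
        (4, '2020-01-01');                     -- Dee: only older than `start`
""")

start, end = "2022-06-01", "2024-04-01"  # fixed window for the example

rows = conn.execute("""
    SELECT o.name, MAX(h.at) AS last_haircut
    FROM owner o
    JOIN dog d ON d.owner_id = o.id
    JOIN haircut h ON h.dog_id = d.id
    WHERE h.at BETWEEN ? AND ?
      AND o.id NOT IN (
          SELECT d2.owner_id
          FROM dog d2
          JOIN haircut h2 ON h2.dog_id = d2.id
          WHERE h2.at > ?
      )
    GROUP BY o.id, o.name
    ORDER BY last_haircut, o.name
""", (start, end, end)).fetchall()

for row in rows:
    print(row)
```

Cy is dropped because of the recent haircut, Dee never matches the window, and the remaining owners are listed with the longest-waiting dog first.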

Django Queryset - extracting only date from datetime field in query (inside .value() )

I want to extract some particular columns from django query
models.py
class table(models.Model):
    id = models.IntegerField(primary_key=True)
    date = models.DateTimeField()
    address = models.CharField(max_length=50)
    city = models.CharField(max_length=20)
    cityid = models.IntegerField()
This is what I am currently using for my query
obj = table.objects.filter(date__range=(start, end)).values('id', 'date', 'address', 'city').annotate(count=Count('cityid')).order_by('date', '-count')
I am hoping to have a SQL query that is similar to this
select DATE(date), id,address,city, COUNT(cityid) as count from table where date between "start" and "end" group by DATE(date), address,id, city order by DATE(date) ASC,count DESC;
At least in Django 1.10.5, you can use something like this, without extra and RawSQL:
from django.db.models.functions import Cast
from django.db.models.fields import DateField
table.objects.annotate(date_only=Cast('date', DateField()))
And for filtering, you can use date lookup (https://docs.djangoproject.com/en/1.11/ref/models/querysets/#date):
table.objects.filter(date__date__range=(start, end))
For the below case.
select DATE(date), id,address,city, COUNT(cityid) as count from table where date between "start" and "end" group by DATE(date), address,id, city order by DATE(date) ASC,count DESC;
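You can use extra() to call database functions such as DATE(), and keep the count in annotate() so that Django generates the GROUP BY:
table.objects.filter(date__range=(start, end)).extra(select={'date_only': 'DATE(date)'}).values('date_only', 'id', 'address', 'city').annotate(count=Count('cityid')).order_by('date_only', '-count')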
Hope it will help you.
Thanks.