Are queries using related_name more performant in Django? - django

Let's say I have the following models set up:
class Shop(models.Model):
...
class Product(models.Model):
shop = models.ForeignKey(Shop, related_name='products')
Now let's say we want to query all the products from the shop with label 'demo' whose prices are at most $100. There are two ways to do this:
shop = Shop.objects.get(label='demo')
products = shop.products.filter(price__lte=100)
Or
shop = Shop.objects.get(label='demo')
products = Product.objects.filter(shop=shop, price__lte=100)
Is there a difference between these two queries? The first one is using the related_name property. I know foreign keys are indexed, so searching using them should be faster, but is this applicable in our first situation?

Short answer: this will result in equivalent queries.
We can do the test by printing the queries:
>>> print(shop.products.filter(price__lte=100).query)
SELECT "app_product"."id", "app_product"."shop_id", "app_product"."price" FROM "app_product" WHERE ("app_product"."shop_id" = 1 AND "app_product"."price" <= 100)
>>> print(Product.objects.filter(shop=shop, price__lte=100).query)
SELECT "app_product"."id", "app_product"."shop_id", "app_product"."price" FROM "app_product" WHERE ("app_product"."price" <= 100 AND "app_product"."shop_id" = 1)
Except that the order of the conditions in the WHERE clause is swapped, the two queries are equal, and that order usually makes no difference on the database side.
If, however, you are not interested in the Shop object itself, you can filter with:
products = Product.objects.filter(shop__label='demo', price__lte=100)
This makes a JOIN at the database level, and thus retrieves the data in a single query instead of two:
SELECT "app_product"."id", "app_product"."shop_id", "app_product"."price"
FROM "app_product"
INNER JOIN "app_shop" ON "app_product"."shop_id" = "app_shop"."id"
WHERE "app_product"."price" <= 100 AND "app_shop"."label" = demo
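The equivalence can also be checked outside Django. The following is only a sketch using Python's built-in sqlite3 module: the table and column names follow the SQL printed above, while the sample rows and the ids helper are made up for illustration.

```python
import sqlite3

# In-memory database with tables shaped like the SQL shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_shop (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE app_product (
    id INTEGER PRIMARY KEY,
    shop_id INTEGER REFERENCES app_shop(id),
    price INTEGER
);
INSERT INTO app_shop VALUES (1, 'demo'), (2, 'other');
INSERT INTO app_product VALUES (1, 1, 50), (2, 1, 150), (3, 2, 80), (4, 1, 100);
""")


def ids(sql, params=()):
    """Hypothetical helper: run a query and return the sorted product ids."""
    return sorted(row[0] for row in conn.execute(sql, params))


# Same conditions, in the order produced by shop.products.filter(...)
via_related_name = ids(
    "SELECT id FROM app_product WHERE shop_id = ? AND price <= ?", (1, 100))
# Same conditions, in the order produced by Product.objects.filter(...)
via_manager = ids(
    "SELECT id FROM app_product WHERE price <= ? AND shop_id = ?", (100, 1))
# Single query with a JOIN, as produced by filter(shop__label='demo', ...)
via_join = ids(
    "SELECT p.id FROM app_product p "
    "INNER JOIN app_shop s ON p.shop_id = s.id "
    "WHERE p.price <= ? AND s.label = ?", (100, "demo"))

print(via_related_name, via_manager, via_join)
```

All three statements select the same products; the only practical difference is that the JOIN version does not need the earlier query that fetches the Shop.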

Related

Django Query - Get list that isn't in FK of another model

I am working on a Django web app that manages payroll based on reports completed, and then payroll generated. The 3 models are as follows (I've tried to limit them to the data needed for the question).
class PayRecord(models.Model):
rate = models.FloatField()
user = models.ForeignKey(User)
class Payroll(models.Model):
company = models.ForeignKey(Company)
name = models.CharField()
class PayrollItem(models.Model):
payroll = models.ForeignKey(Payroll)
record = models.OneToOneField(PayRecord, unique=True)
What is the most efficient way to get all the PayRecords that aren't also in a PayrollItem, so that I can select them to create a payroll item?
There are 100k records, and my initial attempt, shown below, takes minutes (far from feasible).
records_completed_in_payrolls = [
p.record.id for p in PayrollItem.objects.select_related(
'record',
'payroll'
)
]
Because PayrollItem has the related field record, you can reach into that model while you filter PayRecord. Using the __isnull lookup should give you what you want.
PayRecord.objects.filter(payrollitem__isnull=True)
This translates to a SQL statement like:
SELECT payroll_payrecord.id,
payroll_payrecord.rate,
payroll_payrecord.user_id
FROM payroll_payrecord
LEFT OUTER JOIN payroll_payrollitem
ON payroll_payrecord.id = payroll_payrollitem.record_id
WHERE payroll_payrollitem.id IS NULL
Depending on your intentions, you may want to chain on a .select_related (https://docs.djangoproject.com/en/3.1/ref/models/querysets/#select-related)
PayRecord.objects.filter(payrollitem__isnull=True).select_related('user')
which translates to something like:
SELECT payroll_payrecord.id,
payroll_payrecord.rate,
payroll_payrecord.user_id,
payroll_user.id,
payroll_user.name
FROM payroll_payrecord
LEFT OUTER JOIN payroll_payrollitem
ON (payroll_payrecord.id = payroll_payrollitem.record_id)
INNER JOIN payroll_user
ON (payroll_payrecord.user_id = payroll_user.id)
WHERE payroll_payrollitem.id IS NULL
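To see why the LEFT OUTER JOIN with an IS NULL test returns exactly the records without a payroll item, here is a small sketch using Python's built-in sqlite3 module. The table names follow the SQL above; the three sample records and the single payroll item are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payroll_payrecord (id INTEGER PRIMARY KEY, rate REAL, user_id INTEGER);
CREATE TABLE payroll_payrollitem (
    id INTEGER PRIMARY KEY,
    payroll_id INTEGER,
    record_id INTEGER UNIQUE REFERENCES payroll_payrecord(id)
);
-- Three pay records; only record 2 has been put on a payroll.
INSERT INTO payroll_payrecord VALUES (1, 10.0, 1), (2, 12.5, 1), (3, 9.0, 2);
INSERT INTO payroll_payrollitem VALUES (1, 1, 2);
""")

# Records with no matching payroll item get NULLs for the joined columns,
# so filtering on "payrollitem.id IS NULL" keeps only those records.
unpaid_ids = [row[0] for row in conn.execute("""
    SELECT payroll_payrecord.id
    FROM payroll_payrecord
    LEFT OUTER JOIN payroll_payrollitem
        ON payroll_payrecord.id = payroll_payrollitem.record_id
    WHERE payroll_payrollitem.id IS NULL
    ORDER BY payroll_payrecord.id
""")]

print(unpaid_ids)
```

The filtering happens entirely in the database, so no Python-side list of 100k ids has to be built first.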

Django Query to get count of all distinct values for column of ArrayField

What is the most efficient way to count all distinct values for a column of type ArrayField?
Let's suppose I have a model with the name MyModel and cities field which is postgres.ArrayField.
#models.py
class MyModel(models.Model):
....
cities = ArrayField(models.TextField(blank=True),blank=True,null=True,default=list) ### ['mumbai','london']
and let's suppose our MyModel has the following 3 objects with cities field value as follow.
1. ['london','newyork']
2. ['mumbai']
3. ['london','chennai','mumbai']
Counting distinct values of the cities field operates on the entire list instead of on each element.
## Query
MyModel.objects.values('cities').annotate(Count('id')).order_by().filter(id__count__gt=0)
Here I would like to count distinct values per element of the cities lists, which should give the following final output.
[{'london':2},{'newyork':1},{'chennai':1},{'mumbai':2}]
Perform the GROUP BY operation at the database level itself.
from django.db import connection
cursor = connection.cursor()
raw_query = """
select unnest(subquery_alias.cities) as distinct_cities, count(*) as cities_group_by_count
from (select cities from sample_mymodel) as subquery_alias group by distinct_cities;
"""
cursor.execute(raw_query)
result = [{"city": row[0], "count": row[1]} for row in cursor]
print(result)
References
unnest(): PostgreSQL array function
Django: Executing custom SQL directly
An inefficient way to do it in Python, outside the database:
import itertools
from collections import Counter
# data is assumed to be a MyModel queryset, e.g. MyModel.objects.all()
unique_cities = list(data.values_list('cities', flat=True))
unique_cities_compiled = list(itertools.chain.from_iterable(unique_cities))
unique_cities_final = dict(Counter(unique_cities_compiled))
print(unique_cities_final)
{'london': 2, 'newyork': 1, 'mumbai': 2, 'chennai': 1}
If anyone knows a more efficient way, please post an answer with an improved version of the solution.
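For reference, the Python-side counting can be tried on the three sample rows from the question without a database. This is a sketch only: the rows list stands in for the values of the cities column, and the guard for None is an assumption based on the field being declared with null=True.

```python
from collections import Counter
from itertools import chain

# Stand-in for the cities values of the three sample objects.
rows = [
    ["london", "newyork"],
    ["mumbai"],
    ["london", "chennai", "mumbai"],
]

# Flatten the lists (treating a NULL column value as an empty list)
# and count each city once per occurrence.
counts = Counter(chain.from_iterable(row or [] for row in rows))

# Reshape into the list-of-dicts form asked for in the question.
result = [{city: n} for city, n in counts.items()]
print(result)
```

This loads every row into Python, so for large tables the unnest() query above remains the better option.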

Django query aggregate upvotes in backward relation

I have two models:
Base_Activity:
some fields
User_Activity:
user = models.ForeignKey(settings.AUTH_USER_MODEL)
activity = models.ForeignKey(Base_Activity)
rating = models.IntegerField(default=0) #Will be -1, 0, or 1
Now I want to query Base_Activity, and sort the items that have the most corresponding user activities with rating=1 on top. I want to do something like the query below, but the =1 part is obviously not working.
activities = Base_Activity.objects.all().annotate(
up_votes = Count('user_activity__rating'=1),
).order_by(
'up_votes'
)
How can I solve this?
You cannot use Count like that, as the error message says:
SyntaxError: keyword can't be an expression
The argument of Count must be a simple string, like user_activity__rating.
I think a good alternative can be to use Avg and Count together:
activities = Base_Activity.objects.all().annotate(
a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).order_by(
'-a', '-c'
)
The items with the most rating=1 activities should have the highest average, and among the items with the same average the ones with the most activities will be listed higher.
If you want to exclude items that have downvotes, make sure to add the appropriate filter or exclude operations after annotate, for example:
activities = Base_Activity.objects.all().annotate(
a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).filter(user_activity__rating__gt=0).order_by(
'-a', '-c'
)
UPDATE
To get all the items, ordered by their upvotes, disregarding downvotes, one option is to use a raw query, like this:
from django.db import connection
sql = '''
SELECT o.id, SUM(v.rating > 0) s
FROM base_activity o
JOIN user_activity v ON o.id = v.activity_id
GROUP BY o.id ORDER BY s DESC
'''
cursor = connection.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
Note: instead of hard-coding the table names of your models, get the table names from the models, for example if your model is called Rating, then you can get its table name with Rating._meta.db_table.
I tested this query on an sqlite3 database; I'm not sure the SUM expression there works in every DBMS. Btw I had a perfect Django site to test, where I also use upvotes and downvotes. I use a very similar model for counting upvotes and downvotes, but I order them by the sum value, stackoverflow style. The site is open-source, if you're interested.
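A more portable form of the same idea is a conditional aggregate written with CASE WHEN, combined with a LEFT JOIN so that items with no user activities still appear with zero upvotes. The sketch below uses Python's built-in sqlite3 module; the table names base_activity and user_activity and the sample rows are assumptions for illustration, not the real table names of the models. On Django 2.0 and later the same count can be expressed in the ORM with the filter argument of aggregates, for example Count('user_activity', filter=Q(user_activity__rating=1)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_activity (id INTEGER PRIMARY KEY);
CREATE TABLE user_activity (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    activity_id INTEGER REFERENCES base_activity(id),
    rating INTEGER DEFAULT 0
);
INSERT INTO base_activity VALUES (1), (2), (3);
INSERT INTO user_activity (user_id, activity_id, rating) VALUES
    (1, 1, 1), (2, 1, -1),
    (1, 2, 1), (2, 2, 1), (3, 2, -1),
    (1, 3, 0);
""")

# Count only the rows with rating = 1; downvotes and neutral rows add 0.
ranking = conn.execute("""
    SELECT a.id,
           SUM(CASE WHEN v.rating = 1 THEN 1 ELSE 0 END) AS up_votes
    FROM base_activity a
    LEFT JOIN user_activity v ON v.activity_id = a.id
    GROUP BY a.id
    ORDER BY up_votes DESC, a.id
""").fetchall()

print(ranking)
```

Each row is an (activity id, upvote count) pair, most upvoted first.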

django annotate question

I have the following model:
class Pick(models.Model):
league = models.ForeignKey(League)
user = models.ForeignKey(User)
team = models.ForeignKey(Team)
week = models.IntegerField()
result = models.IntegerField(default=3, help_text='loss=0, win=1, tie=2, not started=3, in progress=4')
I'm trying to generate a standings table based on the results, but I'm unsure how to get it done in a single query. I'm interested in getting, for each user in a particular league, a count of the results that equal 1 (a win), 0 (a loss) and 2 (a tie). The only thing I can think of is to do 3 separate queries where I filter the results and then annotate like so:
Pick.objects.filter(league=2, result=1).annotate(wins=Count('result'))
Pick.objects.filter(league=2, result=0).annotate(losses=Count('result'))
Pick.objects.filter(league=2, result=2).annotate(ties=Count('result'))
Is there a more efficient way to achieve this?
Thanks!
The trick is to call values() before annotate(), so that the count is grouped by the fields you select (aggregate() would instead collapse everything into a single total):
Pick.objects.filter(league=2).values('user', 'result').annotate(count=Count('result'))
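To get wins, losses and ties as separate columns of one row per user, the underlying SQL needs one conditional sum per result code. Here is a minimal sketch with Python's built-in sqlite3 module; the table name pick and the sample rows are made up for illustration. On Django 2.0 and later, the ORM counterpart would be along the lines of values('user').annotate(wins=Count('pk', filter=Q(result=1)), ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pick (
    id INTEGER PRIMARY KEY,
    league_id INTEGER, user_id INTEGER, team_id INTEGER,
    week INTEGER, result INTEGER DEFAULT 3
);
-- result codes: loss=0, win=1, tie=2, not started=3, in progress=4
INSERT INTO pick (league_id, user_id, team_id, week, result) VALUES
    (2, 1, 10, 1, 1), (2, 1, 11, 2, 0), (2, 1, 12, 3, 1),
    (2, 2, 10, 1, 2), (2, 2, 13, 2, 1), (2, 2, 14, 3, 3),
    (5, 1, 10, 1, 1);
""")

# One pass over the league's picks, one conditional sum per outcome.
standings = conn.execute("""
    SELECT user_id,
           SUM(CASE WHEN result = 1 THEN 1 ELSE 0 END) AS wins,
           SUM(CASE WHEN result = 0 THEN 1 ELSE 0 END) AS losses,
           SUM(CASE WHEN result = 2 THEN 1 ELSE 0 END) AS ties
    FROM pick
    WHERE league_id = ?
    GROUP BY user_id
    ORDER BY user_id
""", (2,)).fetchall()

print(standings)
```

Each tuple is (user id, wins, losses, ties); picks that have not started or are in progress count toward none of the three.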

How do I get the values of the latest entries grouped by an attribute using Django ORM?

I have a report model looking a bit like this:
class Report(models.Model):
date = models.DateField()
quantity = models.IntegerField()
product_name = models.TextField()
I know I can get the last entry for the last year for one product this way:
Report.objects.filter(date__year=2009, product_name="corn").order_by("-date")[0]
I know I can group entries by name this way:
Report.objects.values("product_name")
But how can I get the quantity for the last entry for each product ? I feel like I would do it this way in SQL (not sure, my SQL is rusty):
SELECT product_name, quantity FROM report WHERE YEAR(date) == 2009 GROUP_BY product_name HAVING date == Max(date)
My guess is to use the Max() object with annotate, but I have no idea how to.
For now, I do it manually: I list each product_name with a distinct, run a query per product, and take the last item of each.
Not exactly a trivial query using either the Django ORM or SQL. My first take on it would be to do pretty much what you are probably already doing: get the distinct product and date pairs and then perform individual queries for each of those.
from django.db.models import Max
year_reports = Report.objects.filter(date__year=2009)
product_date_pairs = year_reports.values('product_name').annotate(Max('date'))
[Report.objects.get(product_name=p['product_name'], date=p['date__max'])
for p in product_date_pairs]
But you can take it a step further with Q objects and some fancy OR'ing to trim your query count down to 2 instead of N + 1.
import operator
from functools import reduce  # reduce is not a builtin in Python 3
from django.db.models import Q
qs = [Q(product_name=p['product_name'], date=p['date__max']) for p in product_date_pairs]
ored_qs = reduce(operator.or_, qs)
Report.objects.filter(ored_qs)
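In plain SQL the two steps can be collapsed into one statement by joining the table to a subquery that holds the latest date per product. The following is only a sketch using Python's built-in sqlite3 module, not an ORM solution; the table name report and the sample rows are assumptions for illustration, and dates are stored as ISO strings so that MAX() and the range comparison work lexicographically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE report (
    id INTEGER PRIMARY KEY, date TEXT, quantity INTEGER, product_name TEXT
);
INSERT INTO report (date, quantity, product_name) VALUES
    ('2009-03-01', 10, 'corn'), ('2009-11-15', 25, 'corn'),
    ('2009-06-30', 7, 'wheat'), ('2010-01-05', 99, 'wheat');
""")

# The subquery finds the latest 2009 date per product; the join then
# picks the full report row that carries that date.
latest = conn.execute("""
    SELECT r.product_name, r.quantity
    FROM report r
    JOIN (
        SELECT product_name, MAX(date) AS max_date
        FROM report
        WHERE date >= '2009-01-01' AND date < '2010-01-01'
        GROUP BY product_name
    ) m ON m.product_name = r.product_name AND m.max_date = r.date
    ORDER BY r.product_name
""").fetchall()

print(latest)
```

If two reports for the same product share the latest date, both rows are returned, which is also a case where the Report.objects.get call above would raise MultipleObjectsReturned.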