COUNT(id) instead of COUNT(*) in Django

COUNT(id) instead of COUNT(*) in Django - django

I'm paginating a list view for a model with many fields, so it takes a lot of time to execute MyModel.objects.filter('some filter').count(), because on SQL level it runs
SELECT COUNT(*) FROM mytable
instead of:
SELECT COUNT(id) FROM mytable
even if I write explicitly
MyModel.objects.only('id').count()
How can I make Django run COUNT(id) on .count()?
Update:
I'm using PostgreSQL.

Try using:
MyModel.objects.filter('some filter').values_list("id).count()
This will do this query:
select count(id) from MyModel

Try using an aggregate query:
from django.db import models
MyObject.objects.all().aggregate(models.Count('id'))['id__count']

COUNT(id) is roughly the equivalent of:
MyModel.objects.exclude(id=None).count()
this will add extra step to count the table fields, which is not the case for COUNT(*)

Related

Creating a Django object with field values from a SELECT equivalent

I am trying to figure out how to reproduce the following using Django - anyone help?
INSERT INTO table1 (table2_id, a_field)
SELECT table2.id as table2_id, table3.a_field
FROM table2
INNER JOIN table3 ON
table3.table2_id == table2.id
WHERE table2.id = 123
If I've got this correct (not my original query ;-) ), this is doing the following:
Creating an entry in table1 where...
a field named table2_id will match the id of a row in table2 and
a field named a_field will match the same named field in a_field in a row of table3 and
the table2/table3 objects from which these values are read are identified by a shared table2.id/table3.table_id2 relationship and also the table2 id being 123.
I don't see how such "calculated" field values can be passed to a create() or get_or_create() style command. It this perhaps possible using Q() objects?

Django model is an ORM framework.
In ORM way, you need
get table2Entity
construct a new table1Entity with table2Entity and related table3Entity values
save the table1Entity
def batch_save_entity2():
entity2 = Table2Entity.objects.get('123')
entity1 = Table1Entity()
entity1.table2_id = entity2.id
entity1.a_field = entity2.entity3.a_field
entity1.save()
or just execute sql directly without ORM
from django.db import connection
def my_custom_sql(self):
with connection.cursor() as cursor:
cursor.execute('''
INSERT INTO table1 (table2_id, a_field)
SELECT table2.id as table2_id, table3.a_field
FROM table2
INNER JOIN table3 ON
table3.table2_id == table2.id
WHERE table2.id = 123''')

Django ORM and GROUP BY

Newcommer to Django here.
I'm currently trying to fetch some data from my model with a query that need would need a GROUP BY in SQL.
Here is my simplified model:
class Message(models.Model):
mmsi = models.CharField(max_length=16)
time = models.DateTimeField()
point = models.PointField(geography=True)
I'm basically trying to get the last Message from every distinct mmsi number.
In SQL that would translates like this for example:
select a.* from core_message a
inner join
(select mmsi, max(time) as time from core_message group by mmsi) b
on a.mmsi=b.mmsi and a.time=b.time;
After some tries, I managed to have something working similarly with Django ORM:
>>> mf=Message.objects.values('mmsi').annotate(Max('time'))
>>> Message.objects.filter(mmsi__in=mf.values('mmsi'),time__in=mf.values('time__max'))
That works, but I find my Django solution quite clumsy. Not sure it's the proper way to do it.
Looking at the underlying query this looks like this :
>>> print(Message.objects.filter(mmsi__in=mf.values('mmsi'),time__in=mf.values('time__max')).query)
SELECT "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea FROM "core_message" WHERE ("core_message"."mmsi" IN (SELECT U0."mmsi" FROM "core_message" U0 GROUP BY U0."mmsi") AND "core_message"."time" IN (SELECT MAX(U0."time") AS "time__max" FROM "core_message" U0 GROUP BY U0."mmsi"))
I'd appreciate if you could propose a better solution for this problem.
Thanks !

You only need something like this:
Message.objects.all().distinct('mmsi').values('mmsi', 'time').order_by('mmsi','-id')
or like this:
Message.objects.all().values('mmsi').annotate(date_last=Max('time'))
Note: the last is translate by Django in this sql query:
SELECT "message"."mmsi", MAX("message"."time") AS "date_last" FROM "message" GROUP BY "message"."mmsi", "message"."time" ORDER BY "message"."time" DESC

Using the answers and comments, I managed to solve this using a subquery or a simple distinct order by.
Simple distinct order by solution inspired by #Oriphiel answer:
Message.objects.distinct('mmsi').order_by('mmsi','-time')
The underlying SQL query looks like this :
SELECT DISTINCT ON ("core_message"."mmsi") "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea
FROM "core_message"
ORDER BY "core_message"."mmsi" ASC, "core_message"."time" DESC
Simple and straightforward.
Subquery solution inspired by #DanielRoseman comment:
time_order=Message.objects.filter(mmsi=OuterRef('mmsi')).order_by('-time')
Message.objects.filter(id__in=Subquery(time_order.values('id')[:1]))
The underlying SQL query looks like this :
SELECT "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea
FROM "core_message"
WHERE "core_message"."id" IN
(SELECT U0."id" FROM "core_message" U0 WHERE U0."mmsi" = ("core_message"."mmsi") ORDER BY U0."time" DESC LIMIT 1)
A tad more complex but it gives more flexibility. If I wanted to get first five messages for every MMSI, I'd just need to change the LIMIT value. In Django, it would look like this :
Message.objects.filter(id__in=Subquery(time_order.values('id')[:5]))

Is there any way to convert this into django orm?

I needed to get all the rows in the table1 even if it is not existing in table2 and display it as zero. I got it using raw sql query but in django ORM i am getting the values existing only in table2. The only difference on my django orm is that iI am using inner join while in the raw sql query I am using left join. Is there any way to achieve this or should I use raw sql query? Thanks.
Django ORM:
total=ApplicantInfo.objects.select_related('source_type').values('source_type__source_type').annotate(total_count=Count('source_type'))
OUTPUT OF DJANGO ORM IN RAW SQL:
SELECT "applicant_sourcetype"."source_type", COUNT("applicant_applicantinfo"."source_type_id") AS "total_count" FROM "applicant_applicantinfo" INNER JOIN "applicant_sourcetype" ON ("applicant_applicantinfo"."source_type_id" = "applicant_sourcetype"."id") GROUP BY "applicant_sourcetype"."source_type"
RAW SQL:
SELECT source.source_type, count(info.source_type_id) as total_counts from applicant_sourcetype as source LEFT JOIN applicant_applicantinfo as info ON source.id = info.source_type_id GROUP BY source.id

You cannot query on ApplicantInfo if you want a left join with it. Query on SourceType instead:
qs = (SourceType.objects
.values('source_type')
.annotate(cnt=Count('applicantinfo'))
.values('source_type', 'cnt')
)
Since you haven't posted the models, you may need to adapt the field names to make it work.

How to enforce Django to use "JOIN VALUES"

I'm having a performance problem where I need to replace section of my query statement. Right now I have a the following:
select count(*) FROM "mytable" WHERE "field" IN ('v1', 'v2', ..., 'vN');
this can be translated to Django ORM:
Mytable.objects.all().filter(field__in=[myvalues]).count()
I need to do the following though:
select count(*) FROM "mytable" JOIN (values ('v1', 'v2', ..., 'vN')) as lookup(value) on lookup.value = "mytable".field;
Is there a way to add this to the ORM? I need to do with ORM because I already have other filters. Worst case scenario I thought of getting the query string and adding there manually...
I'm using Postgresql 9.6

I found a way after reading over and over the documentation. I even found a patch that was not merged a while ago.
It doesn't really do the join, but it works much faster than using __in straightforward.
What I'm doing is executing a RawSQL() that was introduced in Django 2.0 and with that result I do the __in again.
So here is a code example:
query = """select myfield from mytable join (values
('v1'), ('v2'), ..., ('vN')
) as lookup(value) on lookup.value = mytable.myfield"""
r = RawSQL(query, [])
mymodel.filter(myfield__in=r)
Now it takes miliseconds instead of minutes!

How to use subquery in django?

I want to get a list of the latest purchase of each customer, sorted by the date.
The following query does what I want except for the date:
(Purchase.objects
.all()
.distinct('customer')
.order_by('customer', '-date'))
It produces a query like:
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
I am forced to use customer_id as the first ORDER BY expression because of DISTINCT ON.
I want to sort by the date, so what the query I really need should look like this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I don't want to sort using python because I still got to page limit the query. There can be tens of thousands of rows in the database.
In fact it is currently sorted by in python now and is causing very long page load times, so that's why I'm trying to fix this.
Basically I want something like this https://stackoverflow.com/a/9796104/242969. Is it possible to express it with django querysets instead of writing raw SQL?
The actual models and methods are several pages long, but here is the set of models required for the queryset above.
class Customer(models.Model):
user = models.OneToOneField(User)
class Purchase(models.Model):
customer = models.ForeignKey(Customer)
date = models.DateField(auto_now_add=True)
item = models.CharField(max_length=255)
If I have data like:
Customer A -
Purchase(item=Chair, date=January),
Purchase(item=Table, date=February)
Customer B -
Purchase(item=Speakers, date=January),
Purchase(item=Monitor, date=May)
Customer C -
Purchase(item=Laptop, date=March),
Purchase(item=Printer, date=April)
I want to be able to extract the following:
Purchase(item=Monitor, date=May)
Purchase(item=Printer, date=April)
Purchase(item=Table, date=February)
There is at most one purchase in the list per customer. The purchase is each customer's latest. It is sorted by latest date.
This query will be able to extract that:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I'm trying to find a way not to have to use raw SQL to achieve this result.

This may not be exactly what you're looking for, but it might get you closer. Take a look at Django's annotate.
Here is an example of something that may help:
from django.db.models import Max
Customer.objects.all().annotate(most_recent_purchase=Max('purchase__date'))
This will give you a list of your customer models each one of which will have a new attribute called "most_recent_purchase" and will contain the date on which they made their last purchase. The sql produced looks like this:
SELECT "demo_customer"."id",
"demo_customer"."user_id",
MAX("demo_purchase"."date") AS "most_recent_purchase"
FROM "demo_customer"
LEFT OUTER JOIN "demo_purchase" ON ("demo_customer"."id" = "demo_purchase"."customer_id")
GROUP BY "demo_customer"."id",
"demo_customer"."user_id"
Another option, would be adding a property to your customer model that would look something like this:
#property
def latest_purchase(self):
return self.purchase_set.order_by('-date')[0]
You would obviously need to handle the case where there aren't any purchases in this property, and this would potentially not perform very well (since you would be running one query for each customer to get their latest purchase).
I've used both of these techniques in the past and they've both worked fine in different situations. I hope this helps. Best of luck!

Whenever there is a difficult query to write using Django ORM, I first try the query in psql(or whatever client you use). The SQL that you want is not this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id" "shop_purchase.id" "shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC, "shop_purchase.date" DESC;
) AS result
ORDER BY date DESC;
In the above SQL, the inner SQL is looking for distinct on a combination of (customer_id, id, and date) and since id will be unique for all, you will get all records from the table. I am assuming id is the primary key as per convention.
If you need to find the last purchase of every customer, you need to do something like:
SELECT "shop_purchase.customer_id", max("shop_purchase.date")
FROM shop_purchase
GROUP BY 1
But the problem with the above query is that it will give you only the customer name and date. Using that will not help you in finding the records when you use these results in a subquery.
To use IN you need a list of unique parameters to identify a record, e.g., id
If in your records id is a serial key, then you can leverage the fact that the latest date will be the maximum id as well. So your SQL becomes:
SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id";
Note that I kept only one field (id) in the selected clause to use it in a subquery using IN.
The complete SQL will now be:
SELECT *
FROM shop_customer
WHERE "shop_customer.id" IN
(SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id");
and using the Django ORM it looks like:
(Purchase.objects.filter(
id__in=Purchase.objects
.values('customer_id')
.annotate(latest=Max('id'))
.values_list('latest', flat=True)))
Hope it helps!

I have a similar situation and this is how I'm planning to go about it:
query = Purchase.objects.distinct('customer').order_by('customer').query
query = 'SELECT * FROM ({}) AS result ORDER BY sent DESC'.format(query)
return Purchase.objects.raw(query)
Upside it gives me the query I want. Downside is that it is raw query and I can't append any other queryset filters.

This is my approach if I need some subset of data (N items) along with the Django query. This is example using PostgreSQL and handy json_build_object() function (Postgres 9.4+), but same way you can use other aggregate function in other database system. For older PostgreSQL versions you can use combination of array_agg() and array_to_string() functions.
Imagine you have Article and Comment models and along with every article in the list you want to select 3 recent comments (change LIMIT 3 to adjust size of subset or ORDER BY c.id DESC to change sorting of subset).
qs = Article.objects.all()
qs = qs.extra(select = {
'recent_comments': """
SELECT
json_build_object('comments',
array_agg(
json_build_object('id', id, 'user_id', user_id, 'body', body)
)
)
FROM (
SELECT
c.id,
c.user_id,
c.body
FROM app_comment c
WHERE c.article_id = app_article.id
ORDER BY c.id DESC
LIMIT 3
) sub
"""
})
for article in qs:
print(article.recent_comments)
# Output:
# {u'comments': [{u'user_id': 1, u'id': 3, u'body': u'foo'}, {u'user_id': 1, u'id': 2, u'body': u'bar'}, {u'user_id': 1, u'id': 1, u'body': u'joe'}]}
# ....

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

COUNT(id) instead of COUNT(*) in Django - django

Try using: MyModel.objects.filter('some filter').values_list("id).count() This will do this query: select count(id) from MyModel

Try using an aggregate query: from django.db import models MyObject.objects.all().aggregate(models.Count('id'))['id__count']

COUNT(id) is roughly the equivalent of: MyModel.objects.exclude(id=None).count() this will add extra step to count the table fields, which is not the case for COUNT(*)

Related

Creating a Django object with field values from a SELECT equivalent

Django ORM and GROUP BY

Is there any way to convert this into django orm?

How to enforce Django to use "JOIN VALUES"

How to use subquery in django?

Categories

Resources