Why is my annotation's GROUP BY not working as expected? - Django

I want to run a query like this raw SQL: select count(b.status) from a left outer join b on a.id = b.user_id group by b.status. I know how to get what I want with raw SQL, but when I try to implement it with the ORM, it fails. Below is the code:
from django.db.models import Count
query_kwargs_dict = {'id__in': [10000, 10001]}
statistics = UserAuthModel.objects.filter(**query_kwargs_dict).select_related(
    'obbyuserinfomodel'
).values('obbyuserinfomodel__status').annotate(count=Count('obbyuserinfomodel__status')).query
print('statistics', statistics)
And the print statement output is:
statistics SELECT `users_obby_info`.`status`, COUNT(`users_obby_info`.`status`) AS `count` FROM `users_auth` LEFT OUTER JOIN `users_obby_info` ON (`users_auth`.`id` = `users_obby_info`.`user_id`) WHERE `users_auth`.`id` IN (10000, 10001) GROUP BY `users_auth`.`id`, `users_obby_info`.`status` ORDER BY `users_auth`.`id` DESC
What confuses me is the GROUP BY. It should group by the field specified in values(), which in my case is obbyuserinfomodel__status, but the generated SQL groups by users_auth.id and users_obby_info.status together. What I want is to group by users_obby_info.status only.
Below are the two tables (I have omitted some fields for simplicity):
class UserAuthModel(AbstractBaseUser):
    ...

class ObbyUserInfoModel(models.Model):
    user = models.OneToOneField(UserAuthModel, on_delete=models.CASCADE)
    ...
    status = models.CharField(max_length=15, default=ACCOUNTSTATUS.UNACTIVATED.value, null=True)
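To see why the extra users_auth.id column matters, here is a minimal sketch using Python's built-in sqlite3 module (plain SQL, not Django) with hypothetical rows: two users who share the same status. Grouping by (id, status) produces one row per user, while grouping by status alone produces the single aggregated row the question expects. A likely source of the extra column, assuming UserAuthModel declares a default ordering such as ordering = ['-id'] in its Meta (the ORDER BY users_auth.id DESC in the printed SQL points that way), is that Django before 3.1 added Meta.ordering fields to GROUP BY; calling .order_by() with no arguments clears that ordering.

```python
import sqlite3

# Hypothetical sample data mirroring the question's two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_auth (id INTEGER PRIMARY KEY);
    CREATE TABLE users_obby_info (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users_auth(id),
        status TEXT
    );
    INSERT INTO users_auth (id) VALUES (10000), (10001);
    INSERT INTO users_obby_info (user_id, status)
        VALUES (10000, 'active'), (10001, 'active');
""")

base = """
    SELECT users_obby_info.status, COUNT(users_obby_info.status) AS count
    FROM users_auth
    LEFT OUTER JOIN users_obby_info ON users_auth.id = users_obby_info.user_id
    WHERE users_auth.id IN (10000, 10001)
"""

# Shape of the SQL Django generated: the ordering column ends up in GROUP BY,
# so each user forms its own group.
per_user = conn.execute(
    base + " GROUP BY users_auth.id, users_obby_info.status ORDER BY users_auth.id DESC"
).fetchall()

# Shape of the SQL the question wants: one group per status.
per_status = conn.execute(base + " GROUP BY users_obby_info.status").fetchall()

print(per_user)    # [('active', 1), ('active', 1)]
print(per_status)  # [('active', 2)]
```

In the ORM, the corresponding change would be something like UserAuthModel.objects.filter(**query_kwargs_dict).order_by().values('obbyuserinfomodel__status').annotate(count=Count('obbyuserinfomodel__status')), again assuming the default ordering is what adds users_auth.id.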

Related

How do I update an SQL value based on a SUM comparison?

I am trying to update a value to show the picked status of an order, based on the picked quantity compared with the order quantity. The data is in the same table, but I cannot figure out the correct syntax. I tried:
Update Orders set Status = 'FULL' where Sum(Qty_Order) = sum(Qty_Picked)
How can i apply this logic using an aggregate query?
Thanks in advance for any help.
One approach uses an update join:
UPDATE Orders o1
INNER JOIN
(
SELECT id
FROM Orders
GROUP BY id
HAVING SUM(Qty_Order) = SUM(Qty_Picked)
) o2
ON o2.id = o1.id
SET
Status = 'FULL';
This assumes that your Orders table has a column id which uniquely identifies each order.
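The same idea can be tried in SQLite through Python's sqlite3 module. SQLite has no MySQL-style UPDATE ... JOIN, so this sketch swaps the derived-table join for an equivalent IN subquery with the same GROUP BY/HAVING filter. The order lines are hypothetical: order 1 is fully picked, order 2 is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order lines; several rows can share the same order id.
conn.executescript("""
    CREATE TABLE Orders (id INTEGER, Qty_Order INTEGER, Qty_Picked INTEGER, Status TEXT);
    INSERT INTO Orders VALUES (1, 5, 5, NULL), (1, 3, 3, NULL), (2, 5, 2, NULL);
""")

# The aggregate comparison lives in HAVING, never in the UPDATE's WHERE directly.
conn.execute("""
    UPDATE Orders
    SET Status = 'FULL'
    WHERE id IN (
        SELECT id
        FROM Orders
        GROUP BY id
        HAVING SUM(Qty_Order) = SUM(Qty_Picked)
    )
""")

statuses = conn.execute(
    "SELECT DISTINCT id, Status FROM Orders ORDER BY id"
).fetchall()
print(statuses)  # [(1, 'FULL'), (2, None)]
```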

Django many-to-one related query with "and" condition

I have the following model structure:
class Drive(models.Model):
    car_name = models.CharField(max_length=3, blank=True, null=True, choices=sp.CAR_NAMES, help_text="The name of the car")

class DataEntity(models.Model):
    parent_drive = models.ForeignKey(Drive, models.CASCADE)
    type = models.IntegerField(blank=True, null=True, choices=sp.DATA_ENTITY_TYPES, help_text="The Type of the data")
I'm trying to get all of the Drives that have a DataEntity with type = 3 and also a DataEntity with type = 4.
I tried to use the following:
from django.db.models import Q
query_set = Q(dataentity__type=3) & Q(dataentity__type=4)  # repr: <Q: (AND: ('dataentity__type', 3), ('dataentity__type', 4))>
Drive.objects.filter(query_set).distinct()
but I got an empty result.
I looked at the SQL statement:
SELECT ••• FROM `drive` INNER JOIN `data_entity` ON (`drive`.`id` = `data_entity`.`parent_drive_id`) WHERE (`data_entity`.`type` = 3 AND `data_entity`.`type` = 4)) subquery
Django put both conditions in the WHERE clause against the same joined row, and that causes the problem (there is no single DataEntity that has both types).
How can I build a queryset that returns the Drives that have a DataEntity with type = 3 and another DataEntity with type = 4?
Thanks
You can try to do this:
Drive.objects.filter(dataentity__type__in=[3, 4]).distinct()
I found the solution.
When you use Q(dataentity__type=3) & Q(dataentity__type=4), the ORM puts the AND expression in the WHERE clause of a single join:
SELECT ••• FROM `drive` INNER JOIN `data_entity` ON (`drive`.`id` = `data_entity`.`parent_drive_id`) WHERE (`data_entity`.`type` = 3 AND `data_entity`.`type` = 4)
and I got 0 results, since no single DataEntity has two types.
But when I used Drive.objects.filter(Q(dataentity__type=3)).filter(Q(dataentity__type=4)).distinct()
I got the Drives that have a DataEntity of type 3 and also a DataEntity of type 4.
The SQL Query:
SELECT ••• FROM `drive` INNER JOIN `data_entity` ON (`drive`.`id` = `data_entity`.`parent_drive_id`) LEFT OUTER JOIN `data_entity` T3 ON (`drive`.`id` = T3.`parent_drive_id`) WHERE (`data_entity`.`type` = 3 AND T3.`type` = 4)
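The difference between one join and two joins can be reproduced with Python's sqlite3 module. This sketch uses hypothetical data (drive 1 has both types, drive 2 only type 3, drive 3 only type 4) and plain SQL shaped like the queries above. It also shows that type IN (3, 4) matches drives having either type, not necessarily both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drive (id INTEGER PRIMARY KEY, car_name TEXT);
    CREATE TABLE data_entity (
        id INTEGER PRIMARY KEY,
        parent_drive_id INTEGER REFERENCES drive(id),
        type INTEGER
    );
    -- Drive 1 has types 3 and 4, drive 2 only type 3, drive 3 only type 4.
    INSERT INTO drive (id, car_name) VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
    INSERT INTO data_entity (parent_drive_id, type) VALUES (1, 3), (1, 4), (2, 3), (3, 4);
""")

# One join: both conditions test the same data_entity row, so nothing matches.
single_join = conn.execute("""
    SELECT DISTINCT drive.id FROM drive
    JOIN data_entity ON drive.id = data_entity.parent_drive_id
    WHERE data_entity.type = 3 AND data_entity.type = 4
""").fetchall()

# Two joins (the shape chained .filter() calls produce): each condition
# is checked against its own data_entity row.
two_joins = conn.execute("""
    SELECT DISTINCT drive.id FROM drive
    JOIN data_entity ON drive.id = data_entity.parent_drive_id
    JOIN data_entity T3 ON drive.id = T3.parent_drive_id
    WHERE data_entity.type = 3 AND T3.type = 4
    ORDER BY drive.id
""").fetchall()

# IN matches drives that have either type.
either = conn.execute("""
    SELECT DISTINCT drive.id FROM drive
    JOIN data_entity ON drive.id = data_entity.parent_drive_id
    WHERE data_entity.type IN (3, 4)
    ORDER BY drive.id
""").fetchall()

print(single_join)  # []
print(two_joins)    # [(1,)]
print(either)       # [(1,), (2,), (3,)]
```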

SQLAlchemy Core union_all not adding parentheses

I have the following sample code:
queries = []
q1 = select([columns]).where(table.c.id == #).limit(#)
queries.append(q1)
q2 = select([columns]).where(table.c.id == #).limit(#)
queries.append(q2)
final_query = union_all(*queries)
The generated SQL should be this:
(select columns from table where id = # limit #)
UNION ALL
(select columns from table where id = # limit #)
But, I'm getting
select columns from table where id = # limit #
UNION ALL
select columns from table where id = # limit #
I tried using subquery, as follows for my queries:
q1 = subquery(select([columns]).where(table.c.id == #).limit(#))
The generated query then looks like this:
SELECT UNION ALL SELECT UNION ALL
I also tried doing
q1 = (select([columns]).where(table.c.id == #).limit(#)).subquery()
But, I get the error:
'Select' object has no attribute 'subquery'
Any help to get the desired output with my subqueries wrapped in parentheses?
Note: this is not a duplicate of this question, because I'm not using Session.
EDIT
Okay, this works, although I don't believe it is very efficient, since it adds an extra select * from (my subquery):
q1 = select('*').select_from((select(columns).where(table.c.id == #).limit(#)).alias('q1'))
If anyone has ideas for optimizing this, or can tell me whether this is as good as it gets, I would appreciate it.
The author of SQLAlchemy seems to be aware of this and mentions a workaround for it on the SQLAlchemy 1.1 changelog page. The general idea is to do .alias().select() on each select.
stmt1 = select([table1.c.x]).order_by(table1.c.y).limit(1).alias().select()
stmt2 = select([table2.c.x]).order_by(table2.c.y).limit(2).alias().select()
stmt = union(stmt1, stmt2)
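To see what the .alias().select() shape buys you, this sketch runs raw SQL through Python's sqlite3 module rather than SQLAlchemy; the anon_1/anon_2 alias names only imitate SQLAlchemy's anonymous aliases and are an assumption. SQLite, for instance, rejects a bare SELECT carrying its own ORDER BY/LIMIT inside a compound query, while wrapping each SELECT in a derived table keeps every ORDER BY/LIMIT scoped to its own branch. Table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, y INTEGER);
    CREATE TABLE table2 (x INTEGER, y INTEGER);
    INSERT INTO table1 VALUES (10, 2), (11, 1);
    INSERT INTO table2 VALUES (20, 3), (21, 1), (22, 2);
""")

# Bare SELECTs with their own ORDER BY/LIMIT inside a UNION: SQLite refuses.
try:
    conn.execute(
        "SELECT x FROM table1 ORDER BY y LIMIT 1 "
        "UNION SELECT x FROM table2 ORDER BY y LIMIT 2"
    )
    bare_ok = True
except sqlite3.OperationalError:
    bare_ok = False

# Each SELECT wrapped as a derived table, then selected from.
wrapped = conn.execute("""
    SELECT anon_1.x FROM (SELECT x FROM table1 ORDER BY y LIMIT 1) AS anon_1
    UNION
    SELECT anon_2.x FROM (SELECT x FROM table2 ORDER BY y LIMIT 2) AS anon_2
    ORDER BY 1
""").fetchall()

print(bare_ok)  # False
print(wrapped)  # [(11,), (21,), (22,)]
```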

Can JPA join to a codes table when part of join clause requires a hard coded value?

I would like the resulting entity to contain all the columns from table1, plus the description from codes1.
If I were to do this in SQL I would write it as follows:
select table1.*, codes1.description
from table1
inner join codes1 on codes1.code = table1.status_code
and codes1.group = 'status'
I have done this with a native query, but would like to do this using straight JPA if possible.
Codes Table:
Group    Code   Description
status   a      status code a
status   b      status code b
other    a      other code a
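The native SQL above can be tried with Python's sqlite3 module. This sketch loads the codes table from the question plus two hypothetical table1 rows, and shows why the hard-coded group condition belongs in the join: without it, code 'a' matches both the 'status' row and the 'other' row. The column name "group" is quoted because GROUP is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codes1 ("group" TEXT, code TEXT, description TEXT);
    INSERT INTO codes1 VALUES
        ('status', 'a', 'status code a'),
        ('status', 'b', 'status code b'),
        ('other',  'a', 'other code a');
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, status_code TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
""")

# Join on the code only: row 1 picks up two descriptions.
without_group = conn.execute("""
    SELECT table1.id, codes1.description FROM table1
    INNER JOIN codes1 ON codes1.code = table1.status_code
    ORDER BY table1.id, codes1.description
""").fetchall()

# Constant in the ON clause: exactly one description per table1 row.
with_group = conn.execute("""
    SELECT table1.id, codes1.description FROM table1
    INNER JOIN codes1 ON codes1.code = table1.status_code
                     AND codes1."group" = 'status'
    ORDER BY table1.id
""").fetchall()

print(without_group)  # [(1, 'other code a'), (1, 'status code a'), (2, 'status code b')]
print(with_group)     # [(1, 'status code a'), (2, 'status code b')]
```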
Imagine two entity classes: Table1 and Code1.
Your Table1 class of course holds a reference to Code1.
In "straight JPA" (JPQL) you select objects, so the query would be:
select t from Table1 t where t.code1.group = 'status'
The join is done automatically by the mapping (@OneToOne, @ManyToOne...).

How to use subquery in django?

I want to get a list of the latest purchase of each customer, sorted by the date.
The following query does what I want, except that the result is not sorted by date:
(Purchase.objects
.all()
.distinct('customer')
.order_by('customer', '-date'))
It produces a query like:
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
I am forced to use customer_id as the first ORDER BY expression because of DISTINCT ON.
I want to sort by the date, so what the query I really need should look like this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC
)
AS result
ORDER BY date DESC;
I don't want to sort in Python because I still have to paginate the query. There can be tens of thousands of rows in the database.
In fact, it is currently sorted in Python, which causes very long page load times; that's why I'm trying to fix this.
Basically I want something like this https://stackoverflow.com/a/9796104/242969. Is it possible to express it with django querysets instead of writing raw SQL?
The actual models and methods are several pages long, but here is the set of models required for the queryset above.
class Customer(models.Model):
    user = models.OneToOneField(User)

class Purchase(models.Model):
    customer = models.ForeignKey(Customer)
    date = models.DateField(auto_now_add=True)
    item = models.CharField(max_length=255)
If I have data like:
Customer A -
Purchase(item=Chair, date=January),
Purchase(item=Table, date=February)
Customer B -
Purchase(item=Speakers, date=January),
Purchase(item=Monitor, date=May)
Customer C -
Purchase(item=Laptop, date=March),
Purchase(item=Printer, date=April)
I want to be able to extract the following:
Purchase(item=Monitor, date=May)
Purchase(item=Printer, date=April)
Purchase(item=Table, date=February)
There is at most one purchase in the list per customer. The purchase is each customer's latest. It is sorted by latest date.
This query will be able to extract that:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC
)
AS result
ORDER BY date DESC;
I'm trying to find a way not to have to use raw SQL to achieve this result.
This may not be exactly what you're looking for, but it might get you closer. Take a look at Django's annotate.
Here is an example of something that may help:
from django.db.models import Max
Customer.objects.all().annotate(most_recent_purchase=Max('purchase__date'))
This gives you a queryset of Customer objects, each with a new attribute called "most_recent_purchase" that contains the date of that customer's last purchase. The SQL produced looks like this:
SELECT "demo_customer"."id",
"demo_customer"."user_id",
MAX("demo_purchase"."date") AS "most_recent_purchase"
FROM "demo_customer"
LEFT OUTER JOIN "demo_purchase" ON ("demo_customer"."id" = "demo_purchase"."customer_id")
GROUP BY "demo_customer"."id",
"demo_customer"."user_id"
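The generated SQL can be run against a toy SQLite database (via Python's sqlite3 module) to see what the annotation returns. In this sketch the data is hypothetical and dates are stored as ISO strings so that MAX compares them correctly. Because of the LEFT OUTER JOIN, a customer without purchases still appears, with NULL (None) as most_recent_purchase; note that the result carries only the date, not the Purchase row itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo_customer (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE demo_purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES demo_customer(id),
        date TEXT,
        item TEXT
    );
    INSERT INTO demo_customer VALUES (1, 101), (2, 102), (3, 103);
    -- Customer 3 has no purchases.
    INSERT INTO demo_purchase (customer_id, date, item) VALUES
        (1, '2020-01-10', 'Chair'),
        (1, '2020-02-10', 'Table'),
        (2, '2020-05-10', 'Monitor');
""")

rows = conn.execute("""
    SELECT demo_customer.id, MAX(demo_purchase.date) AS most_recent_purchase
    FROM demo_customer
    LEFT OUTER JOIN demo_purchase ON demo_customer.id = demo_purchase.customer_id
    GROUP BY demo_customer.id, demo_customer.user_id
    ORDER BY demo_customer.id
""").fetchall()
print(rows)  # [(1, '2020-02-10'), (2, '2020-05-10'), (3, None)]
```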
Another option would be to add a property to your Customer model, something like this:
@property
def latest_purchase(self):
    return self.purchase_set.order_by('-date')[0]
You would obviously need to handle the case where there aren't any purchases in this property, and this would potentially not perform very well (since you would be running one query for each customer to get their latest purchase).
I've used both of these techniques in the past and they've both worked fine in different situations. I hope this helps. Best of luck!
Whenever a query is difficult to write with the Django ORM, I first try it in psql (or whatever client you use). The SQL that you want is not this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id" "shop_purchase.id" "shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC, "shop_purchase.date" DESC
) AS result
ORDER BY date DESC;
In the above SQL, the inner SQL is looking for distinct on a combination of (customer_id, id, and date) and since id will be unique for all, you will get all records from the table. I am assuming id is the primary key as per convention.
If you need to find the last purchase of every customer, you need to do something like:
SELECT "shop_purchase.customer_id", max("shop_purchase.date")
FROM shop_purchase
GROUP BY 1
But the problem with the above query is that it gives you only the customer id and the date, and those results do not identify the purchase records when you use them in a subquery.
To use IN, you need a list of unique values that identify a record, e.g. id.
If in your records id is a serial key, then you can leverage the fact that the latest date will be the maximum id as well. So your SQL becomes:
SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id";
Note that I kept only one field (id) in the selected clause to use it in a subquery using IN.
The complete SQL will now be:
SELECT *
FROM shop_purchase
WHERE "shop_purchase.id" IN
(SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id");
and using the Django ORM it looks like:
from django.db.models import Max
(Purchase.objects.filter(
    id__in=Purchase.objects
    .values('customer_id')
    .annotate(latest=Max('id'))
    .values_list('latest', flat=True)))
Hope it helps!
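The max-id approach can be checked against the question's sample data with Python's sqlite3 module. This sketch assumes, as the answer does, that id is a serial key, so purchases are inserted in chronological order and the largest id per customer is also the latest purchase. Adding ORDER BY date DESC to the outer query gives the sort order the question asks for; months are mapped to hypothetical 2020 dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows inserted in date order, so ids grow with date (serial-key assumption).
conn.executescript("""
    CREATE TABLE shop_purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        date TEXT,
        item TEXT
    );
    INSERT INTO shop_purchase (customer_id, date, item) VALUES
        (1, '2020-01-01', 'Chair'),
        (2, '2020-01-01', 'Speakers'),
        (1, '2020-02-01', 'Table'),
        (3, '2020-03-01', 'Laptop'),
        (3, '2020-04-01', 'Printer'),
        (2, '2020-05-01', 'Monitor');
""")

latest = conn.execute("""
    SELECT item, date
    FROM shop_purchase
    WHERE id IN (
        SELECT MAX(id) FROM shop_purchase GROUP BY customer_id
    )
    ORDER BY date DESC
""").fetchall()
print(latest)  # [('Monitor', '2020-05-01'), ('Printer', '2020-04-01'), ('Table', '2020-02-01')]
```

In the ORM, the analogous step would be appending .order_by('-date') to the outer Purchase.objects.filter(...) queryset, which keeps it a normal queryset that can still be paginated and filtered further.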
I have a similar situation and this is how I'm planning to go about it:
query = Purchase.objects.distinct('customer').order_by('customer').query
query = 'SELECT * FROM ({}) AS result ORDER BY sent DESC'.format(query)
return Purchase.objects.raw(query)
The upside is that it gives me the query I want. The downside is that it is a raw query, so I can't chain any other queryset filters onto it.
This is my approach when I need a subset of related data (N items) along with the Django query. This example uses PostgreSQL and its handy json_build_object() function (Postgres 9.4+), but you can use other aggregate functions the same way in other database systems. For older PostgreSQL versions you can use a combination of the array_agg() and array_to_string() functions.
Imagine you have Article and Comment models, and along with every article in the list you want to select its 3 most recent comments (change LIMIT 3 to adjust the size of the subset, or ORDER BY c.id DESC to change its sorting).
qs = Article.objects.all()
qs = qs.extra(select = {
'recent_comments': """
SELECT
json_build_object('comments',
array_agg(
json_build_object('id', id, 'user_id', user_id, 'body', body)
)
)
FROM (
SELECT
c.id,
c.user_id,
c.body
FROM app_comment c
WHERE c.article_id = app_article.id
ORDER BY c.id DESC
LIMIT 3
) sub
"""
})
for article in qs:
    print(article.recent_comments)
# Output:
# {u'comments': [{u'user_id': 1, u'id': 3, u'body': u'foo'}, {u'user_id': 1, u'id': 2, u'body': u'bar'}, {u'user_id': 1, u'id': 1, u'body': u'joe'}]}
# ....