Mutliple fields in the where clause of a QuerySet

Mutliple fields in the where clause of a QuerySet - django

In a e-shop application I have a model for storing history of order processing:
class OrderStatusHistory(models.Model):
order = models.ForeignKey(Order, related_name='status_history')
status = models.IntegerField()
date = models.DateTimeField(auto_now_add=True)
Now I want to get most recent status for each order. In pure SQL I can do this in a single query:
select * from order_orderstatushistory where (order_id, date) in (select order_id, max(date) from order_orderstatushistory group by order_id)
What is the best way do this in Django?
I have two options. The first is:
# Get the most recent status change date for each order
v = Order.objects.annotate(Max('status_history__date')).values_list('id', 'status_history__date__max')
# Get OrderStatusHistory objects
hist = OrderStatusHistory.objects.extra(where=['(order_id, date) IN %s'], params=[tuple(v)])
And the second:
hist = OrderStatusHistory.objects.extra(where=['(order_id, date) IN (select order_id, max(date) from order_orderstatushistory group by order_id)'])
The first option is pure Django, but results in 2 database queries and a large list of parameters passed from Django to database engine.
The second option requires to put SQL code directly into my application, but I'd like to avoid this.
Is there equivalent of where (order_id, date) in (select ...) in Django?

Have you tried using this query
OrderStatusHistory.objects.order_by('order', '-date').distinct('order')

Related

Django sum values of a column after 'group by' in another column

I found some solutions here and in the django documentation, but I could not manage to make one query work the way I wanted.
I have the following model:
class Inventory(models.Model):
blindid = models.CharField(max_length=20)
massug = models.IntegerField()
I want to count the number of Blind_ID and then sum the massug after they were grouped.
My currently Django ORM
samples = Inventory.objects.values('blindid', 'massug').annotate(aliquots=Count('blindid'), total=Sum('massug'))
It's not counting correctly (it shows only one), thus it 's not summing correctly. It seems it is only getting the first result... I tried to use Count('blindid', distinct=True) and Count('blindid', distinct=False) as well.
This is the query result using samples.query. Django is grouping by the two columns...
SELECT "inventory"."blindid", "inventory"."massug", COUNT("inventory"."blindid") AS "aliquots", SUM("inventory"."massug") AS "total" FROM "inventory" GROUP BY "inventory"."blindid", "inventory"."massug"
This should be the raw sql
SELECT blindid,
Count(blindid) AS aliquots,
Sum(massug) AS total
FROM inventory
GROUP BY blindid

Try this:
samples = Inventory.objects.values('blindid').annotate(aliquots=Count('blindid'), total=Sum('massug'))

getting most recent by joining subquery in Django ORM

I have a table of log entries that have user_id and datetime. I'd like to make a query to fetch the most recent of each log entry by user_id. I can't seem to figure out how to do that...
The SQL for the query would be something like this:
SELECT *
FROM table a
JOIN (SELECT user_id, max(datetime) maxDate
FROM table
GROUP BY user_id) b
ON a.user_id = b.user_id AND a.datetime = b.maxDate
Right now I'm doing it with a raw query, but I'd like to use the ORM's methods.

I suppose this should do
Table.objects.order_by('-user_id').distinct('user_id')
See distinct() in this -> https://docs.djangoproject.com/en/1.9/ref/models/querysets/
But this will only work if the latest log entry for that user is the last entry by that user in the table, i.e., the log entries of a particular user as sorted in ascending way in the table.

You may try:
User.objects.latest().id

Table.objects.order_by('user_id', '-datetime').distinct('user_id')
Add indexes to user_id and datetime (Meta.index_together = ['user_id', 'datetime']).

Django ORM: is it possible to inject subqueries?

I have a Django model that looks something like this:
class Result(models.Model):
date = DateTimeField()
subject = models.ForeignKey('myapp.Subject')
test_type = models.ForeignKey('myapp.TestType')
summary = models.PositiveSmallIntegerField()
# more fields about the result like its location, tester ID and so on
Sometimes we want to retrieve all the test results, other times we only want the most recent result of a particular test type for each subject. This answer has some great options for SQL that will find the most recent result.
Also, we sometimes want to bucket the results into different chunks of time so that we can graph the number of results per day / week / month.
We also want to filter on various fields, and for elegance I'd like a QuerySet that I can then make all the filter() calls on, and annotate for the counts, rather than making raw SQL calls.
I have got this far:
qs = Result.objects.extra(select = {
'date_range': "date_trunc('{0}', time)".format("day"), # Chunking into time buckets
'rn' : "ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)"})
qs = qs.values('date_range', 'result_summary', 'rn')
qs = qs.order_by('-date_range')
which results in the following SQL:
SELECT (ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)) AS "rn", (date_trunc('day', time)) AS "date_range", "myapp_result"."result_summary" FROM "myapp_result" ORDER BY "date_range" DESC
which is kind of approaching what I'd like, but now I need to somehow filter to only get the rows where rn = 1. I tried using the 'where' field in extra(), which gives me the following SQL and error:
SELECT (ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)) AS "rn", (date_trunc('day', time)) AS "date_range", "myapp_result"."result_summary" FROM "myapp_result" WHERE "rn"=1 ORDER BY "date_range" DESC ;
ERROR: column "rn" does not exist
So I think the query that finds "rn" needs to be a subquery - but is it possible to do that somehow, perhaps using extra()?
I know I could do this with raw SQL but it just looks ugly! I'd love to find a nice neat way where I have a filterable QuerySet.
I guess the other option is to have a field in the model that indicates whether it is actually the most recent result of that test type for that subject...

I've found a way!
qs = Result.objects.extra(where = ["NOT EXISTS(SELECT * FROM myapp_result as T2 WHERE (T2.test_type_id = myapp_result.test_type_id AND T2.subject_id = myapp_result.subject ID AND T2.time > myapp_result.time))"])
This is based on a different option from the answer I referenced earlier. I can filter or annotate qs with whatever I want.
As an aside, on the way to this solution I tried this:
qq = Result.objects.extra(where = ["NOT EXISTS(SELECT * FROM myapp_result as T2 WHERE (T2.test_type_id = myapp_result.test_type_id AND T2.subject_id = myapp_result.subject ID AND T2.time > myapp_result.time))"])
qs = Result.objects.filter(id__in=qq)
Django embeds the subquery just as you want it to:
SELECT ...some fields... FROM "myapp_result"
WHERE ("myapp_result"."id" IN (SELECT "myapp_result"."id" FROM "myapp_result"
WHERE (NOT EXISTS(SELECT * FROM myapp_result as T2
WHERE (T2.subject_id = myapp_result.subject_id AND T2.test_type_id = myapp_result.test_type_id AND T2.time > myapp_result.time)))))
I realised this had more subqueries than I need, but I note it here as I can imagine it being useful to know that you can filter one queryset with another and Django does exactly what you'd hope for in terms of embedding the subquery (rather than, say, executing it and embedding the returned values, which would be horrid.)

What is the internal function in django to add new tables to a queryset in a sensible way?

In django 1.2:
I have a queryset with an extra parameter which refers to a table which is not currently included in the query django generates for this queryset.
If I add an order_by to the queryset which refers to the other table, django adds joins to the other table in the proper way and the extra works. But without the order_by, the extra parameter is failing. I could just add a useless secondary order_by to something in the other table, but I think there should be a better way to do it.
What is the django function to add joins in a sensible way? I know this must be getting called somewhere.
Here is some sample code. It selects all readings for a given user, and annotates the results with the rating (if any) given by another user stored in 'friend'.
class Book(models.Model):
name = models.CharField(max_length=200)
urlname = models.CharField(max_length=200)
entrydate=models.DateTimeField(auto_now_add=True)
class Reading(models.Model):
book=models.ForeignKey(Book,related_name='readings')
user=models.ForeignKey(User)
rating=models.IntegerField()
entrydate=models.DateTimeField(auto_now_add=True)
readings=Reading.objects.filter(user=user).order_by('entrydate')
friendrating='(select rating from proj_reading where user_id=%d and \
book_id=proj_book.id and rating in (1,2,3,4,5,6))'%friend.id
readings=readings.extra(select={'friendrating':friendrating})
at the moment, readings won't work because the join to readings is not set up correctly. however, if I add an order by such as:
.order_by('entrydate','reading__entrydate')
django magically knows to add an inner join through the foreign key and I get what I want.
additional information:
print readings.query ==>
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
assuming
user.id=1
friend.id=2
the error is:
OperationalError: Unknown column proj_book.id in 'where clause'
and it happens because the table proj_book is not included in the query. To restate what I said above - if I now do readings2=readings.order_by('book__entrydate') I can see the proper join is set up and the query works.
Ideally I'd just like to figure out what the name of the qs.query function is that looks at two tables and figures out how they are joined by foreign keys, and just call that manually.

Your generated query:
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
The db has no way to understand what does it mean by proj_book, since it is not included in (from tables or inner join).
You are getting expected results, when you add order_by, because that order_by query is adding inner join between proj_book and proj_reading.
As far as I understand, if you refer any other column in Book, not just order_by, you will get similar results.
Q1 = Reading.objects.filter(user=user).exclude(Book__name='') # Exclude forces to add JOIN
Q2 = "Select rating from proj_reading where user_id=%d" % user.id
Result = Q1.extra("foo":Q2)
This way, at step Q1, you are forcing DJango to add join on Book table, which is not default, unless you access any field of Book table.

you mean:
class SomeModel(models.Model)
id = models.IntegerField()
...
class SomeOtherModel(models.Model)
otherfield = models.ForeignKey(SomeModel)
qrst = SomeOtherModel.objects.filter(otherfield__id=1)
You can use "__" to create table joins.
EDIT:
It wont work because you do not define table join correctly.
myrating='(select rating from proj_reading inner join proj_book on (proj_book.id=proj_reading_id) where proj_reading.user_id=%d and rating in (1,2,3,4,5,6))'%user.id)'
This is a pesdocode and it is not tested.
But, i advice you to use django filters instead of writing sql queries.
read = Reading.objects.filter(book__urlname__icontains="smith", user_id=user.id, rating__in=(1,2,3,4,5,6)).values('rating')
Documentation for more details.

How do I get the values of the lastest entries grouped by an attributes using Django ORM?

I have a report model looking a bit like this:
class Report(models.Model):
date = models.DateField()
quantity = models.IntegerField()
product_name = models.TextField()
I know I can get the last entry for the last year for one product this way:
Report.objects.filter(date__year=2009, product_name="corn").order_by("-date")[0]
I know I can group entries by name this way:
Report.objects.values("product_name")
But how can I get the quantity for the last entry for each product ? I feel like I would do it this way in SQL (not sure, my SQL is rusty):
SELECT product_name, quantity FROM report WHERE YEAR(date) == 2009 GROUP_BY product_name HAVING date == Max(date)
My guess is to use the Max() object with annotate, but I have no idea how to.
For now, I do it by manually adding the last item of each query for each product_name I cant list with a distinct.

Not exactly a trivial query using either the Django ORM or SQL. My first take on it would be to pretty much what you are probably already doing; get the distinct product and date pairs and then perform individual queries for each of those.
year_products = Product.objects.filter(year=2009)
product_date_pairs = year_products.values('product').distinct('product'
).annotate(Max('date'))
[Report.objects.get(product=p['product'], date=p['date__max'])
for p in product_date_pairs]
But you can take it a step further with the Q operator and some fancy OR'ing to trim your query count down to 2 instead of N + 1.
import operator
qs = [Q(product=p['product'], date=p['date__max']) for p in product_date_pairs]
ored_qs = reduce(operator.or_, qs)
Report.objects.filter(ored_qs)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Mutliple fields in the where clause of a QuerySet - django

Have you tried using this query OrderStatusHistory.objects.order_by('order', '-date').distinct('order')

Related

Django sum values of a column after 'group by' in another column

getting most recent by joining subquery in Django ORM

Django ORM: is it possible to inject subqueries?

What is the internal function in django to add new tables to a queryset in a sensible way?

How do I get the values of the lastest entries grouped by an attributes using Django ORM?

Categories

Resources