I have a table of log entries with user_id and datetime columns. I'd like to write a query that fetches the most recent log entry for each user_id, but I can't figure out how to do that.
The SQL for the query would be something like this:
SELECT *
FROM table a
JOIN (SELECT user_id, max(datetime) maxDate
FROM table
GROUP BY user_id) b
ON a.user_id = b.user_id AND a.datetime = b.maxDate
Right now I'm doing it with a raw query, but I'd like to use the ORM's methods.
I suppose this should do it:
Table.objects.order_by('-user_id').distinct('user_id')
See distinct() in the QuerySet API reference: https://docs.djangoproject.com/en/1.9/ref/models/querysets/
But this only works if the latest log entry for each user also happens to be the last row for that user in the table, i.e., each user's log entries are stored in ascending datetime order. Nothing in the ordering guarantees which row is kept per user.
You may try:
User.objects.latest().id
Table.objects.order_by('user_id', '-datetime').distinct('user_id')
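Add a composite index on user_id and datetime (Meta.index_together = ['user_id', 'datetime']; on recent Django versions, where index_together is deprecated, declare it through Meta.indexes instead).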
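To make the ordering trick concrete, here is a minimal pure-Python sketch of what order_by('user_id', '-datetime').distinct('user_id') asks PostgreSQL to do: sort the rows, then keep the first row of each user_id group. The sample rows and the helper name latest_per_user are made up for illustration; this is not Django code.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (user_id, datetime as ISO string, message)
rows = [
    (1, "2016-01-01T10:00", "login"),
    (2, "2016-01-03T09:30", "login"),
    (1, "2016-01-05T08:15", "logout"),
    (2, "2016-01-02T12:00", "upload"),
]


def latest_per_user(rows):
    """Emulate ORDER BY user_id, datetime DESC with DISTINCT ON (user_id)."""
    # Sort by datetime descending first, then (stably) by user_id ascending.
    ordered = sorted(rows, key=itemgetter(1), reverse=True)
    ordered = sorted(ordered, key=itemgetter(0))
    # DISTINCT ON keeps the first row of each group in that order.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


latest = latest_per_user(rows)
```

If the second sort key were dropped, the row kept for each user would depend on storage order, which is the weakness of the order_by('-user_id') variant.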
How to write the most efficient Django ORM query for the following scenario?
I need to get items based on a list of accountIds, but it will return duplicate records with the same accountId because accountId is not the primary key. Then I will need to remove the duplicates by only returning the last created record in the queryset.
I could loop through the list of accountIds, filter by each accountId, order by the created date, and take the latest record. However, with this approach I would hit the database once per account, and there are more than 200 accountIds.
Are there better ways of doing this?
This could be useful:
Model.objects.order_by('date_created').distinct()
Docs: distinct() in the Django QuerySet API reference.
If you are using PostgreSQL, this becomes much more useful and efficient.
If you are using PostgreSQL, you can pass a field name to distinct() to create a SELECT DISTINCT ON (foo) query that returns the first row for each distinct value of that field. In your case, if you order by account_id and then by descending created_date, you will get a single row per account_id, the one with the latest created_date:
Item.objects.filter(
    account_id__in=account_ids
).order_by(
    'account_id', '-created_date'
).distinct('account_id')
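On a backend without DISTINCT ON, the same "one round trip instead of 200" idea can be expressed with a grouped subquery. The sketch below runs that SQL through the standard-library sqlite3 module; the item table, its columns, and the sample data are assumptions for illustration, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, account_id INTEGER, created_date TEXT)"
)
conn.executemany(
    "INSERT INTO item (account_id, created_date) VALUES (?, ?)",
    [(10, "2020-01-01"), (10, "2020-03-01"), (20, "2020-02-01"),
     (30, "2020-04-01"), (20, "2020-02-15")],
)

account_ids = [10, 20]  # stands in for the list of 200+ ids
placeholders = ", ".join("?" for _ in account_ids)

# One query: find the newest created_date per account, then join back
# to fetch the matching rows.
sql = f"""
    SELECT i.account_id, i.created_date
    FROM item AS i
    JOIN (
        SELECT account_id, MAX(created_date) AS max_date
        FROM item
        WHERE account_id IN ({placeholders})
        GROUP BY account_id
    ) AS latest
      ON i.account_id = latest.account_id AND i.created_date = latest.max_date
    ORDER BY i.account_id
"""
latest_items = conn.execute(sql, account_ids).fetchall()
```

Account 30 is left out because it is not in account_ids, and each remaining account contributes only its newest row.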
I want to execute a simple query like:
select *,count('id') from menu_permission group by menu_id
In Django format I have tried:
MenuPermission.objects.all().values('menu_id').annotate(Count('id'))
It selects only menu_id. The executed query is:
SELECT `menu_permission`.`menu_id`, COUNT(`menu_permission`.`id`) AS `id__count` FROM `menu_permission` GROUP BY `menu_permission`.`menu_id`
But I need other fields also. If I try:
MenuPermission.objects.all().values('id', 'menu_id').annotate(Count('id'))
It adds 'id' to the GROUP BY clause:
GROUP BY `menu_permission`.`id`
As a result I am not getting the expected result. How can I get all fields in the output but group by a single one?
You can try subqueries to do what you need.
In my case I have two tables, Item and Transaction, where Transaction.item_id links to Item.
First, I prepare a Transaction subquery grouped by item_id that sums the amount field, and tie item_id to the pk of the outer query with OuterRef.
from django.db.models import OuterRef, Subquery, Sum
per_item_total = Transaction.objects.values('item_id').annotate(total=Sum('amount')).filter(item_id=OuterRef('pk'))
Then I select all rows from Item and attach the subquery result as a total field.
items_with_total = Item.objects.annotate(total=Subquery(per_item_total.values('total')))
This produces the following SQL:
SELECT `item`.`id`, {all other item fields},
(SELECT SUM(U0.`amount`) AS `total` FROM `transaction` U0
WHERE U0.`item_id` = `item`.`id` GROUP BY U0.`item_id` ORDER BY NULL) AS `total` FROM `item`
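To see what that correlated subquery returns, here is a small sketch that runs SQL of the same shape through the standard-library sqlite3 module. The table is named txn rather than transaction to sidestep keyword quoting, and the columns and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE txn (id INTEGER PRIMARY KEY, item_id INTEGER, amount INTEGER);
    INSERT INTO item (id, name) VALUES (1, 'pen'), (2, 'ink'), (3, 'pad');
    INSERT INTO txn (item_id, amount) VALUES (1, 5), (1, 7), (2, 3);
""")

# Every item row is kept; the subquery is evaluated once per item
# and yields NULL when the item has no transactions.
rows = conn.execute("""
    SELECT item.id, item.name,
           (SELECT SUM(t.amount) FROM txn AS t
            WHERE t.item_id = item.id GROUP BY t.item_id) AS total
    FROM item
    ORDER BY item.id
""").fetchall()
```

Because the grouping lives inside the subquery, the outer query needs no GROUP BY and all item columns stay available.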
You are trying to achieve this SQL:
select *, count('id') from menu_permission group by menu_id
But normally SQL requires that, when a GROUP BY clause is used, the select list contains only the columns you are grouping by plus aggregate expressions. This is not a Django matter; that is how SQL GROUP BY works.
The rows are grouped by those columns, so those columns can appear in the select, and other columns can be aggregated into a single value if you want them. You can't include other columns directly, because they may have more than one value per group.
For example, if you have a column called "permission_code", you could ask for an array of the values in the "permission_code" column when the rows are grouped by menu_id.
The syntax depends on the SQL flavor you are using; in PostgreSQL it could look something like this:
select menu_id, array_agg(permission_code), count(id) from menu_permission group by menu_id
Similarly, a Django queryset can be constructed for this.
Hopefully this helps, but if needed please share more about what you need to do and what your data models are.
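As a rough illustration of "group by one column, aggregate the rest", the sketch below uses the standard-library sqlite3 module, with group_concat standing in for PostgreSQL's array_agg. The table layout and rows are assumptions. On the Django side with PostgreSQL, django.contrib.postgres.aggregates.ArrayAgg is the aggregate that plays the array_agg role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE menu_permission (
        id INTEGER PRIMARY KEY, menu_id INTEGER, permission_code TEXT
    );
    INSERT INTO menu_permission (menu_id, permission_code)
    VALUES (1, 'read'), (1, 'write'), (2, 'read');
""")

rows = conn.execute("""
    SELECT menu_id, group_concat(permission_code) AS codes, COUNT(id) AS n
    FROM menu_permission
    GROUP BY menu_id
    ORDER BY menu_id
""").fetchall()

# group_concat does not promise an element order, so normalise before use.
summary = {menu_id: (sorted(codes.split(",")), n) for menu_id, codes, n in rows}
```

Each menu_id yields exactly one row, and the non-grouped column survives only in aggregated form.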
Currently, the only way to make this work as expected is to base your query on the model you want the GROUP BY to be based on.
In your case it looks like you have a Menu model (menu_id field foreign key) so doing this would give you what you want and will allow getting other aggregate information from your MenuPermission model but will only group by the Menu.id field:
Menu.objects.annotate(perm_count=Count('menupermission__id')).values('perm_count')
Of course there is no need for the "annotate" intermediate step if all you want is that single count.
query = MenuPermission.objects.values('menu_id').annotate(menu_id_count=Count('menu_id'))
You can check the generated SQL with print(query.query).
This solution doesn't work: all fields end up in the GROUP BY clause. I'm leaving it here because it may still be useful to someone.
model_fields = queryset.model._meta.get_fields()
queryset = queryset.values('menu_id') \
    .annotate(
        count=Count('id'),
        **{field.name: F(field.name) for field in model_fields}
    )
What I'm doing is getting the list of fields of our model and building a dictionary with each field name as the key and an F instance for that field name as the value.
When unpacked (the **), the dictionary is interpreted as keyword arguments passed to the annotate function.
For example, if we had a "name" field on our model, this annotate call would end up being equal to this:
queryset = queryset.values('menu_id') \
    .annotate(
        count=Count('id'),
        name=F("name")
    )
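The ** unpacking itself is plain Python and can be checked without Django. In this sketch, annotate is a hypothetical stand-in function that only echoes its keyword arguments, and the strings are placeholders for the real Count and F expressions.

```python
def annotate(**kwargs):
    """Hypothetical stand-in: report the keyword arguments received."""
    return kwargs


field_names = ["name", "menu_id"]

# Build the keyword arguments dynamically from a list of names...
dynamic = annotate(
    count="COUNT(id)",
    **{name: f"F({name})" for name in field_names}
)

# ...which is the same call as spelling each argument out by hand.
explicit = annotate(count="COUNT(id)", name="F(name)", menu_id="F(menu_id)")
```

Both calls receive identical keyword arguments, which is why the dictionary form behaves like the hand-written one.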
You can use the following code:
MenuPermission.objects.values('menu_id').annotate(Count('id')).values('field1', 'field2', 'field3'...)
I need a queryset equivalent to this SQL:
select * from kraj
where kraj_id in (select kraj_id from klient_kraj where klient_id = 1)
As you can see, the klient_kraj model is filtered and returns a single column, kraj_id, which is then used to filter kraj.
I wasn't able to find a way to obtain this queryset using the ORM.
Thanks
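For reference, this is what the nested query returns on a toy dataset, run through the standard-library sqlite3 module. The nazev column and the sample rows are assumptions. In Django, an __in lookup that is given a queryset (for example one narrowed with values('kraj_id')) is generally compiled to a nested SELECT of this shape, though the model names depend on your project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kraj (kraj_id INTEGER PRIMARY KEY, nazev TEXT);
    CREATE TABLE klient_kraj (klient_id INTEGER, kraj_id INTEGER);
    INSERT INTO kraj (kraj_id, nazev) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO klient_kraj (klient_id, kraj_id) VALUES (1, 1), (1, 3), (2, 2);
""")

# The inner SELECT produces the kraj_id values for klient 1;
# the outer SELECT keeps only the kraj rows with those ids.
kraje = conn.execute("""
    SELECT * FROM kraj
    WHERE kraj_id IN (SELECT kraj_id FROM klient_kraj WHERE klient_id = 1)
    ORDER BY kraj_id
""").fetchall()
```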
In an e-shop application I have a model for storing the history of order processing:
class OrderStatusHistory(models.Model):
    order = models.ForeignKey(Order, related_name='status_history')
    status = models.IntegerField()
    date = models.DateTimeField(auto_now_add=True)
Now I want to get the most recent status for each order. In pure SQL I can do this in a single query:
select * from order_orderstatushistory where (order_id, date) in (select order_id, max(date) from order_orderstatushistory group by order_id)
What is the best way to do this in Django?
I have two options. The first is:
# Get the most recent status change date for each order
v = Order.objects.annotate(Max('status_history__date')).values_list('id', 'status_history__date__max')
# Get OrderStatusHistory objects
hist = OrderStatusHistory.objects.extra(where=['(order_id, date) IN %s'], params=[tuple(v)])
And the second:
hist = OrderStatusHistory.objects.extra(where=['(order_id, date) IN (select order_id, max(date) from order_orderstatushistory group by order_id)'])
The first option is pure Django, but it results in 2 database queries and a large list of parameters passed from Django to the database engine.
The second option requires putting SQL code directly into my application, which I'd like to avoid.
Is there an equivalent of where (order_id, date) in (select ...) in Django?
Have you tried this query? (Passing field names to distinct() requires PostgreSQL.)
OrderStatusHistory.objects.order_by('order', '-date').distinct('order')
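For comparison, the row-value form from the question can be tried directly with the standard-library sqlite3 module. This is only a sketch on invented data, and it assumes an SQLite build recent enough to support row values (3.15 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_orderstatushistory (
        id INTEGER PRIMARY KEY, order_id INTEGER, status INTEGER, date TEXT
    );
    INSERT INTO order_orderstatushistory (order_id, status, date) VALUES
        (1, 10, '2012-01-01'), (1, 20, '2012-01-05'),
        (2, 10, '2012-01-02'), (2, 30, '2012-01-03');
""")

# Keep only the rows whose (order_id, date) pair is the newest for that order.
latest_status = conn.execute("""
    SELECT order_id, status, date FROM order_orderstatushistory
    WHERE (order_id, date) IN (
        SELECT order_id, MAX(date) FROM order_orderstatushistory GROUP BY order_id
    )
    ORDER BY order_id
""").fetchall()
```

One difference from the distinct('order') approach: if two history rows of the same order share the maximum date, this form returns both, while DISTINCT ON keeps a single row.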
In Django 1.2:
I have a queryset with an extra parameter which refers to a table which is not currently included in the query django generates for this queryset.
If I add an order_by to the queryset which refers to the other table, django adds joins to the other table in the proper way and the extra works. But without the order_by, the extra parameter is failing. I could just add a useless secondary order_by to something in the other table, but I think there should be a better way to do it.
What is the django function to add joins in a sensible way? I know this must be getting called somewhere.
Here is some sample code. It selects all readings for a given user, and annotates the results with the rating (if any) given by another user stored in 'friend'.
class Book(models.Model):
    name = models.CharField(max_length=200)
    urlname = models.CharField(max_length=200)
    entrydate = models.DateTimeField(auto_now_add=True)

class Reading(models.Model):
    book = models.ForeignKey(Book, related_name='readings')
    user = models.ForeignKey(User)
    rating = models.IntegerField()
    entrydate = models.DateTimeField(auto_now_add=True)

readings = Reading.objects.filter(user=user).order_by('entrydate')
friendrating = '(select rating from proj_reading where user_id=%d and \
book_id=proj_book.id and rating in (1,2,3,4,5,6))' % friend.id
readings = readings.extra(select={'friendrating': friendrating})
At the moment, readings won't work because the join is not set up correctly. However, if I add an order_by such as:
.order_by('entrydate','reading__entrydate')
Django magically knows to add an inner join through the foreign key and I get what I want.
Additional information:
print readings.query ==>
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
assuming
user.id=1
friend.id=2
the error is:
OperationalError: Unknown column proj_book.id in 'where clause'
and it happens because the table proj_book is not included in the query. To restate what I said above - if I now do readings2=readings.order_by('book__entrydate') I can see the proper join is set up and the query works.
Ideally I'd just like to figure out what the name of the qs.query function is that looks at two tables and figures out how they are joined by foreign keys, and just call that manually.
Your generated query:
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
The database has no way to know what proj_book refers to, since that table appears in neither the FROM clause nor a JOIN.
You get the expected results when you add order_by because that order_by adds an inner join between proj_book and proj_reading.
As far as I understand, if you refer to any other column of Book, not just in order_by, you will get similar results.
Q1 = Reading.objects.filter(user=user).exclude(book__name='')  # exclude() on a Book field forces the JOIN
Q2 = "(select rating from proj_reading where user_id=%d)" % user.id
Result = Q1.extra(select={"foo": Q2})
This way, at step Q1, you force Django to add a join on the Book table, which it does not do by default unless you access a field of the Book table.
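The failure and the fix can be reproduced outside Django. The sketch below uses the standard-library sqlite3 module with a cut-down, assumed version of the two tables: the same correlated subquery fails while proj_book is missing from the outer query and resolves once the join is present. SQLite words the error differently from the MySQL message quoted in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proj_book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE proj_reading (
        id INTEGER PRIMARY KEY, book_id INTEGER, user_id INTEGER, rating INTEGER
    );
    INSERT INTO proj_book (id, name) VALUES (1, 'Dune');
    INSERT INTO proj_reading (book_id, user_id, rating) VALUES (1, 1, 4), (1, 2, 5);
""")

friend_rating = ("(SELECT r.rating FROM proj_reading AS r "
                 "WHERE r.user_id = 2 AND r.book_id = proj_book.id)")

# Without the join, proj_book is not in scope for the subquery.
try:
    conn.execute(f"SELECT {friend_rating} AS friendrating, proj_reading.id "
                 "FROM proj_reading WHERE proj_reading.user_id = 1")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# With the join, the same subquery resolves.
rows = conn.execute(
    f"SELECT {friend_rating} AS friendrating, proj_reading.id "
    "FROM proj_reading "
    "INNER JOIN proj_book ON proj_book.id = proj_reading.book_id "
    "WHERE proj_reading.user_id = 1"
).fetchall()
```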
You mean:
class SomeModel(models.Model):
    id = models.IntegerField()
    ...

class SomeOtherModel(models.Model):
    otherfield = models.ForeignKey(SomeModel)

qrst = SomeOtherModel.objects.filter(otherfield__id=1)
You can use "__" to create table joins.
EDIT:
It won't work because you do not define the table join correctly.
myrating = '(select rating from proj_reading inner join proj_book on (proj_book.id=proj_reading.book_id) where proj_reading.user_id=%d and rating in (1,2,3,4,5,6))' % user.id
This is pseudocode and it is not tested.
But I advise you to use Django filters instead of writing SQL queries.
read = Reading.objects.filter(book__urlname__icontains="smith", user_id=user.id, rating__in=(1,2,3,4,5,6)).values('rating')
See the documentation for more details.