How to do SELECT COUNT(*) GROUP BY and ORDER BY in Django?

I'm using a transaction model to keep track of all the events going through the system:
class Transaction(models.Model):
    actor = models.ForeignKey(User, related_name="actor")
    acted = models.ForeignKey(User, related_name="acted", null=True, blank=True)
    action_id = models.IntegerField()
    # ......
How do I get the top 5 actors in my system?
In SQL it would basically be:
SELECT actor, COUNT(*) as total
FROM Transaction
GROUP BY actor
ORDER BY total DESC

According to the documentation, you should use:
from django.db.models import Count
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('total')
values() : specifies which columns are going to be used to "group by"
Django docs:
"When a values() clause is used to constrain the columns that are
returned in the result set, the method for evaluating annotations is
slightly different. Instead of returning an annotated result for each
result in the original QuerySet, the original results are grouped
according to the unique combinations of the fields specified in the
values() clause"
annotate() : specifies an operation over the grouped values
Django docs:
The second way to generate summary values is to generate an independent summary for each object in a QuerySet. For example, if you
are retrieving a list of books, you may want to know how many authors
contributed to each book. Each Book has a many-to-many relationship
with the Author; we want to summarize this relationship for each book
in the QuerySet.
Per-object summaries can be generated using the annotate() clause.
When an annotate() clause is specified, each object in the QuerySet
will be annotated with the specified values.
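order_by() : sorts the grouped rows by the annotated value. Prefix the field with a minus sign ('-total') for descending order, and slice the queryset ([:5]) to keep only the top 5.
To summarize: you group by actor, generating one row per actor, add the annotation (this adds an extra field to the returned values) and finally order the rows by this value.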
Refer to https://docs.djangoproject.com/en/dev/topics/db/aggregation/ for more insight
Good to note: when using Count, the field passed to Count does not change the grouping, only which column is counted and the name given to the final value. The aggregator groups by the unique combinations of the values() fields (as mentioned above), not by the field passed to Count. As long as the counted field is never NULL (COUNT(column) skips NULLs), the following queries give the same result:
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('total')
Transaction.objects.all().values('actor').annotate(total=Count('id')).order_by('total')
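
To see the grouping at the SQL level, here is a minimal sketch that uses Python's standard sqlite3 module instead of Django. The table name txn, its columns and the sample rows are made up for illustration; the query is meant to mirror what values('actor').annotate(total=Count(...)).order_by('-total')[:5] asks the database to do.

```python
import sqlite3

# Hypothetical stand-in for the Transaction table; names and rows are
# illustrative only, not taken from a real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (id INTEGER PRIMARY KEY, actor_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO txn (actor_id) VALUES (?)",
    [(1,), (2,), (1,), (3,), (1,), (2,)],
)


def top_actors(counted_column, limit=5):
    """Group by actor_id, count rows per group, largest count first."""
    # counted_column is interpolated only because it comes from the fixed
    # strings used below, never from user input.
    sql = (
        f"SELECT actor_id, COUNT({counted_column}) AS total "
        "FROM txn GROUP BY actor_id "
        "ORDER BY total DESC, actor_id LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


by_actor = top_actors("actor_id")  # like Count('actor')
by_id = top_actors("id")           # like Count('id')
print(by_actor)
print(by_actor == by_id)
```

Because actor_id is declared NOT NULL here, counting either column gives the same totals per group.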

Just like @Alvaro has answered, Django's direct equivalent of the GROUP BY statement:
SELECT actor, COUNT(*) AS total
FROM Transaction
GROUP BY actor
is through the use of values() and annotate() methods as follows:
Transaction.objects.values('actor').annotate(total=Count('actor')).order_by()
However one more thing must be pointed out:
If the model has a default ordering defined in class Meta, the .order_by() clause is obligatory for proper results. You cannot skip it even when no ordering is intended.
Further, for high-quality code it is advisable to always put an .order_by() clause after annotate(), even when there is no class Meta: ordering. This makes the statement future-proof: it will work as intended regardless of any future changes to class Meta: ordering.
Let me provide you with an example. If the model had:
class Transaction(models.Model):
    actor = models.ForeignKey(User, related_name="actor")
    acted = models.ForeignKey(User, related_name="acted", null=True, blank=True)
    action_id = models.IntegerField()
    class Meta:
        ordering = ['id']
Then this approach WOULDN'T work:
Transaction.objects.values('actor').annotate(total=Count('actor'))
That's because Django (in versions before 3.1) adds every field listed in class Meta: ordering to the GROUP BY clause.
If you print the query:
>>> print(Transaction.objects.values('actor').annotate(total=Count('actor')).query)
SELECT "Transaction"."actor_id", COUNT("Transaction"."actor_id") AS "total"
FROM "Transaction"
GROUP BY "Transaction"."actor_id", "Transaction"."id"
This makes it clear that the aggregation would NOT work as intended: because "id" is unique, every row ends up in its own group. The .order_by() clause must therefore be used to clear the default ordering and get proper aggregation results.
See: Interaction with default ordering or order_by() in official Django documentation.
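
The effect of the extra GROUP BY column can be reproduced outside Django. The following is a small sketch with the standard sqlite3 module; the txn table and its rows are invented for the example, and the two queries are hand-written approximations of the SQL shown above, not output captured from Django.

```python
import sqlite3

# Illustrative table and data (assumptions, not a real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (id INTEGER PRIMARY KEY, actor_id INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO txn (actor_id) VALUES (?)", [(1,), (1,), (2,)])

# Grouping by actor only: one row per actor with the real count.
intended = conn.execute(
    "SELECT actor_id, COUNT(actor_id) FROM txn "
    "GROUP BY actor_id ORDER BY actor_id"
).fetchall()

# Grouping by actor AND id (what a default ordering on 'id' used to add):
# id is unique, so every row becomes its own group with a count of 1.
with_ordering_field = conn.execute(
    "SELECT actor_id, COUNT(actor_id) FROM txn "
    "GROUP BY actor_id, id ORDER BY actor_id, id"
).fetchall()

print(intended)
print(with_ordering_field)
```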

If you want descending order (largest value first), prefix the field name with a minus sign:
from django.db.models import Count
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('-total')

Related

Django doesn't respect Prefetch filters in annotate

class Subject(models.Model):
    ...
    students = models.ManyToManyField('Student')
    type = models.CharField(max_length=100)
class Student(models.Model):
    # "class" is a reserved word in Python, so the field is named class_year here
    class_year = models.IntegerField()
    dropped = models.BooleanField()
    ...
subjects_with_dropouts = (
    Subject.objects.filter(category=Subject.STEM)
    .prefetch_related(
        Prefetch('students', queryset=Student.objects.filter(class_year=2020))
    )
    .annotate(dropped_out=Case(
        When(
            students__dropped=True,
            then=True,
        ),
        output_field=BooleanField(),
        default=False,
    ))
    .filter(dropped_out=True)
)
I am trying to get all Subjects from category STEM that have dropouts of class 2020, but for some reason I get Subjects that have dropouts from other classes as well.
I know that I can achieve this with
subjects_with_dropouts = Subject.objects.filter(
    category=Subject.STEM,
    students__dropped=True,
    students__class_year=2020,
)
But why doesn't the first approach work? I am using PostgreSQL.
When using prefetch, the joining is done in Python. A good way to think of this is that your first approach runs two separate queries. One selects subjects with at least one student who dropped out (the Case annotation refers to students__dropped, so this query JOINs the students table itself, and that JOIN is not affected by the Prefetch queryset), and one selects students in the class of 2020 (this is separate from the JOIN in the first query). The prefetch just tells Django to stitch the results of these two queries together, using the through table that the ManyToManyField generates automatically and that holds both ids. The Prefetch filter therefore only limits which students get attached to each subject; it never limits which subjects are returned.
A good way to see what is actually happening is print(queryset.query), where queryset is a QuerySet instance (for example Subject.objects.all()). Or, if you have the means, Django Debug Toolbar is a fantastic tool that shows you the EXPLAIN statement of each query in each endpoint.
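
The difference between "two separate queries stitched together" and "one JOIN with both conditions" can be sketched with the standard sqlite3 module. All table names, column names and rows below are assumptions made for the example; the SQL is a simplified, hand-written approximation of what the ORM would issue, not captured output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (id INTEGER PRIMARY KEY, class_year INTEGER, dropped INTEGER);
CREATE TABLE subject_students (subject_id INTEGER, student_id INTEGER);
INSERT INTO subject VALUES (1, 'Math'), (2, 'Physics');
-- student 1: class of 2020, dropped; student 2: class of 2019, dropped;
-- student 3: class of 2020, still enrolled
INSERT INTO student VALUES (1, 2020, 1), (2, 2019, 1), (3, 2020, 0);
INSERT INTO subject_students VALUES (1, 1), (2, 2), (2, 3);
""")

JOINS = """
    FROM subject s
    JOIN subject_students ss ON ss.subject_id = s.id
    JOIN student st ON st.id = ss.student_id
"""

# Query 1 (main queryset): subjects with ANY dropped student, any class.
main_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT s.id" + JOINS + "WHERE st.dropped = 1 ORDER BY s.id"
)]

# Query 2 (the prefetch): students of class 2020, attached per subject.
# Django would also restrict this to the subject ids from query 1.
prefetched = {}
for subject_id, student_id in conn.execute(
    "SELECT s.id, st.id" + JOINS
    + "WHERE st.class_year = 2020 ORDER BY s.id, st.id"
):
    prefetched.setdefault(subject_id, []).append(student_id)

# Single JOIN with both conditions on the same student row.
single_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT s.id" + JOINS
    + "WHERE st.dropped = 1 AND st.class_year = 2020 ORDER BY s.id"
)]

print(main_ids, prefetched, single_ids)
```

Physics stays in main_ids because of its 2019 dropout; the prefetch only changes which students are attached to it.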

Annotate filtering -- sum only some of related objects' fields

Let's say there's an Author and he has Books. In order to fetch authors together with the number of written pages, the following can be done:
Author.objects.annotate(total_pages=Sum('book__pages'))
But what if I wanted to sum pages of sci-fi and fantasy books separately? I'd like to end up with an Author, that has total_pages_books_scifi_pages and total_pages_books_fantasy_pages properties.
I know I can do the following:
Author.objects.filter(book__category='scifi').annotate(total_pages_books_scifi_pages=Sum('book__pages'))
Author.objects.filter(book__category='fantasy').annotate(total_pages_books_fantasy_pages=Sum('book__pages'))
But how do I do it in one queryset?
from django.db.models import IntegerField, F, Case, When, Sum
categories = ['scifi', 'fantasy']
annotations = {}
for category in categories:
    annotation_name = 'total_pages_books_{}'.format(category)
    case = Case(
        When(book__category=category, then=F('book__pages')),
        default=0,
        output_field=IntegerField()
    )
    annotations[annotation_name] = Sum(case)
Author.objects.filter(
    book__category__in=categories
).annotate(
    **annotations
)
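
The Sum(Case(When(...))) pattern above corresponds to conditional aggregation in SQL. Below is a minimal sketch of that idea using the standard sqlite3 module; the author and book tables and their rows are invented for illustration, and the SQL is written by hand rather than taken from Django.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (
    id INTEGER PRIMARY KEY, author_id INTEGER, category TEXT, pages INTEGER
);
INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO book (author_id, category, pages) VALUES
    (1, 'scifi', 100), (1, 'scifi', 50), (1, 'fantasy', 200),
    (2, 'fantasy', 300), (2, 'crime', 80);
""")

categories = ["scifi", "fantasy"]

# One SUM(CASE ...) column per category, mirroring the annotations dict.
sum_columns = ", ".join(
    "SUM(CASE WHEN b.category = ? THEN b.pages ELSE 0 END)"
    for _ in categories
)
in_marks = ", ".join("?" for _ in categories)

sql = (
    f"SELECT a.name, {sum_columns} "
    "FROM author a JOIN book b ON b.author_id = a.id "
    f"WHERE b.category IN ({in_marks}) "
    "GROUP BY a.id, a.name ORDER BY a.id"
)
# Placeholders appear first in the SELECT list, then in the IN (...) list.
rows = conn.execute(sql, categories + categories).fetchall()
print(rows)
```

Each row holds the author's name followed by one page total per category, in the order of the categories list.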
Try:
Author.objects.values("book__category").annotate(total_pages=Sum('book__pages'))
From Django docs:
https://docs.djangoproject.com/en/1.10/topics/db/aggregation/#values:
values()
Ordinarily, annotations are generated on a per-object basis - an annotated QuerySet will return one result for each object in the original QuerySet. However, when a values() clause is used to constrain the columns that are returned in the result set, the method for evaluating annotations is slightly different. Instead of returning an annotated result for each result in the original QuerySet, the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.

Django count shared manytomany between objects and count them

I have two models, called Article and Label. A simplified snippet of them is below:
class Label(models.Model):
    name = models.CharField(null=True)
class Article(models.Model):
    title = models.CharField(null=True)
    labels = models.ManyToManyField(Label, related_name='pieces', blank=True)
When viewing a specific article, I would like to display articles which have similar labels to those applied to the article being viewed, ordered by the number of labels that are shared with the article being read (like "similar articles").
I am attempting to perform this operation in the DB but I am struggling to find a queryset which will give me the same functionality as what I have done in Python by pulling all the articles from DB and performing a for-loop on each of them. A non-functioning query attempt of what I am trying to do is below (viewed_article is the article object being viewed):
articles = Article.objects.all()\
.annotate(
tags_count=Article.objects.filter(F('viewed_article.labels')
).count()).order_by(tags_count)
You need to use conditional expressions and a somewhat complicated query to achieve this:
from django.db.models import Case, Count, IntegerField, Sum, When
current_labels = viewed_article.labels.all()
similar_articles = Article.objects.filter(labels__in=current_labels).distinct()\
    .annotate(
        tag_count=Sum(
            Case(
                When(labels__in=current_labels, then=1),
                default=0, output_field=IntegerField()
            )
        )
    ).order_by('-tag_count')
What is happening is:
Fetch all articles that share any labels with the current one. distinct() is required to weed out duplicates returned by the underlying JOIN query.
Annotate each article with a conditional expression. It checks whether the article has each of the current article's labels, and adds 1 to the sum for each one it has. The result is a count of matching labels.
Order results by the count of matching labels.
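
The three steps above can be sketched directly in SQL with the standard sqlite3 module. The article_labels table stands in for the many-to-many through table, and its name, columns and rows are assumptions made for the example. Unlike the queryset above, this sketch also excludes the viewed article itself, which is an addition of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article_labels (article_id INTEGER, label_id INTEGER);
-- article 1 is the one being viewed; it has labels 1, 2 and 3
INSERT INTO article_labels VALUES
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2),
    (3, 3), (3, 4),
    (4, 5);
""")


def similar_articles(viewed_id):
    """Return (article_id, shared_label_count), most shared labels first."""
    return conn.execute(
        """
        SELECT other.article_id, COUNT(*) AS tag_count
        FROM article_labels AS other
        WHERE other.label_id IN (
                  SELECT label_id FROM article_labels WHERE article_id = ?
              )
          AND other.article_id != ?
        GROUP BY other.article_id
        ORDER BY tag_count DESC, other.article_id
        """,
        (viewed_id, viewed_id),
    ).fetchall()


similar = similar_articles(1)
print(similar)
```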

Many-to-many "by proxy" relationship in django

My data model consists of three main entities:
class User(models.Model):
    ...
class Source(models.Model):
    user = models.ForeignKey(User, related_name='iuser')
    country = models.ForeignKey(Country, on_delete=models.DO_NOTHING)
    description = models.CharField(max_length=100)
class Destination(models.Model):
    user = models.ForeignKey(User, related_name='wuser')
    country = models.ForeignKey(Country)
I am trying to create a queryset which joins all sources with destinations by user (many to many). That way I would have a table with all possible source/destination combinations for every user.
In SQL I would simply JOIN the three tables and select the appropriate information from each table.
My question is how to perform the query. How do I access the query data?
In Django, queries are made through the model's manager, and this is well documented. Querysets are lazy; when they are evaluated they return model instances or, when values() is used, a list of dicts in which each dict maps field names to values, e.g.: [{'user':'albert','country':'US and A :) ','description':'my description'},....].
All possible source/destination combinations for every user?
I think you will have to use the reverse relationships to get this done, e.g.:
my_joined_query = User.objects.values('id', 'iuser__country', 'iuser__description', 'wuser__country')
Notice that the reverse lookups use the related_name of each ForeignKey (iuser for Source, wuser for Destination); without a related_name they would be the lowercased model names, source and destination. This joins all three tables. Go through the documentation, it's rich.
Edit:
To make an inner join you have to tell the query so; this can be achieved simply by using __isnull=False on the reverse lookup names:
my_innerjoined_query = User.objects.filter(iuser__isnull=False, wuser__isnull=False)
This should do an inner join on all the tables.
Then you can select what you want to display by using values() as earlier.
Hope that helps. :)
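
For comparison, the plain SQL three-table JOIN that the question describes can be tried with the standard sqlite3 module. The table names, columns and rows here are assumptions chosen for the example (countries are stored as text instead of a foreign key to keep it short).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE source (id INTEGER PRIMARY KEY, user_id INTEGER, country TEXT);
CREATE TABLE destination (id INTEGER PRIMARY KEY, user_id INTEGER, country TEXT);
INSERT INTO app_user VALUES (1, 'albert'), (2, 'bea'), (3, 'carl');
INSERT INTO source (user_id, country) VALUES (1, 'US'), (1, 'DE'), (2, 'JP');
INSERT INTO destination (user_id, country) VALUES (1, 'FR'), (1, 'IT');
""")

# INNER JOINs: only users that have at least one source AND one destination
# appear, once per source/destination combination.
combinations = conn.execute(
    """
    SELECT u.name, s.country, d.country
    FROM app_user u
    JOIN source s ON s.user_id = u.id
    JOIN destination d ON d.user_id = u.id
    ORDER BY u.id, s.country, d.country
    """
).fetchall()
print(combinations)
```

User bea has a source but no destination and carl has neither, so only albert's 2 x 2 combinations are returned.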

How can i get a list of objects from a postgresql view table to display

This is the model for the view:
class QryDescChar(models.Model):
    iid_id = models.IntegerField()
    cid_id = models.IntegerField()
    cs = models.CharField(max_length=10)
    cid = models.IntegerField()
    charname = models.CharField(max_length=50)
    class Meta:
        db_table = u'qry_desc_char'
This is the SQL I use to create the view:
CREATE VIEW qry_desc_char as
SELECT
tbl_desc.iid_id,
tbl_desc.cid_id,
tbl_desc.cs,
tbl_char.cid,
tbl_char.charname
FROM tbl_desc,tbl_char
WHERE tbl_desc.cid_id = tbl_char.cid;
I don't know if I need a function in models or views or both. I want to get a list of objects from that database view to display it. This might be easy, but I'm new to Django and Python, so I'm having some problems.
Django 1.1 brought in a new feature that you might find useful. You should be able to do something like:
class QryDescChar(models.Model):
    iid_id = models.IntegerField()
    cid_id = models.IntegerField()
    cs = models.CharField(max_length=10)
    cid = models.IntegerField()
    charname = models.CharField(max_length=50)
    class Meta:
        db_table = u'qry_desc_char'
        managed = False
The documentation for the managed Meta class option explains this. A relevant quote:
If False, no database table creation
or deletion operations will be
performed for this model. This is
useful if the model represents an
existing table or a database view that
has been created by some other means.
This is the only difference when
managed is False. All other aspects of
model handling are exactly the same as
normal.
Once that is done, you should be able to use your model normally. To get a list of objects you'd do something like:
qry_desc_char_list = QryDescChar.objects.all()
To actually get the list into your template you might want to look at generic views, specifically the object_list view.
If your RDBMS lets you create writable views, and the view you create has exactly the same structure as the table Django would create, I guess that should work directly.
(This is an old question, but is an area that still trips people up and is still highly relevant to anyone using Django with a pre-existing, normalized schema.)
In your SELECT statement you will need to add a numeric "id" because Django expects one, even on an unmanaged model. You can use the row_number() window function to accomplish this if there isn't a guaranteed unique integer value on the row somewhere (and with views this is often the case).
In this case I'm using an ORDER BY clause with the window function, but you can do anything that's valid, and while you're at it you may as well use a clause that's useful to you in some way. Just make sure you do not try to use Django ORM dot references to relations because they look for the "id" column by default, and yours are fake.
Additionally I would consider renaming my output columns to something more meaningful if you're going to use it within an object. With those changes in place the query would look more like (of course, substitute your own terms for the "AS" clauses):
CREATE VIEW qry_desc_char as
SELECT
row_number() OVER (ORDER BY tbl_char.cid) AS id,
tbl_desc.iid_id AS iid_id,
tbl_desc.cid_id AS cid_id,
tbl_desc.cs AS a_better_name,
tbl_char.cid AS something_descriptive,
tbl_char.charname AS name
FROM tbl_desc,tbl_char
WHERE tbl_desc.cid_id = tbl_char.cid;
Once that is done, in Django your model could look like this:
class QryDescChar(models.Model):
    iid_id = models.ForeignKey('WhateverIidIs', related_name='+',
                               db_column='iid_id', on_delete=models.DO_NOTHING)
    cid_id = models.ForeignKey('WhateverCidIs', related_name='+',
                               db_column='cid_id', on_delete=models.DO_NOTHING)
    a_better_name = models.CharField(max_length=10)
    something_descriptive = models.IntegerField()
    name = models.CharField(max_length=50)
    class Meta:
        managed = False
        db_table = 'qry_desc_char'
You don't need the "_id" part on the end of the id column names, because you can declare the column name on the Django model with something more descriptive using the "db_column" argument as I did above (but here I only use it to prevent Django from adding another "_id" to the end of cid_id and iid_id, which would add zero semantic value to your code). Also, note the "on_delete" argument. Django does its own thing when it comes to cascading deletes, and on an interesting data model you don't want this; when it comes to views you'll just get an error and an aborted transaction. Prior to Django 1.5 you have to patch it to make DO_NOTHING actually mean "do nothing". Otherwise it will still try to (needlessly) query and collect all related objects before going through its delete cycle, and the query will fail, halting the entire operation.
Incidentally, I wrote an in-depth explanation of how to do this just the other day.
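
The row_number() idea from the view definition above can be tried without PostgreSQL. This is a rough sketch using the standard sqlite3 module, which supports window functions when the bundled SQLite is 3.25 or newer (an assumption about your environment). Table contents are invented, and the original column names are kept without the suggested renames to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_desc (iid_id INTEGER, cid_id INTEGER, cs TEXT);
CREATE TABLE tbl_char (cid INTEGER PRIMARY KEY, charname TEXT);
INSERT INTO tbl_desc VALUES (10, 1, 'a'), (11, 2, 'b'), (12, 1, 'c');
INSERT INTO tbl_char VALUES (1, 'color'), (2, 'shape');

-- The view synthesizes a numeric id column for each joined row.
CREATE VIEW qry_desc_char AS
SELECT
    row_number() OVER (ORDER BY tbl_char.cid, tbl_desc.iid_id) AS id,
    tbl_desc.iid_id AS iid_id,
    tbl_desc.cid_id AS cid_id,
    tbl_desc.cs AS cs,
    tbl_char.charname AS charname
FROM tbl_desc
JOIN tbl_char ON tbl_desc.cid_id = tbl_char.cid;
""")

# The view is read exactly like a table.
view_rows = conn.execute(
    "SELECT id, iid_id, charname FROM qry_desc_char ORDER BY id"
).fetchall()
ids = [row[0] for row in view_rows]
print(view_rows)
```

The generated ids are unique within one read of the view, but they are positions rather than stored keys, so they can shift when the underlying rows change.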
You are trying to fetch records from a view. This is not correct as a view does not map to a model, a table maps to a model.
You should use Django ORM to fetch QryDescChar objects. Please note that Django ORM will fetch them directly from the table. You can consult Django docs for extra() and select_related() methods which will allow you to fetch related data (data you want to get from the other table) in different ways.