Django - ORM question

Just wondering if it is possible to get a result that I can get using this SQL query with only Django ORM:
SELECT * FROM (SELECT DATE_FORMAT(created, "%Y") as dte, sum(1) FROM some_table GROUP BY dte) as analytics;
The result is:
+------+--------+
| dte  | sum(1) |
+------+--------+
| 2006 |     20 |
| 2007 |   2230 |
| 2008 |   4929 |
| 2009 |   1177 |
+------+--------+
The simplified model looks like this:
# some/models.py
import datetime
from django.db import models

class Table(models.Model):
    created = models.DateTimeField(default=datetime.datetime.now)
I've tried various ways using a mix of .extra(select={}) and .values(), and also the .query.group_by trick described here, but would appreciate a fresh pair of eyes on the problem.

Django 1.1 (trunk at the time of posting this) has aggregates, which allow you to perform counts, mins, sums, averages, etc. in your queries.
What you're looking to do would probably be accomplished using multiple querysets. Remember, each row in a table (even a generated results table) is supposed to be a new object. You don't really explain what you're summing so I'll consider it dollars:
from django.db.models import Sum

book_years = Book.objects.order_by('created')
# I use a set comprehension to keep each year only once
for year in sorted({book.created.year for book in book_years}):
    sum_for_year = Book.objects.filter(created__year=year).aggregate(Sum('sales'))

When you need a query that Django doesn't let you express through the ORM, you can always use raw SQL.
For your immediate purpose, I'm thinking that grouping by an expression (and doing an aggregate calculation on the group) is beyond Django's current capabilities.
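As a rough illustration of the raw SQL route, here is a minimal sketch that runs the same kind of grouped query against an in-memory SQLite database with Python's standard sqlite3 module. The table layout and the sample rows are made up for the example, and SQLite's strftime('%Y', ...) stands in for MySQL's DATE_FORMAT. Inside a Django project you would send the equivalent statement through a cursor obtained from django.db.connection instead.

```python
import sqlite3

# In-memory database with a hypothetical stand-in for some_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO some_table (created) VALUES (?)",
    [
        ("2006-03-01 10:00:00",),
        ("2007-05-17 09:30:00",),
        ("2007-11-02 14:45:00",),
        ("2008-01-20 08:15:00",),
    ],
)

# Group by an expression (the year part of created) and count rows per group.
rows_per_year = conn.execute(
    """
    SELECT strftime('%Y', created) AS dte, COUNT(*) AS total
    FROM some_table
    GROUP BY dte
    ORDER BY dte
    """
).fetchall()

for dte, total in rows_per_year:
    print(dte, total)
```

The date-formatting function is the part that varies between databases, so the SQL string would need adjusting for MySQL or PostgreSQL.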

Related

Order of Django Queryset results

I'm having trouble understanding why a Queryset is being returned in the order it is. We have authors listed for articles and those get stored in a ManyToMany table called: articlepage_authors
We need to be able to pick, on an article by article basis, what order they are returned and displayed in.
For example, article with id 44918 has authors 13752 (‘Lee Bodding’) and 13751 (‘Mark Lee’).
I called these in the shell, which returns:
Out[6]: <QuerySet [<User: Mark Lee (MarkLee#uss778.net)>, <User: Lee Bodding (LeeBodding#uss778.net)>]>
Calling this in postgres: SELECT * FROM articlepage_authors;
shows that user Lee Bodding id=13752 is stored first in the table.
 id  | articlepage_id | user_id
-----+----------------+---------
   1 |          44508 |    7781
   2 |          44508 |    7775
   3 |          44514 |   17240
….
 465 |          44916 |   17171
 468 |          44918 |   13752
 469 |          44918 |   13751
No matter what I try e.g. deleting the authors, adding ‘Lee Bodding’, saving the article, then adding ‘Mark Lee’, and vice versa – I can still only get a query set which returns ‘Mark Lee’ first.
I am not sure how else to debug this.
One solution would be to add another field which defines the order of authors, but I’d like to understand what’s going on here first. Something seems to be defining the order already, and it’d be better to manage that.
You can add an order_by to your queryset to make records appear in the order that you would like. Note that, depending on the database, you may need to create an index on that field for the ordering to perform well. The Django documentation describes the default behaviour:
By default, results returned by a QuerySet are ordered by the ordering tuple given by the ordering option in the model’s Meta. You can override this on a per-QuerySet basis by using the order_by method.
Example:
Entry.objects.filter(pub_date__year=2005).order_by('-pub_date', 'headline')
The result above will be ordered by pub_date descending, then by headline ascending. The negative sign in front of "-pub_date" indicates descending order. Ascending order is implied.
You can pair that with extra() to order by the ID of the many-to-many row:
.extra(select={
    'creation_seq': 'articlepage_authors.id'
}).order_by("creation_seq")
If you're using Django > 1.10, you can just use the field directly without the extra():
.order_by('articlepage_authors.id')
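To see what ordering by the join table's id does at the SQL level, here is a small sketch using Python's standard sqlite3 module with the two rows for article 44918 from the question. The auth_user table name and its columns are assumptions made for the example. Without an ORDER BY clause a database is free to return rows in any order, so the insertion order of articlepage_authors only shows up once you ask for it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- auth_user is a hypothetical stand-in for the real user table.
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articlepage_authors (
        id INTEGER PRIMARY KEY,
        articlepage_id INTEGER,
        user_id INTEGER
    );
    INSERT INTO auth_user VALUES (13751, 'Mark Lee'), (13752, 'Lee Bodding');
    INSERT INTO articlepage_authors VALUES
        (468, 44918, 13752),
        (469, 44918, 13751);
    """
)

# Order the authors by the id of the many-to-many row, i.e. by the
# order in which they were attached to the article.
authors_in_link_order = [
    name
    for (name,) in conn.execute(
        """
        SELECT auth_user.name
        FROM auth_user
        INNER JOIN articlepage_authors
            ON articlepage_authors.user_id = auth_user.id
        WHERE articlepage_authors.articlepage_id = ?
        ORDER BY articlepage_authors.id
        """,
        (44918,),
    )
]
print(authors_in_link_order)
```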

Query excluding duplicates in Django

I'm using the QuerySet distinct() method to get some data in Django.
My initial query was Point.objects.order_by('chron', 'pubdate').
The field chron in some cases is a duplicate so I changed the query
to Point.objects.order_by('chron', 'pubdate').distinct('chron') in order to exclude duplicates.
Now the problem is that all empty fields are considered duplicates.
To be accurate, the chron field contains integers (which behave similarly to ids); in some cases a value can be a duplicate, and in some cases it can be NULL.
| chron |
|-------|
| 1 | I want this
| 2 | I want this
| 3 | I want this
| 3 |
| NULL |
| 4 | I want this
| NULL |
I want to exclude all the chron duplicates but not if they are duplicate of NULL.
Thank you.
Use two separate queries:
.distinct("chron").exclude(chron__isnull=True) for the unique non-NULL chron values
.filter(chron__isnull=True) for only the rows where chron is NULL
Although this seems pretty inefficient, I believe (I will happily be corrected) that even a sensible vanilla SQL statement (e.g. the one below) would require multiple table scans to join a result set of NULLs and unique values.
SELECT *
FROM (
    SELECT chron
    FROM Point
    WHERE chron IS NOT NULL  -- .exclude()
    GROUP BY chron           -- .distinct()
    UNION ALL
    SELECT chron
    FROM Point
    WHERE chron IS NULL      -- .filter()
) AS points
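Here is a minimal sketch of the two-query idea, run against an in-memory SQLite database with Python's standard sqlite3 module and the chron values from the question. The point table is a stand-in for the real model's table, and the two result lists are simply concatenated in Python, which is roughly what you would do with the two querysets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE point (id INTEGER PRIMARY KEY, chron INTEGER)")
conn.executemany(
    "INSERT INTO point (chron) VALUES (?)",
    [(1,), (2,), (3,), (3,), (None,), (4,), (None,)],
)

# Query 1: one row per distinct non-NULL chron value.
unique_values = [
    chron
    for (chron,) in conn.execute(
        "SELECT chron FROM point WHERE chron IS NOT NULL "
        "GROUP BY chron ORDER BY chron"
    )
]

# Query 2: every row whose chron is NULL, duplicates included.
null_rows = [
    chron
    for (chron,) in conn.execute("SELECT chron FROM point WHERE chron IS NULL")
]

combined = unique_values + null_rows
print(combined)
```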

Django: duplicates when filtering on many to many field

I've got the following models in my Django app:
from django.db import models

class Book(models.Model):
    name = models.CharField(max_length=100)
    keywords = models.ManyToManyField('Keyword')

class Keyword(models.Model):
    name = models.CharField(max_length=100)
I've got the following keywords saved:
science-fiction
fiction
history
science
astronomy
On my site a user can filter books by keyword, by visiting /keyword-slug/. The keyword_slug variable is passed to a function in my views, which filters Books by keyword as follows:
def get_books_by_keyword(keyword_slug):
    books = Book.objects.all()
    keywords = keyword_slug.split('-')
    for k in keywords:
        books = books.filter(keywords__name__icontains=k)
    return books
This works for the most part. However, whenever I filter with a keyword that contains a string that appears more than once in the keywords table (e.g. science-fiction and fiction), the same book appears more than once in the resulting QuerySet.
I know I can add distinct to only return unique books, but I'm wondering why I'm getting duplicates to begin with, and really want to understand why this works the way it does. Since I'm only calling filter() on successfully filtered QuerySets, how does the duplicate book get added to the results?
The 2 models in your example are represented by 3 tables: book, keyword, and a book_keyword relation table that manages the M2M field.
When you use keywords__name in a filter() call, Django uses SQL JOINs to merge all 3 tables. This allows you to filter objects in the 1st table by values from another table.
The SQL will be like this:
SELECT `book`.`id`,
       `book`.`name`
FROM `book`
INNER JOIN `book_keyword` ON (`book`.`id` = `book_keyword`.`book_id`)
INNER JOIN `keyword` ON (`book_keyword`.`keyword_id` = `keyword`.`id`)
WHERE (`keyword`.`name` LIKE '%fiction%')
After the JOIN, your data looks like this:
| Book table          | Relation table                     | Keyword table                |
| Book ID | Book name | relation_book_id | relation_key_id | Keyword ID | Keyword name    |
|---------|-----------|------------------|-----------------|------------|-----------------|
| 1       | Book 1    | 1                | 1               | 1          | Science-fiction |
| 1       | Book 1    | 1                | 2               | 2          | Fiction         |
| 2       | Book 2    | 2                | 2               | 2          | Fiction         |
Then, when the data is loaded from the DB into Python, you only receive the columns from the book table. As you can see, Book 1 is duplicated there.
This is how many-to-many relations and JOINs work.
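The duplication is easy to reproduce outside Django. Here is a minimal sketch with Python's standard sqlite3 module, using simplified table and column names that match the SQL in this answer rather than the names Django would actually generate. It runs the same JOIN once as-is and once with DISTINCT, which is what calling distinct() on the queryset adds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_keyword (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO book VALUES (1, 'Book 1'), (2, 'Book 2');
    INSERT INTO keyword VALUES (1, 'Science-fiction'), (2, 'Fiction');
    INSERT INTO book_keyword VALUES (1, 1), (1, 2), (2, 2);
    """
)

# The placeholder is filled with either nothing or the DISTINCT keyword.
JOIN_SQL = """
    SELECT {distinct} book.id, book.name
    FROM book
    INNER JOIN book_keyword ON book.id = book_keyword.book_id
    INNER JOIN keyword ON book_keyword.keyword_id = keyword.id
    WHERE keyword.name LIKE '%fiction%'
    ORDER BY book.id
"""

# Book 1 matches through two keyword rows, so the plain JOIN returns it twice.
with_duplicates = conn.execute(JOIN_SQL.format(distinct="")).fetchall()
# DISTINCT collapses identical (id, name) rows back into one.
without_duplicates = conn.execute(JOIN_SQL.format(distinct="DISTINCT")).fetchall()

print(with_duplicates)
print(without_duplicates)
```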
Direct quote from the Docs: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Successive filter() calls further restrict the
set of objects, but for multi-valued relations, they apply to any
object linked to the primary model, not necessarily those objects that
were selected by an earlier filter() call.
In your case, because keywords is a multi-valued relation, your chain of .filter() calls filters based only on the original model and not on the previous queryset.

How to group by AND aggregate with Django

I have a fairly simple query I'd like to make via the ORM, but can't figure it out.
I have three models:
Location (a place), Attribute (an attribute a place might have), and Rating (a M2M 'through' model that also contains a score field)
I want to pick some important attributes and be able to rank my locations by those attributes - i.e. higher total score over all selected attributes = better.
I can use the following SQL to get what I want:
select location_id, sum(score)
from locations_rating
where attribute_id in (1,2,3)
group by location_id order by sum desc;
which returns
 location_id | sum
-------------+-----
          21 |  12
           3 |  11
The closest I can get with the ORM is:
Rating.objects.filter(
attribute__in=attributes).annotate(
acount=Count('location')).aggregate(Sum('score'))
Which returns
{'score__sum': 23}
i.e. the sum of all, not grouped by location.
Any way around this? I could execute the SQL manually, but would rather go via the ORM to keep things consistent.
Thanks
Try this:
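from django.db.models import Sum

# total_score avoids a name clash with the model's own score field
Rating.objects.filter(attribute__in=attributes) \
    .values('location') \
    .annotate(total_score=Sum('score')) \
    .order_by('-total_score')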
Can you try this:
Rating.objects.values('location_id').filter(attribute__in=attributes).annotate(sum_score=Sum('score')).order_by('-sum_score')
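For what it's worth, the difference between the two ORM attempts maps onto two different SQL shapes: aggregate() collapses everything into a single total, while values() followed by annotate() groups by the listed fields. Here is a rough sketch of both shapes using Python's standard sqlite3 module. The sample rows are invented so that the totals match the numbers in the question (12 and 11, 23 overall), and the extra row for attribute 4 is only there to show the filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations_rating "
    "(location_id INTEGER, attribute_id INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO locations_rating VALUES (?, ?, ?)",
    [
        (21, 1, 5), (21, 2, 4), (21, 3, 3),
        (3, 1, 6), (3, 2, 5),
        (3, 4, 9),  # attribute 4 is not selected, so this row is ignored
    ],
)

# Shape of aggregate(Sum('score')): one total over all matching rows.
grand_total = conn.execute(
    "SELECT SUM(score) FROM locations_rating WHERE attribute_id IN (1, 2, 3)"
).fetchone()[0]

# Shape of values('location').annotate(Sum('score')): one total per location.
per_location = conn.execute(
    """
    SELECT location_id, SUM(score) AS total
    FROM locations_rating
    WHERE attribute_id IN (1, 2, 3)
    GROUP BY location_id
    ORDER BY total DESC
    """
).fetchall()

print(grand_total)
print(per_location)
```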

Ordering entries via comment count with django

I need to get entries from the database together with their comment counts. Can I do it with Django's comment framework? I am also using a voting application which does not use GenericForeignKeys; I get entries with scores like this:
from django.db.models import Sum

class EntryManager(models.Manager):
    def get_queryset(self):
        return super(EntryManager, self).get_queryset().annotate(
            score=Sum("linkvote__value"))
But I get stuck when foreign keys are involved. Do you have any ideas about that?
Extra explanation: I need to fetch entries like this:
id | body | vote_score | comment_score |
1 | foo | 13 | 4 |
2 | bar | 4 | 1 |
After doing that, I can order them by comment_score. :)
Thanks for all replies.
Apparently, annotating with reverse generic relations (or extra filters, in general) is still an open ticket (see also the corresponding documentation). Until this is resolved, I would suggest using raw SQL in an extra query, like this:
return super(EntryManager, self).get_queryset().annotate(
    vote_score=Sum("linkvote__value")).extra(select={
        'comment_score': """SELECT COUNT(*) FROM comments_comment
            WHERE comments_comment.object_pk = yourapp_entry.id
            AND comments_comment.content_type_id = %s"""
    }, select_params=(entry_type.id,))
Of course, you have to fill in the correct table names. Furthermore, entry_type is a "constant" that can be set outside your lookup function (see ContentTypeManager):
from django.contrib.contenttypes.models import ContentType
entry_type = ContentType.objects.get_for_model(Entry)
This is assuming you have a single model Entry that you want to calculate your scores on. Otherwise, things would get slightly more complicated: you would need a sub-query to fetch the content type id for the type of each annotated object.