Ordering entries via comment count with Django

I need to get entries from the database along with their comment counts. Can I do this with Django's comment framework? I am also using a voting application that does not use GenericForeignKeys, so I get entries with scores like this:
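from django.db import models
from django.db.models import Sum

class EntryManager(models.Manager):
    def get_queryset(self):
        # Annotate every entry with the sum of its vote values.
        return super(EntryManager, self).get_queryset().annotate(
            score=Sum("linkvote__value"))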
But when generic foreign keys are involved, I get stuck. Do you have any ideas about that?
Extra explanation: I need to fetch entries like this:
id | body | vote_score | comment_score |
1 | foo | 13 | 4 |
2 | bar | 4 | 1 |
After doing that, I can order them by comment_score. :)
Thanks for all replies.

Apparently, annotating with reverse generic relations (or extra filters, in general) is still an open ticket (see also the corresponding documentation). Until this is resolved, I would suggest using raw SQL in an extra query, like this:
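# comments_comment stores the content type in the content_type_id column,
# so compare it against the ContentType's primary key.
# object_pk is a text column; some databases may need an explicit cast here.
return super(EntryManager, self).get_queryset().annotate(
    vote_score=Sum("linkvote__value")).extra(select={
        'comment_score': """SELECT COUNT(*) FROM comments_comment
            WHERE comments_comment.object_pk = yourapp_entry.id
            AND comments_comment.content_type_id = %s"""
    }, select_params=(entry_type.id,))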
Of course, you have to fill in the correct table names. Furthermore, entry_type is a "constant" that can be set outside your lookup function (see ContentTypeManager):
from django.contrib.contenttypes.models import ContentType
entry_type = ContentType.objects.get_for_model(Entry)
This is assuming you have a single model Entry that you want to calculate your scores on. Otherwise, things would get slightly more complicated: you would need a sub-query to fetch the content type id for the type of each annotated object.
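To see what the correlated sub-query computes, here is a minimal sketch that runs the same kind of SQL against an in-memory SQLite database with Python's built-in sqlite3 module instead of going through the ORM. The table names (entry, linkvote, comment), the sample rows and the content type id 7 are assumptions made up for the demo; only the shape of the query follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE linkvote (id INTEGER PRIMARY KEY, entry_id INTEGER, value INTEGER);
    -- object_pk is stored as text, as in a generic relation
    CREATE TABLE comment (id INTEGER PRIMARY KEY, content_type_id INTEGER, object_pk TEXT);
    INSERT INTO entry VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO linkvote (entry_id, value) VALUES (1, 10), (1, 3), (2, 4);
    INSERT INTO comment (content_type_id, object_pk) VALUES
        (7, '1'), (7, '1'), (7, '1'), (7, '1'), (7, '2'), (9, '2');
""")

ENTRY_TYPE_ID = 7  # hypothetical content type id standing in for entry_type

rows = conn.execute("""
    SELECT e.id, e.body,
           SUM(v.value) AS vote_score,
           -- correlated sub-query: count the comments attached to this entry only
           (SELECT COUNT(*) FROM comment c
             WHERE c.object_pk = CAST(e.id AS TEXT)
               AND c.content_type_id = ?) AS comment_score
    FROM entry e
    LEFT JOIN linkvote v ON v.entry_id = e.id
    GROUP BY e.id, e.body
    ORDER BY comment_score DESC
""", (ENTRY_TYPE_ID,)).fetchall()

for row in rows:
    print(row)
```

Because the comment count comes from a sub-query rather than a second JOIN, the vote sum is not multiplied by the number of comments. The comment with content type 9 is ignored, which is what the content type condition is for.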

Related

Order of Django Queryset results

I'm having trouble understanding why a Queryset is being returned in the order it is. We have authors listed for articles and those get stored in a ManyToMany table called: articlepage_authors
We need to be able to pick, on an article by article basis, what order they are returned and displayed in.
For example, article with id 44918 has authors 13752 (‘Lee Bodding’) and 13751 (‘Mark Lee’).
I queried these authors in the shell, which returns:
Out[6]: <QuerySet [<User: Mark Lee (MarkLee#uss778.net)>, <User: Lee Bodding (LeeBodding#uss778.net)>]>
Calling this in postgres: SELECT * FROM articlepage_authors;
shows that user Lee Bodding id=13752 is stored first in the table.
 id  | articlepage_id | user_id
-----+----------------+---------
   1 |          44508 |    7781
   2 |          44508 |    7775
   3 |          44514 |   17240
….
 465 |          44916 |   17171
 468 |          44918 |   13752
 469 |          44918 |   13751
No matter what I try e.g. deleting the authors, adding ‘Lee Bodding’, saving the article, then adding ‘Mark Lee’, and vice versa – I can still only get a query set which returns ‘Mark Lee’ first.
I am not sure how else to debug this.
One solution would be to add another field which defines the order of authors, but I’d like to understand what’s going on here first. Something seems to be defining the order already, and it’d be better to manage that.
You can add an order_by to your queryset to make records appear in the order you want. Depending on the database, you may also need an index on that field to keep the query fast. From the Django documentation:
By default, results returned by a QuerySet are ordered by the ordering tuple given by the ordering option in the model’s Meta. You can override this on a per-QuerySet basis by using the order_by method.
Example:
Entry.objects.filter(pub_date__year=2005).order_by('-pub_date', 'headline')
The result above will be ordered by pub_date descending, then by headline ascending. The negative sign in front of "-pub_date" indicates descending order. Ascending order is implied.
You can pair that with an extra to order by the ID of the many-to-many row:
.extra(select={
    'creation_seq': 'articlepage_authors.id'
}).order_by("creation_seq")
If you're using Django > 1.10, you can just use the field directly without the extra:
.order_by('articlepage_authors.id')
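To illustrate why an explicit ordering matters: without an ORDER BY the database may return joined rows in any order, so the insertion order of articlepage_authors guarantees nothing by itself. The following is a minimal sketch using Python's built-in sqlite3 module rather than Postgres or the ORM. The auth_user table and its name column are assumptions made for the demo; only the ids and author names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articlepage_authors (
        id INTEGER PRIMARY KEY, articlepage_id INTEGER, user_id INTEGER);
    INSERT INTO auth_user VALUES (13751, 'Mark Lee'), (13752, 'Lee Bodding');
    INSERT INTO articlepage_authors VALUES (468, 44918, 13752), (469, 44918, 13751);
""")

def authors(order_clause):
    """Return the author names of article 44918 using the given ORDER BY clause."""
    sql = ("SELECT u.name FROM auth_user u "
           "JOIN articlepage_authors aa ON aa.user_id = u.id "
           "WHERE aa.articlepage_id = ? " + order_clause)
    return [name for (name,) in conn.execute(sql, (44918,))]

by_link_id = authors("ORDER BY aa.id")  # order in which the authors were attached
by_user_id = authors("ORDER BY u.id")   # order of the user ids

print(by_link_id)
print(by_user_id)
```

Ordering by the join table id reproduces the order in which the authors were added, while ordering by user id puts Mark Lee first. That second order matches what the question observed, so an ordering on the user side is one possible explanation, though the sketch cannot confirm it.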

Django ORM: django aggregate over filtered reverse relation

The question is remotely related to Django ORM: filter primary model based on chronological fields from related model, by further limiting the resulting queryset.
The models
Assuming we have the following models:
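from django.db import models

class Patient(models.Model):
    name = models.CharField(max_length=100)  # max_length is required; 100 is a placeholder
    # other fields following

class MedicalFile(models.Model):
    # on_delete is required from Django 2.0 onwards
    patient = models.ForeignKey(Patient, related_name='files', on_delete=models.CASCADE)
    issuing_date = models.DateField()
    expiring_date = models.DateField()
    diagnostic = models.CharField(max_length=100)  # placeholder length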
The query
I need to select all the files which are valid at a specified date, most likely from the past. The problem that I have here is that for every patient, there will be a small overlapping period where a patient will have 2 valid files. If we're querying for a date from that small timeframe, I need to select only the most recent file.
More to the point: consider patient John Doe. He will have a string of "uninterrupted" files starting in 2012, like this:
+---+------------+-------------+
|ID |issuing_date|expiring_date|
+---+------------+-------------+
|1 |2012-03-06 |2013-03-06 |
+---+------------+-------------+
|2 |2013-03-04 |2014-03-04 |
+---+------------+-------------+
|3 |2014-03-04 |2015-03-04 |
+---+------------+-------------+
As one can easily observe, the validity periods of these files overlap by a couple of days. For instance, on 2013-03-05 files 1 and 2 are both valid, but we consider only file 2 (the most recent one). I'm guessing that the use case isn't special: it is the same as managing subscriptions, where you renew early in order to keep the subscription continuous.
Now, in my application I need to query historical data, e.g. give me all the files which were valid on 2013-03-05, considering only the "most recent" ones. I was able to solve this by using RawSQL, but I would like to have a solution without raw SQL. In the previous question, we were able to filter the "latest" file by aggregation over the reverse relation, something like:
qs = MedicalFile.objects.annotate(latest_file_date=Max('patient__files__issuing_date'))
qs = qs.filter(issuing_date=F('latest_file_date')).select_related('patient')
The problem is that we need to limit the range over which latest_file_date is computed, by filtering against 2013-03-05. But the aggregate function does not run over the filtered set of related files ...
The "poor" solution
I'm currently doing this via a RawSQL annotation (substitute <app> with your concrete application):
import datetime
from django.db.models import F
from django.db.models.expressions import RawSQL

reference_date = datetime.date(year=2013, month=3, day=5)
annotation_latest_issuing_date = {
    'latest_issuing_date': RawSQL('SELECT max(file.issuing_date) '
                                  'FROM <app>_medicalfile file '
                                  'WHERE file.patient_id = <app>_medicalfile.patient_id '
                                  '  AND file.issuing_date <= %s', (reference_date, ))
}
qs = MedicalFile.objects.filter(expiring_date__gt=reference_date, issuing_date__lte=reference_date)
# RawSQL expressions are attached with annotate(); extra() does not accept them as keyword arguments.
qs = qs.annotate(**annotation_latest_issuing_date).filter(issuing_date=F('latest_issuing_date'))
Written as such, the queryset returns the correct number of records.
Question: how can this be achieved without RawSQL and (already implied) with the same level of performance?
You can use id__in and provide your nested filtered queryset (like all files that are valid at the given date).
# The parentheses let the chained calls span several lines.
qs = (
    MedicalFile.objects
    .filter(id__in=MedicalFile.objects.filter(expiring_date__gt=reference_date, issuing_date__lte=reference_date))
    .order_by('patient__pk', '-issuing_date')
    .distinct('patient__pk')  # field_name parameter only supported by Postgres
)
The order_by groups the files by patient, with the latest issuing date first. distinct then retrieves that first file for each patient. However, general care is required when combining order_by and distinct: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Edit: Removed single patient dependence from first filter and changed latest to combination of order_by and distinct
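The order_by plus distinct combination boils down to "keep the first row per patient after sorting". A minimal pure-Python sketch of that logic, with no database involved, looks like this. FileRow and latest_valid_files are hypothetical names introduced only for the illustration; the sample rows are John Doe's files from the question.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class FileRow:
    id: int
    patient_id: int
    issuing_date: date
    expiring_date: date

rows = [
    FileRow(1, 1, date(2012, 3, 6), date(2013, 3, 6)),
    FileRow(2, 1, date(2013, 3, 4), date(2014, 3, 4)),
    FileRow(3, 1, date(2014, 3, 4), date(2015, 3, 4)),
]

def latest_valid_files(rows, reference_date):
    """Return, per patient, the most recently issued file valid on reference_date."""
    # Same condition as issuing_date__lte / expiring_date__gt in the queryset.
    valid = [r for r in rows if r.issuing_date <= reference_date < r.expiring_date]
    # Mirrors order_by('patient__pk', '-issuing_date').
    valid.sort(key=lambda r: (r.patient_id, -r.issuing_date.toordinal()))
    latest = {}
    for r in valid:
        # Mirrors distinct('patient__pk'): keep only the first row per patient.
        latest.setdefault(r.patient_id, r)
    return list(latest.values())

result = latest_valid_files(rows, date(2013, 3, 5))
print([r.id for r in result])
```

On 2013-03-05 both file 1 and file 2 pass the validity filter, and the sort puts file 2 first, so it is the one that survives.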
Assuming p is a Patient instance, I think you can do something like:
p.files.filter(issuing_date__lt='some_date', expiring_date__gt='some_date')
See https://docs.djangoproject.com/en/1.9/topics/db/queries/#backwards-related-objects
Or maybe with the Q magic query object...

Django: duplicates when filtering on many to many field

I've got the following models in my Django app:
class Book(models.Model):
    name = models.CharField(max_length=100)
    keywords = models.ManyToManyField('Keyword')

class Keyword(models.Model):
    name = models.CharField(max_length=100)
I've got the following keywords saved:
science-fiction
fiction
history
science
astronomy
On my site a user can filter books by keyword, by visiting /keyword-slug/. The keyword_slug variable is passed to a function in my views, which filters Books by keyword as follows:
def get_books_by_keyword(keyword_slug):
    books = Book.objects.all()
    keywords = keyword_slug.split('-')
    for k in keywords:
        books = books.filter(keywords__name__icontains=k)
    return books
This works for the most part. However, whenever I filter with a keyword that contains a string that appears more than once in the keywords table (e.g. science-fiction and fiction), the same book appears more than once in the resulting QuerySet.
I know I can add distinct to only return unique books, but I'm wondering why I'm getting duplicates to begin with, and really want to understand why this works the way it does. Since I'm only calling filter() on successfully filtered QuerySets, how does the duplicate book get added to the results?
The two models in your example are represented by three tables: book, keyword, and a book_keyword relation table that manages the M2M field.
When you use keywords__name in a filter call, Django uses SQL JOINs to merge all three tables. This allows you to filter objects in the first table by values from another table.
The SQL will look like this:
SELECT `book`.`id`,
       `book`.`name`
FROM `book`
INNER JOIN `book_keyword` ON (`book`.`id` = `book_keyword`.`book_id`)
INNER JOIN `keyword` ON (`book_keyword`.`keyword_id` = `keyword`.`id`)
WHERE (`keyword`.`name` LIKE '%fiction%')
After the JOIN, your data looks like this:
| Book table          | Relation table                     | Keyword table                |
|---------------------|------------------------------------|------------------------------|
| Book ID | Book name | relation_book_id | relation_key_id | Keyword ID | Keyword name    |
|---------|-----------|------------------|-----------------|------------|-----------------|
| 1       | Book 1    | 1                | 1               | 1          | Science-fiction |
| 1       | Book 1    | 1                | 2               | 2          | Fiction         |
| 2       | Book 2    | 2                | 2               | 2          | Fiction         |
Then, when the data is loaded from the DB into Python, you only receive the columns from the book table. As you can see, Book 1 appears twice there.
This is how many-to-many relations and JOINs work.
Direct quote from the Docs: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Successive filter() calls further restrict the
set of objects, but for multi-valued relations, they apply to any
object linked to the primary model, not necessarily those objects that
were selected by an earlier filter() call.
In your case, because keywords is a multi-valued relation, your chain of .filter() calls filters based only on the original model and not on the previous queryset.
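The duplication can be reproduced outside Django. Below is a minimal sketch with Python's built-in sqlite3 module and the three tables described in this thread; the sample rows are assumptions chosen so that Book 1 carries both matching keywords. It runs a JOIN query similar to the one an ORM filter on keywords__name would produce, once as-is and once with DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_keyword (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO book VALUES (1, 'Book 1'), (2, 'Book 2');
    INSERT INTO keyword VALUES (1, 'science-fiction'), (2, 'fiction');
    INSERT INTO book_keyword VALUES (1, 1), (1, 2), (2, 2);
""")

JOIN_SQL = """
    SELECT {distinct} book.id, book.name
    FROM book
    INNER JOIN book_keyword ON book.id = book_keyword.book_id
    INNER JOIN keyword ON book_keyword.keyword_id = keyword.id
    WHERE keyword.name LIKE '%fiction%'
    ORDER BY book.id
"""

# One result row per matching (book, keyword) pair, so Book 1 shows up twice.
with_duplicates = conn.execute(JOIN_SQL.format(distinct="")).fetchall()
# DISTINCT collapses identical result rows, which is what .distinct() asks for.
deduplicated = conn.execute(JOIN_SQL.format(distinct="DISTINCT")).fetchall()

print(with_duplicates)
print(deduplicated)
```

The filter itself never adds a book. The JOIN simply yields one row per matching keyword, and only the book columns are selected from each of those rows.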

Need Modeling Help For An Ordering Form

I'd like to create a Django project for my company's purchasing department. This would be my first project in Django, so sorry if this comes off as rudimentary. The workflow would look something like this:
user registers for an account > signs in > can create, edit, view, or delete a purchase order.
I'm getting tripped up on the modeling. Presumably I can create and authenticate users using django.contrib.auth. Also, since this is mainly a form saving/printing application I would use a ModelForm to generate my forms based on my models since the users will be making changes to the form data that will need to be saved. A simplified version of the purchase order form in question looks something like this:
| Vendor | Date | Lead Time | Arrival Date | Buyer_Name |
+--------+-------+-----------+--------------+------------+
| FooBar |1-1-12 | 30 | 2-1-12 | Mr. Bar |
+--------+-------+-----------+--------------+------------+
+--------+-------+-----------+--------------+------------+
| SKU | Description | Quantity | Price | Dimensions |
+--------+-------------+----------+-------+--------------+
|12345 | Soft Bar | 38 | 5.75 | 16 X 5 X 8 |
+--------+-------------+----------+-------+--------------+
|12346 | Hard Bar | 12 | 5.75 | 16 X 5 X 8 |
+--------+-------------+----------+-------+--------------+
|12347 | Medium Bar | 17 | 5.75 | 16 X 5 X 8 |
+--------+-------------+----------+-------+--------------+
As you can see, the main purchase order form has a header that identifies the Vendor being ordered from, the current date, lead time, arrival date, and the buyer's name who is filling the form out. Under that is a line-by-line order detail for three different SKUs. Ideally, each PurchaseOrder should be able to have many SKUs added to it.
What is the best way to model something like this? Do I create a User, PurchaseOrder, and SKU model? Then add a FK to the SKU Model that points to the PurchaseOrder Model's PK or is there some other, more correct, way to do something like this? Thanks in advance for any help.
[Edit]
Django had what I was looking for all along. Since this is essentially a nested form, I could make use of Formsets.
Here are two helpful links to get started:
https://docs.djangoproject.com/en/1.4/topics/forms/formsets/
https://docs.djangoproject.com/en/1.4/topics/forms/modelforms/#model-formsets
Use Django's built-in User model (you can look at the source to see its definition, but it is similar to the code below for the other models). Other than that, I would suggest a model for every object you mentioned.
Don't add a FK to the SKU Model since SKU can exist without being in a purchase order (if I understand the problem correctly).
models.py
from django.contrib.auth.models import User
from django.db import models

class Vendor(models.Model):
    name = models.CharField(max_length=200)
    # other fields

class SKU(models.Model):
    description = models.CharField(max_length=200)
    # other fields

class PurchaseOrder(models.Model):
    # on_delete is required from Django 2.0 onwards
    purchaser = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=200)
    # this is the magic that allows one purchase order to be filled with several SKUs
    skus = models.ManyToManyField(SKU)
    # other fields

Django - ORM question

Just wondering if it is possible to get a result that I can get using this SQL query with only Django ORM:
SELECT * FROM (SELECT DATE_FORMAT(created, "%Y") as dte, sum(1) FROM some_table GROUP BY dte) as analytics;
The result is:
+------+--------+
| dte | sum(1) |
+------+--------+
| 2006 | 20 |
| 2007 | 2230 |
| 2008 | 4929 |
| 2009 | 1177 |
+------+--------+
The simplified model looks like this:
# some/models.py
class Table(models.Model):
created = models.DateTimeField(default=datetime.datetime.now)
I've tried various approaches using a mix of .extra(select={}) and .values(), and also the .query.group_by trick described here, but would appreciate a fresh pair of eyes on the problem.
Django 1.1 (trunk at the time of posting this) has aggregates, which allow you to perform counts, mins, sums, averages, etc. in your queries.
What you're looking to do would probably be accomplished using multiple querysets. Remember, each row in a table (even a generated results table) is supposed to be a new object. You don't really explain what you're summing so I'll consider it dollars:
from django.db.models import Sum

# Book and its sales field are placeholders for your own model.
# dates() yields one entry per distinct year found in the created field.
for year_start in Book.objects.dates('created', 'year'):
    sum_for_year = Book.objects.filter(created__year=year_start.year).aggregate(Sum('sales'))
When you need a query that Django doesn't let you express through the ORM, you can always use raw SQL.
For your immediate purpose, I'm thinking that grouping by an expression (and doing an aggregate calculation on the group) is beyond Django's current capabilities.
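As a rough sketch of the raw SQL route, here is the per-year grouping run against an in-memory SQLite database with Python's built-in sqlite3 module. SQLite has no DATE_FORMAT, so strftime('%Y', ...) stands in for the MySQL call from the question; the sample timestamps are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO some_table (created) VALUES (?)",
    [("2006-05-01 10:00:00",), ("2007-01-15 09:30:00",),
     ("2007-11-02 18:45:00",), ("2008-03-20 12:00:00",)],
)

# Group by the year extracted from the timestamp and count the rows per group.
rows = conn.execute(
    "SELECT strftime('%Y', created) AS dte, COUNT(*) "
    "FROM some_table GROUP BY dte ORDER BY dte"
).fetchall()

for dte, total in rows:
    print(dte, total)
```

The outer SELECT * FROM (...) wrapper from the question is not needed for this result, since the grouped query already returns one row per year.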