I'm trying to get the ten most commented posts in my django app, but I'm unable to do it because I can't think of a proper way.
I'm currently using the django comments framework, and I've seen a possibility of doing this with aggregate or annotate, but I can't figure out how.
The thing would be:
Get all the posts
Calculate the number of comments per post (I have a comment_count method for that)
Order the posts from most commented to least
Get the first 10 (for example)
Is there any "simple" or "pythonic" way to do this? I'm a bit lost since the comments framework is only accessible via template tags, and not directly from the code (unless you want to modify it)
Any help is appreciated
You're right that you need to use the annotation and aggregation features. What you need to do is group by and get a count of the object_pk of the Comment model:
from django.contrib.comments.models import Comment
from django.db.models import Count
o_list = Comment.objects.values('object_pk').annotate(ocount=Count('object_pk'))
This will assign something like the following to o_list:
[{'object_pk': '123', 'ocount': 56},
{'object_pk': '321', 'ocount': 47},
...etc...]
You could then sort the list and slice the top 10:
top_ten_objects = sorted(o_list, key=lambda k: k['ocount'], reverse=True)[:10]
You can then use the values in object_pk to retrieve the objects that the comments are attached to.
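As a rough illustration of the sort-and-slice step, here is a plain-Python sketch. The sample rows are made up and only stand in for the dictionaries the values() query returns; heapq.nlargest is shown as an alternative to sorting the whole list and is not something the answer above used.

```python
import heapq

# Made-up rows shaped like the output of values('object_pk').annotate(ocount=...)
o_list = [
    {'object_pk': '123', 'ocount': 56},
    {'object_pk': '321', 'ocount': 47},
    {'object_pk': '7', 'ocount': 90},
    {'object_pk': '42', 'ocount': 3},
]

# Sort from most to least commented, then keep the first n rows.
n = 2
top_sorted = sorted(o_list, key=lambda row: row['ocount'], reverse=True)[:n]

# heapq.nlargest returns the n largest rows without sorting the entire list.
top_heap = heapq.nlargest(n, o_list, key=lambda row: row['ocount'])

print([row['object_pk'] for row in top_sorted])
```

Note that the sort has to be descending; an ascending sort followed by a slice would return the least commented objects instead.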
Annotate is going to be the preferred way, partially because it will reduce db queries and it's basically a one-liner. While your theoretical loop would work, I bet your comment_count method relies on querying comments for a given post, which would be 1 query per post that you loop over- nasty!
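# One row per commented object, most commented first
posts_by_score = Comment.objects.filter(is_public=True).values('object_pk').annotate(
    score=Count('id')).order_by('-score')
# Keep only the ten highest-scoring ids
post_ids = [int(obj['object_pk']) for obj in posts_by_score[:10]]
# in_bulk() returns a dict mapping each id to its Post
top_posts = Post.objects.in_bulk(post_ids)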
This code is shamelessly adapted from Django-Blog-Zinnia (no affiliation)
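One detail worth keeping in mind: in_bulk() returns a dictionary keyed by primary key, so the ranking has to be re-applied from the id list afterwards. A minimal plain-Python sketch, with a hypothetical dict of strings standing in for the in_bulk() result:

```python
# Ids in ranked order, as produced by the ordered comment-count query (made-up values).
post_ids = [7, 123, 321]

# Stand-in for Post.objects.in_bulk(post_ids): a dict keyed by primary key.
# Strings replace real Post instances here.
posts_by_id = {123: 'post 123', 321: 'post 321', 7: 'post 7'}

# Rebuild the ranking; skip ids whose post no longer exists.
top_posts = [posts_by_id[pk] for pk in post_ids if pk in posts_by_id]
print(top_posts)
```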
Related
I have the following models:
class Work(models.Model):
    visible = models.BooleanField(default=False)
class Book(models.Model):
    work = models.ForeignKey('Work')
I am attempting to update some rows like so:
qs=Work.objects.all()
qs.annotate(Count('book')).filter(Q(book__count__gt=1)).update(visible=False)
However, this is giving an error:
DatabaseError: subquery has too many columns
LINE 1: ...SET "visible" = false WHERE "app_work"."id" IN (SELECT...
If I remove the update clause, the query runs with no problems and returns what I am expecting.
It looks like this error happens for queries with an annotate followed by an update. Is there some other way to write this?
Without building a toy database to reproduce your issue and try out solutions, I can at least point to the approach in "Django: Getting complement of queryset" as one possibility.
Try this approach:
qs = qs.annotate(Count('book')).filter(Q(book__count__gt=1))
Work.objects.filter(pk__in=qs.values_list('pk', flat=True)).update(visible=False)
You can also clear the annotations off a queryset quite simply:
qs.query.annotations.clear()
qs.update(..)
And this means you're only firing off one query, not one into another, but don't use this if your query relies on an annotation to filter. This is great for stripping out database-generated concatenations, and the utility rubbish that I occasionally add into model's default queries... but the example in the question is a perfect example of where this would not work.
To add to Oli's answer: if you need your annotations for the update, apply the filters first and store the result in a variable. Then clear the annotations and call filter() with no arguments on that queryset to get at the update function, like so:
q = X.objects.filter(annotated_val=5, annotated_name='Nima')
q.query.annotations.clear()
q.filter().update(field=900)
I've duplicated this issue and believe it's a bug in the Django ORM. @acjay's answer is a good workaround. Bug report: https://code.djangoproject.com/ticket/25171
Fix released in Django 2 alpha: https://code.djangoproject.com/ticket/19513
I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want to know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
    user = models.ForeignKey(User)
    location = models.ForeignKey(Location)
    time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a different way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.
The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):
User.objects.filter(checkinact__location=location).distinct().count()
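To make the difference between the two results concrete, here is a plain-Python sketch with made-up user ids. The Counter plays the role of the grouped query (one count per user), and its length plays the role of the distinct count; no database is involved.

```python
from collections import Counter

# Made-up check-in rows: one user id per check-in at the location.
checkin_user_ids = [15521, 15521, 15521, 20, 20, 31]

# Grouping by user: how many times each user checked in.
per_user = Counter(checkin_user_ids)

# Number of unique users: how many different users checked in at all.
unique_users = len(per_user)

print(dict(per_user))
print(unique_users)
```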
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works on the principle that dicts have unique keys, so converting the list of tuples to a dict leaves one entry per unique user. However, this will generate 1+N queries, where N is the total number of checkins (one query each time the user attribute is accessed). Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.
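The unique-key behaviour can be checked in isolation. In this small sketch the (pk, user) pairs are made up, and plain strings stand in for the user objects:

```python
# Made-up (pk, user) pairs; a user appears once per check-in.
checkin_users = [(15521, 'alice'), (15521, 'alice'), (20, 'bob'), (15521, 'alice')]

# Later pairs overwrite earlier ones with the same key, so each pk survives once.
unique_by_pk = dict(checkin_users)
unique_checkins = len(unique_by_pk)

# A set of pks gives the same count when the user objects themselves are not needed.
unique_pks = {pk for pk, _ in checkin_users}

print(unique_checkins)
```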
You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
    time__range=[start, end],
    location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(Count('user')).order_by()
(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)
I'm trying to order a list of items in django by the number of comments they have. However, there seems to be an issue in that the Count function doesn't take into account the fact that django comments also uses a content_type_id to discern between comments for different objects!
This gives me a slight problem in that the comment counts for all objects are wrong using the standard methods; is there a 'nice' fix or do I need to drop back to raw sql?
Code to try and get the correct ordering:
app_list = (App.objects.filter(published=True)
    .annotate(num_comments=Count('comments'))
    .order_by('-num_comments'))
Sample output from the query (note no mention of the content type id):
SELECT "apps_app"."id", "apps_app"."name",
"apps_app"."description","apps_app"."author_name", "apps_app"."site_url",
"apps_app"."source_url", "apps_app"."date_added", "apps_app"."date_modified",
"apps_app"."published", "apps_app"."published_email_sent", "apps_app"."created_by_id",
"apps_app"."rating_votes", "apps_app"."rating_score", COUNT("django_comments"."id") AS
"num_comments" FROM "apps_app" LEFT OUTER JOIN "django_comments" ON ("apps_app"."id" =
"django_comments"."object_pk") WHERE "apps_app"."published" = 1 GROUP BY
"apps_app"."id", "apps_app"."name", "apps_app"."description", "apps_app"."author_name",
"apps_app"."site_url", "apps_app"."source_url", "apps_app"."date_added",
"apps_app"."date_modified", "apps_app"."published", "apps_app"."published_email_sent",
"apps_app"."created_by_id", "apps_app"."rating_votes", "apps_app"."rating_score" ORDER
BY num_comments DESC LIMIT 4
Think I found the answer: Django Snippet
Is it possible to filter within an annotation?
In my mind something like this (which doesn't actually work)
Student.objects.all().annotate(Count('attendance').filter(type="Excused"))
The resultant table would have every student with the number of excused absences. Looking through the documentation, filters can only be applied before or after the annotation, which would not yield the desired results.
A workaround is this
for student in Student.objects.all():
    student.num_excused_absence = Attendance.objects.filter(student=student, type="Excused").count()
This works but does many queries, in a real application this can get impractically long. I think this type of statement is possible in SQL but would prefer to stay with ORM if possible. I even tried making two separate queries (one for all students, another to get the total) and combined them with |. The combination changed the total :(
Some thoughts after reading answers and comments
I solved the attendance problem using extra sql here.
Timmy's blog post was useful. My answer is based off of it.
hash1baby's answer works but seems equally complex as sql. It also requires executing sql then adding the result in a for loop. This is bad for me because I'm stacking lots of these filtering queries together. My solution builds up a big queryset with lots of filters and extra and executes it all at once.
If performance is no issue, I suggest the for loop workaround. It's by far the easiest to understand.
As of Django 1.8 you can do this directly in the ORM:
students = Student.objects.all().annotate(num_excused_absences=models.Sum(
    models.Case(
        models.When(attendance__type='Excused', then=1),
        default=0,
        output_field=models.IntegerField()
    )))
Answer adapted from another SO question on the same topic
I haven't tested the sample above but did accomplish something similar in my own app.
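For readers unfamiliar with conditional aggregation, the following plain-Python sketch mimics what the Sum over Case/When is meant to compute. The student names and attendance rows are made up, and no ORM is involved; it only shows that every student keeps a row, including those with no excused absences.

```python
# Made-up data: student names and (student, type) attendance rows.
students = ['ann', 'ben', 'cy']
attendance = [
    ('ann', 'Excused'),
    ('ann', 'Unexcused'),
    ('ben', 'Excused'),
    ('ben', 'Excused'),
]

# Each row contributes 1 when it matches and 0 otherwise (the Case/When part);
# the contributions are then summed per student (the Sum part).
num_excused = {
    s: sum(1 if (who == s and kind == 'Excused') else 0 for who, kind in attendance)
    for s in students
}
print(num_excused)
```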
You are correct - django does not allow you to filter the related objects being counted, without also applying the filter to the primary objects, and therefore excluding those primary objects with no related objects after filtering.
But, in a bit of abstraction leakage, you can count groups by using a values query.
So, I collect the absences in a dictionary, and use that in a loop. Something like this:
# a query for students
students = Student.objects.all()
# a query to count the student attendances, grouped by type
# (ordered by student so that groupby sees each student's rows together)
attendance_counts = (Attendance.objects.filter(student__in=students)
    .values('student', 'type').annotate(abs=Count('pk')).order_by('student_id'))
# regroup that into a dictionary {student -> { type -> count }}
from itertools import groupby
attendance_s_t = dict(
    (s, dict((row['type'], row['abs']) for row in g))
    for s, g in groupby(attendance_counts, lambda row: row['student']))
# then use them efficiently:
for student in students:
    student.absences = attendance_s_t.get(student.pk, {}).get('Excused', 0)
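The regrouping step can also be tried without a database. This sketch assumes rows shaped like the output of a values('student', 'type') query with a count annotation; the data is made up, and a defaultdict is swapped in for itertools.groupby so the rows do not have to arrive sorted by student.

```python
from collections import defaultdict

# Made-up rows: one dict per (student, type) group with its count.
attendance_counts = [
    {'student': 1, 'type': 'Excused', 'abs': 2},
    {'student': 2, 'type': 'Unexcused', 'abs': 1},
    {'student': 1, 'type': 'Unexcused', 'abs': 4},
]

# Build {student -> {type -> count}} in a single pass over the rows.
attendance_s_t = defaultdict(dict)
for row in attendance_counts:
    attendance_s_t[row['student']][row['type']] = row['abs']

# Students without a matching row fall back to 0.
for student_pk in (1, 2, 3):
    print(student_pk, attendance_s_t.get(student_pk, {}).get('Excused', 0))
```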
Maybe this will work for you:
excused = Student.objects.filter(attendance__type='Excused').annotate(abs=Count('attendance'))
You need to filter the Students you're looking for first to just those with excused absences and then annotate the count of them.
Here's a link to the Django Aggregation Docs where it discusses filtering order.
I want to update a customer table with a spreadsheet from our accounting system. Unfortunately I can't just clear out the data and reload all of it, because there are a few records in the table that are not in the imported data (don't ask).
For 2000 records this is taking about 5 minutes, and I wondered if there was a better way of doing it.
for row in data:
    try:
        try:
            customer = models.Retailer.objects.get(shared_id=row['Customer'])
        except models.Retailer.DoesNotExist:
            customer = models.Retailer()
            customer.shared_id = row['Customer']
        customer.name = row['Name 1']
        customer.address01 = row['Street']
        customer.address02 = row['Street 2']
        customer.postcode = row['Postl Code']
        customer.city = row['City']
        customer.save()
    except:
        print formatExceptionInfo("Error with Customer ID: " + str(row['Customer']))
Look at my answer here: Django: form that updates X amount of models
The QuerySet has update() method - rest is explained in above link.
I've had some success using this bulk update snippet:
http://djangosnippets.org/snippets/446/
It's a bit outdated, but it worked on django 1.1, so I suppose you can still make it work. If you are looking for a quick way to do a one time bulk insert, this is the quickest (I'm not sure I'd trust it for regular use without seriously testing performance).
I've made a terribly crude attempt at a solution for this problem, but it's not finished yet and it doesn't support working with django orm objects directly - yet.
http://pypi.python.org/pypi/dse/0.1.0
It's not been properly tested, so let me know if you have any suggestions on how to improve it. Using the django orm to do stuff like this is terrible.
Thomas