Django Count() in multiple annotations - django

Say I have a simple forum model:
class User(models.Model):
    username = models.CharField(max_length=25)
    ...
class Topic(models.Model):
    user = models.ForeignKey(User)
    ...
class Post(models.Model):
    user = models.ForeignKey(User)
    ...
Now say I want to see how many topics and posts each user in a subset of users has (e.g. those whose username starts with "ab").
So if I do one query each for posts and topics:
User.objects.filter(username__startswith="ab")
    .annotate(posts=Count('post'))
    .values_list("username","posts")
Yields:
[('abe', 5),('abby', 12),...]
and
User.objects.filter(username__startswith="ab")
    .annotate(topics=Count('topic'))
    .values_list("username","topics")
Yields:
[('abe', 2),('abby', 6),...]
HOWEVER, when I try annotating both to get one list, I get something strange:
User.objects.filter(username__startswith="ab")
    .annotate(posts=Count('post'))
    .annotate(topics=Count('topic'))
    .values_list("username","posts", "topics")
Yields:
[('abe', 10, 10),('abby', 72, 72),...]
Why are the topics and posts multiplied together? I expected this:
[('abe', 5, 2),('abby', 12, 6),...]
What would be the best way of getting the correct list?

I think Count('topic', distinct=True) should do the right thing. That will use COUNT(DISTINCT topic.id) instead of COUNT(topic.id) to avoid duplicates.
User.objects.filter(
    username__startswith="ab").annotate(
    posts=Count('post', distinct=True)).annotate(
    topics=Count('topic', distinct=True)).values_list(
    "username","posts", "topics")

Try adding distinct to your last queryset:
User.objects.filter(
    username__startswith="ab").annotate(
    posts=Count('post')).annotate(
    topics=Count('topic')).values_list(
    "username","posts", "topics").distinct()
See https://docs.djangoproject.com/en/1.3/ref/models/querysets/#distinct for more details, but basically you're getting duplicate rows because the annotations span multiple tables.
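The row multiplication from the question can be reproduced outside Django. Below is a minimal sketch using Python's sqlite3 module; the table names (forum_user, forum_topic, forum_post) are hypothetical stand-ins for the models, and the SQL is only an approximation of what the ORM generates. It compares plain COUNT with COUNT(DISTINCT ...), which is what Count(..., distinct=True) asks for.

```python
import sqlite3

# Hypothetical tables standing in for the User, Topic and Post models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forum_user  (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE forum_topic (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE forum_post  (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO forum_user (id, username) VALUES (1, 'abe');
""")
# 'abe' gets 2 topics and 5 posts, matching the numbers in the question.
conn.executemany("INSERT INTO forum_topic (user_id) VALUES (?)", [(1,)] * 2)
conn.executemany("INSERT INTO forum_post (user_id) VALUES (?)", [(1,)] * 5)

JOINS = """
    FROM forum_user u
    LEFT JOIN forum_post  p ON p.user_id = u.id
    LEFT JOIN forum_topic t ON t.user_id = u.id
    WHERE u.username LIKE 'ab%'
    GROUP BY u.id
"""

# Joining both tables yields 5 * 2 = 10 rows for 'abe', so both counts see 10.
plain = conn.execute(
    "SELECT u.username, COUNT(p.id), COUNT(t.id)" + JOINS
).fetchall()

# COUNT(DISTINCT ...) collapses the repeated ids back to the real totals.
distinct = conn.execute(
    "SELECT u.username, COUNT(DISTINCT p.id), COUNT(DISTINCT t.id)" + JOINS
).fetchall()

print(plain)     # [('abe', 10, 10)]
print(distinct)  # [('abe', 5, 2)]
```

Each post row is paired with each topic row before grouping, which is why both plain counts come out as the product of the two real counts.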

Related

Combining distinct and annotate over a many to many filter with Django

With Django's ORM, I have a simple Question and Topic model like so:
class Topic(models.Model):
name = models.CharField(max_length=200)
class Question(models.Model):
topic_items = models.ManyToManyField(Topic, blank=True)
date_asked = models.DateField()
Suppose I have four Questions, each asked on a separate date, and the fourth has two topics, 'topic1' and 'topic2'.
If I do the following query with topics_restrict a list of the two topic ids for 'topic1' and 'topic2'...
q_filter = Question.objects.filter(topic_items__in=topics_restrict).distinct()
Then I get four results (instead of five, which would have resulted without the distinct)
Now if I do the following:
return q_filter.annotate(
total=Count('date_asked')
).values_list('total', flat=True)
I get the result [2,1,1,1] instead of [1,1,1,1] - ie, as if the distinct() had never been applied.
The only way to get around it is to do...
q_filter = Question.objects.filter(pk__in=q_filter.values_list('pk', flat=True))
... and then annotate on that q_filter.
But there has to be a better way?
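For what it's worth, the difference between the two approaches can be seen in plain SQL. This is a rough sketch with Python's sqlite3 module; the table and column names are assumptions, and the SQL Django actually emits will differ in detail. The first query joins the many-to-many table and groups, the second filters through a subquery the way the pk__in workaround does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, date_asked TEXT);
    CREATE TABLE question_topic (question_id INTEGER, topic_id INTEGER);
    INSERT INTO question VALUES
        (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-03'), (4, '2020-01-04');
    -- Questions 1-3 have one topic each; question 4 has both topics.
    INSERT INTO question_topic VALUES (1, 1), (2, 1), (3, 2), (4, 1), (4, 2);
""")

# Join + GROUP BY: question 4 matches two join rows, so its count is 2.
# SELECT DISTINCT only de-duplicates the final rows, after the counting.
joined = conn.execute("""
    SELECT DISTINCT q.id, COUNT(q.date_asked)
    FROM question q
    JOIN question_topic qt ON qt.question_id = q.id
    WHERE qt.topic_id IN (1, 2)
    GROUP BY q.id
    ORDER BY q.id
""").fetchall()

# Subquery: the topic filter only selects ids, so no rows are duplicated.
subquery = conn.execute("""
    SELECT q.id, COUNT(q.date_asked)
    FROM question q
    WHERE q.id IN (SELECT question_id FROM question_topic WHERE topic_id IN (1, 2))
    GROUP BY q.id
    ORDER BY q.id
""").fetchall()

print(joined)    # [(1, 1), (2, 1), (3, 1), (4, 2)]
print(subquery)  # [(1, 1), (2, 1), (3, 1), (4, 1)]
```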

How to filter multiple fields with list of objects

I want to build a web app like Quora or Medium, where a user can follow other users or topics.
eg: userA is following (userB, userC, tag-Health, tag-Finance).
These are the models:
class Relationship(models.Model):
    user = AutoOneToOneField('auth.user')
    follows_user = models.ManyToManyField('Relationship', related_name='followed_by')
    follows_tag = models.ManyToManyField(Tag)
class Activity(models.Model):
    actor_type = models.ForeignKey(ContentType, related_name='actor_type_activities')
    actor_id = models.PositiveIntegerField()
    actor = GenericForeignKey('actor_type', 'actor_id')
    verb = models.CharField(max_length=10)
    target_type = models.ForeignKey(ContentType, related_name='target_type_activities')
    target_id = models.PositiveIntegerField()
    target = GenericForeignKey('target_type', 'target_id')
    tags = models.ManyToManyField(Tag)
Now, this would give the following list:
following_user = userA.relationship.follows_user.all()
following_user
[<Relationship: userB>, <Relationship: userC>]
following_tag = userA.relationship.follows_tag.all()
following_tag
[<Tag: tag-job>, <Tag: tag-finance>]
To filter I tried this way:
Activity.objects.filter(Q(actor__in=following_user) | Q(tags__in=following_tag))
But since actor is a GenericForeignKey I am getting an error:
FieldError: Field 'actor' does not generate an automatic reverse relation and therefore cannot be used for reverse querying. If it is a GenericForeignKey, consider adding a GenericRelation.
How can I filter the activities that will be unique, with the list of users and list of tags that the user is following? To be specific, how will I filter GenericForeignKey with the list of the objects to get the activities of the following users.
You should just filter by ids.
First get ids of objects you want to filter on
following_user = userA.relationship.follows_user.all().values_list('id', flat=True)
following_tag = userA.relationship.follows_tag.all()
Also you will need to filter on actor_type. It can be done like this for example.
actor_type = ContentType.objects.get_for_model(userA.__class__)
Or, as @Todor suggested in the comments, pass the instance directly, because get_for_model accepts both a model class and a model instance:
actor_type = ContentType.objects.get_for_model(userA)
And then you can just filter like this.
Activity.objects.filter(Q(actor_id__in=following_user, actor_type=actor_type) | Q(tags__in=following_tag))
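To illustrate why the actor_type condition matters, here is a small sketch using Python's sqlite3 module rather than the ORM. The table layout, the content type id 7 and the sample rows are all made up for the example, and the SQL is only roughly the shape of what the filter above would produce. DISTINCT is there because the tag join can repeat an activity; in the ORM a trailing .distinct() plays that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (id INTEGER PRIMARY KEY, actor_type_id INTEGER, actor_id INTEGER);
    CREATE TABLE activity_tags (activity_id INTEGER, tag_id INTEGER);
    INSERT INTO activity VALUES
        (1, 7, 2),   -- followed user 2
        (2, 7, 9),   -- unfollowed user, but tagged with a followed tag
        (3, 8, 2),   -- id 2 of a different model: must not match
        (4, 7, 3),   -- followed user 3, also carries two tags
        (5, 7, 9);   -- unfollowed user, unfollowed tag
    INSERT INTO activity_tags VALUES (2, 10), (4, 10), (4, 11), (5, 11);
""")

USER_TYPE_ID = 7             # hypothetical content type id for the user model
following_user_ids = [2, 3]  # ids of followed users
following_tag_ids = [10]     # ids of followed tags

# Build "?,?" style placeholder lists for the two IN clauses.
user_marks = ",".join("?" * len(following_user_ids))
tag_marks = ",".join("?" * len(following_tag_ids))

feed_ids = [row[0] for row in conn.execute(
    f"""
    SELECT DISTINCT a.id
    FROM activity a
    LEFT JOIN activity_tags t ON t.activity_id = a.id
    WHERE (a.actor_type_id = ? AND a.actor_id IN ({user_marks}))
       OR t.tag_id IN ({tag_marks})
    ORDER BY a.id
    """,
    [USER_TYPE_ID, *following_user_ids, *following_tag_ids],
)]

print(feed_ids)  # [1, 2, 4]
```

Activity 3 has the same actor id as a followed user but belongs to a different content type, so it is excluded. Filtering on the id alone would have let it through.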
What the docs are suggesting is not a bad thing.
The problem is that when you are creating Activities you are using auth.User as an actor, therefore you can't add GenericRelation to auth.User (well maybe you can by monkey-patching it, but that's not a good idea).
So what can you do?
@Sardorbek Imomaliev's solution is very good, and you can make it even better if you put all this logic into a custom QuerySet class (the idea is to achieve DRY-ness and reusability).
class ActivityQuerySet(models.QuerySet):
    def for_user(self, user):
        return self.filter(
            models.Q(
                actor_type=ContentType.objects.get_for_model(user),
                actor_id__in=user.relationship.follows_user.values_list('id', flat=True)
            )|models.Q(
                tags__in=user.relationship.follows_tag.all()
            )
        )
class Activity(models.Model):
    #..
    objects = ActivityQuerySet.as_manager()
#usage
user_feed = Activity.objects.for_user(request.user)
But is there anything else?
1. Do you really need GenericForeignKey for actor? I don't know your business logic, so probably you do, but using just a regular FK for actor (just like for the tags) will make it possible to do stuff like actor__in=users_following.
2. Did you check if there isn't an app for that? One example of a package already solving your problem is django-activity-stream; check it out.
3. If you don't use auth.User as an actor, you can do exactly what the docs suggest: add a GenericRelation field. In fact, your Relationship class is suitable for this purpose, but I would really rename it to something like UserProfile or at least UserRelation. Say we have renamed Relationship to UserProfile and we create new Activities using the userprofile instead. The idea is:
class UserProfile(models.Model):
    user = AutoOneToOneField('auth.user')
    follows_user = models.ManyToManyField('UserProfile', related_name='followed_by')
    follows_tag = models.ManyToManyField(Tag)
    activities_as_actor = GenericRelation('Activity',
        content_type_field='actor_type',
        object_id_field='actor_id',
        related_query_name='userprofile'
    )
class ActivityQuerySet(models.QuerySet):
    def for_userprofile(self, userprofile):
        return self.filter(
            models.Q(
                userprofile__in=userprofile.follows_user.all()
            )|models.Q(
                # userprofile is already the profile object, so no .relationship hop
                tags__in=userprofile.follows_tag.all()
            )
        )
class Activity(models.Model):
    #..
    objects = ActivityQuerySet.as_manager()
#usage
#1st when you create activity use UserProfile
Activity.objects.create(actor=request.user.userprofile, ...)
#2nd when you fetch.
#Check how `for_userprofile` is implemented this time
Activity.objects.for_userprofile(request.user.userprofile)
As stated in the documentation:
Due to the way GenericForeignKey is implemented, you cannot use such fields directly with filters (filter() and exclude(), for example) via the database API. Because a GenericForeignKey isn’t a normal field object, these examples will not work:
You could follow what the error message is telling you, I think you'll have to add a GenericRelation relation to do that. I do not have experience doing that, and I'd have to study it but...
Personally I think this solution is too complex for what you're trying to achieve. If only the user model can follow tags or authors, why not put ManyToManyFields on it? It would be something like this:
class Person(models.Model):
    user = models.ForeignKey(User)
    follow_tag = models.ManyToManyField('Tag')
    follow_author = models.ManyToManyField('Author')
You could query all followed tag activities per Person like this:
Activity.objects.filter(tags__in=person.follow_tag.all())
And you could search 'persons' following a tag like this:
Person.objects.filter(follow_tag__in=[<tag_ids>])
The same would apply to authors and you could use querysets to do OR, AND, etc.. on your queries.
If you want more models to be able to follow a tag or author, say a System, maybe you could create a Following model that does the same thing Person is doing, and then add a ForeignKey to Following in both Person and System.
Note that I'm using this Person model to meet this recommendation.
You can query separately for users and tags and then combine the two to get what you are looking for. Try something like the below and let me know if it works:
usrs = Activity.objects.filter(actor__in=following_user)
tags = Activity.objects.filter(tags__in=following_tag)
result = usrs | tags
You can use annotate to join the two primary keys as a single string then use that to filter your queryset.
from django.db.models import Q, Value, TextField
from django.db.models.functions import Concat
following_actor = [
    # actor_type, actor
    (1, 100),
    (2, 102),
]
searchable_keys = [str(at) + "__" + str(actor) for at, actor in following_actor]
result = MultiKey.objects.annotate(key=Concat('actor_type', Value('__'), 'actor_id',
                                              output_field=TextField()))\
    .filter(Q(key__in=searchable_keys) | Q(tags__in=following_tag))
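The idea behind the concatenated key can be tried out with plain SQL. The sketch below uses Python's sqlite3 module, with a made-up activity table and sample rows, and SQLite's || operator in place of Concat. It is meant as an illustration of the technique, not as the SQL the ORM would generate.

```python
import sqlite3

following_actor = [
    # (actor_type id, actor id)
    (1, 100),
    (2, 102),
]
# Same key format as in the answer: "<type>__<id>".
searchable_keys = [str(at) + "__" + str(actor) for at, actor in following_actor]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (id INTEGER PRIMARY KEY, actor_type_id INTEGER, actor_id INTEGER);
    INSERT INTO activity VALUES (1, 1, 100), (2, 1, 102), (3, 2, 102), (4, 2, 100);
""")

marks = ",".join("?" * len(searchable_keys))
matched = [row[0] for row in conn.execute(
    # || concatenates the two integer columns into a text key per row.
    f"SELECT id FROM activity "
    f"WHERE actor_type_id || '__' || actor_id IN ({marks}) ORDER BY id",
    searchable_keys,
)]

print(searchable_keys)  # ['1__100', '2__102']
print(matched)          # [1, 3]
```

Rows 2 and 4 reuse the same ids with the types swapped and are not matched, which is the point of keying on the pair rather than on the id alone.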

Django using the values method with m2m relationships / filtering m2m tables using django

class Book(models.Model):
    name = models.CharField(max_length=127, blank=False)
class Author(models.Model):
    name = models.CharField(max_length=127, blank=False)
    books = models.ManyToManyField(Book)
I am trying to filter the authors so I can return a result set of authors like:
[{id: 1, name: 'Grisham', books : [{name: 'The Client'},{name: 'The Street Lawyer'}], ..]
Before I had the m2m relationship on author I was able to query for any number of author records and get all of the values I needed using the values method with only one db query.
But it looks like
Author.objects.all().values('name', 'books')
would return something like:
[{id: 1, name: 'Grisham', books :{name: 'The Client'}},{id: 1, name: 'Grisham', books :{name: 'The Street Lawyer'}}]
Looking at the docs it doesn't look like that is possible with the values method.
https://docs.djangoproject.com/en/dev/ref/models/querysets/
Warning Because ManyToManyField attributes and reverse relations can
have multiple related rows, including these can have a multiplier
effect on the size of your result set. This will be especially
pronounced if you include multiple such fields in your values() query,
in which case all possible combinations will be returned.
I want to get a result set of size n with the least number of database hits. Calling authorObject.books.all() for each author would result in at least n db hits.
Is there a way to do this in django?
I think one way of doing this with the least number of database hits would be to:
authors = Author.objects.all().values_list('id', flat=True)
q = Q()
for id in authors:
    q = q | Q(author__id=id)
#m2m author book table.. from my understanding it is
#not accessible in the django QuerySet
author_author_books.filter(q) #grab all of the book ids and author ids with one query
Is there a built in way to query the m2m author_author_books table or am I going to have the write the sql? Is there a way to take advantage of the Q() for doing OR logic in raw sql?
Thanks in advance.
I think you want prefetch_related. Something like this:
authors = Author.objects.prefetch_related('books').all()
More on this here.
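As far as I understand it, prefetch_related does a separate lookup for the relationship and joins the results in Python. The following sketch imitates that idea with Python's sqlite3 module: two queries in total, then grouping in Python to get the nested shape asked for in the question. Table names and sample data are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE author_books (author_id INTEGER, book_id INTEGER);
    INSERT INTO author VALUES (1, 'Grisham'), (2, 'King');
    INSERT INTO book VALUES (1, 'The Client'), (2, 'The Street Lawyer'), (3, 'It');
    INSERT INTO author_books VALUES (1, 1), (1, 2), (2, 3);
""")

# Query 1: the authors themselves.
authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()

# Query 2: every book linked to any of those authors, fetched in one go.
marks = ",".join("?" * len(authors))
links = conn.execute(
    "SELECT ab.author_id, b.name FROM author_books ab "
    "JOIN book b ON b.id = ab.book_id "
    f"WHERE ab.author_id IN ({marks}) ORDER BY b.id",
    [author_id for author_id, _ in authors],
).fetchall()

# Group the books per author in Python instead of in SQL.
books_by_author = defaultdict(list)
for author_id, book_name in links:
    books_by_author[author_id].append({"name": book_name})

result = [
    {"id": author_id, "name": name, "books": books_by_author[author_id]}
    for author_id, name in authors
]
print(result)
```

The number of queries stays at two no matter how many authors there are, instead of one extra query per author.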
If you want to query your author_author_books table, I think you need to specify a "through" table:
class BookAuthor(models.Model):
    book = models.ForeignKey(Book)
    # Author is defined below, so refer to it by name
    author = models.ForeignKey('Author')
class Author(models.Model):
    name = models.CharField(max_length=127, blank=False)
    books = models.ManyToManyField(Book, through=BookAuthor)
and then you can query BookAuthor like any other model.

Django Filter Return Many Values

I'm new to django and I think this is a simple question -
I have an intermediate class which is coded as follows -
class Link_Book_Course(models.Model):
    book = models.ForeignKey(Book)
    course = models.ForeignKey(Course)
    image = models.CharField(max_length = 200, null=True)
    rating = models.CharField(max_length = 200,null=True)
    def __unicode__(self):
        return self.title
    def save(self):
        self.date_created = datetime.now()
        super(Link_Book_Course,self).save()
I'm making this call as I'd like to have to have all of the authors of the books (Book is another model with author as a CharField)
storeOfAuthorNames = Link_Book_Course.objects.filter(book__author)
However, it doesn't return a querySet of all of the authors, in fact, it throws an error.
I think it's because book__author has multiple values- how can I get all of them?
Thanks!
I don't think you're using the right queryset method. filter() filters by its arguments - so the expected usage is:
poe = Author.objects.get(name='Edgar Allen Poe')
course_books_by_poe = Link_Book_Course.objects.filter(book__author=poe)
It looks like you're trying to pull a list of the names of all the authors of books used in a particular course (or maybe all courses?). Maybe you're looking for values() or values_list()?
all_authors_in_courses = Link_Book_Course.objects.values_list(
'book__author', flat=True
).distinct()
(Edit: Updated per @ftartaggia's suggestion)
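In SQL terms, values_list('book__author', flat=True).distinct() amounts to a SELECT DISTINCT over the joined book table. Here is a small sketch with Python's sqlite3 module, assuming author is a plain text column on book; the table names and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
    CREATE TABLE link_book_course (id INTEGER PRIMARY KEY, book_id INTEGER, course_id INTEGER);
    INSERT INTO book VALUES
        (1, 'The Raven', 'Poe'), (2, 'Carrie', 'King'),
        (3, 'It', 'King'), (4, 'Unused', 'Nobody');
    -- Book 4 is not linked to any course; book 1 is linked twice.
    INSERT INTO link_book_course VALUES (1, 1, 1), (2, 2, 1), (3, 3, 2), (4, 1, 2);
""")

# One flat list of author names, each appearing once.
all_authors_in_courses = [row[0] for row in conn.execute("""
    SELECT DISTINCT b.author
    FROM link_book_course l
    JOIN book b ON b.id = l.book_id
    ORDER BY b.author
""")]

print(all_authors_in_courses)  # ['King', 'Poe']
```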
As others already explained, filter() returns a subset of the model's own objects; it does not return instances of other models, related or not.
If you want Author model instances back from the Django ORM and can use the aggregation API, you might do something like this:
from django.db.models import Count
Author.objects.annotate(num_books=Count('book')).filter(num_books__gt=1)
The filter call you are trying to use translates more or less into SQL like this:
SELECT * FROM Link_Book_Course INNER JOIN Book ON (...) WHERE Book.author = ;
So as you see your query has an incomplete where clause.
Anyway, it's not the query you are looking for.
What about something like (assuming author is a simple text field of Book and you want only authors of books referred from Link_Book_Course instances):
Book.objects.filter(pk__in=Link_Book_Course.objects.all().values_list("book", flat=True)).values_list("author", flat=True)
To start with, a filter statement filters on a field matching some pattern. So if Book has a simple ForeignKey to Author, you could have
storeOfAuthorNames = Link_Book_Course.objects.filter(book__author="Stephen King"), but not just
storeOfAuthorNames = Link_Book_Course.objects.filter(book__author).
Once you get past that, I am guessing Book has Author as a ManyToManyField, not a ForeignKey (because a book can have multiple authors, and an author can publish multiple books?). In that case, just filter(book__author="Stephen King") will still not be enough. Try Link_Book_Course.objects.filter(book__author__in=myBookObject.author.all())

Django model aggregation

I have a simple hierarchical model with a Person and RunningScore as its child.
This model stores running scores for many users; simplified, it looks something like:
class Person(models.Model):
firstName = models.CharField(max_length=200)
lastName = models.CharField(max_length=200)
class RunningScore(models.Model):
person = models.ForeignKey('Person', related_name="scores")
time = models.DecimalField(max_digits=6, decimal_places=2)
If I get a single Person it comes with all the RunningScores associated with it, and this is standard behavior. My question is really simple: if I'd like to get a Person with only one RunningScore child (say the best result, i.e. min(time)), how can I do it?
I read the official Django documentation but have not found a solution.
I am not 100% sure if I get what you mean, but maybe this will help:
from django.db.models import Min
Person.objects.annotate(min_running_time=Min('scores__time'))
The queryset will fetch Person objects with an additional min_running_time attribute.
You can also add a filter:
Person.objects.annotate(min_running_time=Min('scores__time')).filter(firstName__startswith='foo')
Accessing the first object's min_running_time attribute:
first_person = Person.objects.annotate(min_running_time=Min('scores__time'))[0]
print(first_person.min_running_time)
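To make the annotation a bit more concrete: a Min over a reverse relation corresponds roughly to a LEFT JOIN with MIN and GROUP BY. The sketch below uses Python's sqlite3 module with invented table names, a hypothetical run_time column standing in for the time field, and sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE running_score (id INTEGER PRIMARY KEY, person_id INTEGER, run_time REAL);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO running_score (person_id, run_time) VALUES (1, 12.5), (1, 11.75), (2, 14.0);
""")

# One row per person with their best (lowest) time; the LEFT JOIN keeps
# people without any score, whose minimum comes back as NULL / None.
best_times = conn.execute("""
    SELECT p.first_name, MIN(s.run_time)
    FROM person p
    LEFT JOIN running_score s ON s.person_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(best_times)  # [('Ann', 11.75), ('Bob', 14.0), ('Cy', None)]
```

Note that this gives the minimum value only, not the RunningScore row it came from.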
EDIT:
You can define a method or a property such as the following one to get the related object:
class Person(models.Model):
    ...
    @property
    def best_runner(self):
        try:
            # related_name="scores" replaces the default runningscore_set accessor
            return self.scores.order_by('time')[0]
        except IndexError:
            return None
If you want one RunningScore for only one Person, you could use ordering and limit your queryset to one object.
Something like this, for a Person instance person (the best result is the lowest time):
person.scores.order_by('time')[0]
Here is the doc on limiting querysets:
https://docs.djangoproject.com/en/1.3/topics/db/queries/#limiting-querysets
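Ordering and taking the first element corresponds to ORDER BY ... LIMIT 1 in SQL. A minimal sketch with Python's sqlite3 module follows; the table, the run_time column and the best_score helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE running_score (id INTEGER PRIMARY KEY, person_id INTEGER, run_time REAL);
    INSERT INTO running_score (person_id, run_time) VALUES (1, 12.5), (1, 11.75), (2, 14.0);
""")

def best_score(person_id):
    """Return the (id, run_time) row with the lowest time, or None if there is none."""
    return conn.execute(
        "SELECT id, run_time FROM running_score "
        "WHERE person_id = ? ORDER BY run_time LIMIT 1",
        (person_id,),
    ).fetchone()

print(best_score(1))  # (2, 11.75)
print(best_score(3))  # None
```

Unlike the Min annotation, this returns the whole score row, and a person without scores yields None rather than raising an error.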