Django ORM: dynamically add multiple conditions for ManyToManyField

Let's say that we have Django models defined as follows:
class Tag(models.Model):
    name = models.CharField(unique=True, max_length=50)

class Article(models.Model):
    title = models.CharField(max_length=100)
    text = models.TextField()
    tag = models.ManyToManyField(Tag)
And we have a list of tags:
tag_list = ['tag1','tag2','tag3']
The goal is to select articles that have all the tags in tag_list. This question shows a way to achieve this with filter conditions added sequentially:
articles = Article.objects.filter(Q(tag__name=tag_list[0])).filter(Q(tag__name=tag_list[1])).filter(Q(tag__name=tag_list[2]))
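But we need to add conditions dynamically. The query below doesn't work (it returns no articles):
from functools import reduce
import operator
from django.db.models import Q

qlist = []
for tag in tag_list:
    qlist.append(Q(tag__name=tag))
articles = Article.objects.filter(reduce(operator.and_, qlist))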
I ended up querying articles that have at least one tag from the list and then manually filtering query results:
qlist = []
qlist.append(Q(tag__name__in=tag_list))
articles = Article.objects.filter(reduce(operator.and_, qlist)).distinct()
for article in articles:
    article_tag_list = []
    for tag in article.tag.all():
        article_tag_list.append(tag.name)
    if set(tag_list).issubset(set(article_tag_list)):
        ...
Is there a way to add query conditions for ManyToManyField dynamically?

Try this:
q_objects = Q()
for tag in tag_list:
    q_objects &= Q(tag__name=tag)
articles = Article.objects.filter(q_objects)
I've used the pattern above many times, and it always worked for me.
UPDATE: While this method generally works, it doesn't apply to this problem. The code above requires all conditions to be true for a single related object. Here, that would mean a single tag with three different names, which is clearly impossible.
This is the code that ended up working for the OP:
articles = Article.objects.all()
for tag in tag_list:
    articles = articles.filter(tag__name=tag)
The Django docs provide a detailed explanation of the difference between putting multiple conditions in a single filter() call and spreading them across multiple filter() calls when querying multi-valued relationships.
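To see the difference concretely, the sketch below runs simplified versions of the two SQL shapes against an in-memory SQLite database using only the standard library. The tables `article`, `tag` and `article_tag` are hypothetical stand-ins for the ones Django generates; the real SQL Django emits uses different aliases and column names. Both helpers assume a non-empty list of names.

```python
import sqlite3


def make_db():
    """Hypothetical data: article 1 has tag1, tag2, tag3; article 2 has tag1, tag2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE article_tag (article_id INTEGER, tag_id INTEGER);
        INSERT INTO article VALUES (1, 'all three'), (2, 'only two');
        INSERT INTO tag VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3');
        INSERT INTO article_tag VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2);
    """)
    return conn


def single_join(conn, names):
    # Like filter(Q(...) & Q(...)): one join, so every condition
    # must hold for the SAME tag row.
    where = " AND ".join("t.name = ?" for _ in names)
    sql = ("SELECT DISTINCT a.id FROM article a "
           "JOIN article_tag m ON m.article_id = a.id "
           "JOIN tag t ON t.id = m.tag_id WHERE " + where)
    return sorted(row[0] for row in conn.execute(sql, names))


def join_per_name(conn, names):
    # Like chained .filter() calls: a fresh join per name, so each
    # condition may match a different tag row of the same article.
    joins, wheres = [], []
    for i in range(len(names)):
        joins.append(f"JOIN article_tag m{i} ON m{i}.article_id = a.id "
                     f"JOIN tag t{i} ON t{i}.id = m{i}.tag_id")
        wheres.append(f"t{i}.name = ?")
    sql = ("SELECT DISTINCT a.id FROM article a " + " ".join(joins)
           + " WHERE " + " AND ".join(wheres))
    return sorted(row[0] for row in conn.execute(sql, names))


conn = make_db()
print(single_join(conn, ["tag1", "tag2", "tag3"]))    # []
print(join_per_name(conn, ["tag1", "tag2", "tag3"]))  # [1]
```

Because each chained filter() gets its own join, an article only needs some tag row matching each name, which is exactly the "has all of these tags" semantics.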

Related

Exclude fields with the same value using Django querysets

In my project, I have Articles and user responses that share the same "title" value. I only want to find the first Articles, because the other objects with the same "title" are the users' answers. How can I exclude objects with a duplicate "title" from a queryset?
I tried this:
q1 = Article.objects.order_by().values('title').distinct()
This works, but it returns something like a list (dictionaries of titles) rather than Article objects.
So I tried to turn it back into a queryset:
q2 = Article.objects.filter(title__in=q1).distinct()
But this returns all the articles with repeated titles again.
How can I exclude objects that have the same title from a queryset without converting them to a list?
On PostgreSQL only, you can pass positional arguments (*fields) in order to specify the names of fields to which the DISTINCT should apply.
If that is your case, the following should work:
Article.objects.filter(title__in=q1).order_by('title').distinct('title')
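Two caveats about DISTINCT ON in PostgreSQL: it keeps the first row of each group according to ORDER BY, so the kept row is only well-defined if the ordering has a tiebreaker after the distinct field, e.g. `order_by('title', 'id')`; and the `title__in=q1` filter is redundant, since every article's title is in `q1`. On other databases, a portable alternative is to keep the row with the smallest id for each title. This assumes "first" means the lowest primary key, which holds for auto-incrementing ids but is an assumption here. A stdlib SQLite sketch of that SQL, with a hypothetical `article` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO article VALUES
        (1, 'Intro'), (2, 'Intro'), (3, 'Setup'), (4, 'Intro'), (5, 'Setup');
""")

# Keep only the row with the smallest id for each title. Roughly equivalent
# Django sketch (not run here; assumes the Article model from the question):
#   first_ids = (Article.objects.order_by().values('title')
#                .annotate(first_id=Min('id')).values('first_id'))
#   Article.objects.filter(id__in=first_ids)
FIRST_PER_TITLE = """
    SELECT id, title FROM article
    WHERE id IN (SELECT MIN(id) FROM article GROUP BY title)
    ORDER BY id
"""

rows = conn.execute(FIRST_PER_TITLE).fetchall()
print(rows)  # [(1, 'Intro'), (3, 'Setup')]
```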

Django: how to filter on subset of many-to-many field?

Let's say I have the classic Article and Tag models. How can I filter all Article models that have at least both tags A and B. So an Article with tags A, B, and C should be included but not the article with tags A and D. Thanks!
This question may be old, but I have not found any built-in solution in Django (something like an __in lookup that requires all of the values).
To keep your example: we have an M2M relationship between Article and Tag, and we want to get all articles that have the given tags A, B and C.
There is no straightforward solution in Django, so we have to think in SQL (don't worry, we can still do everything in the Django ORM).
An M2M relationship needs a join table that connects the two tables. Let's call it TagArticle. A row in this table is just a tag ID and an article ID.
What we effectively have to do is this:
1) Filter the TagArticle table to keep only the rows with the A, B, or C tags.
2) Group the remaining rows by article and count the rows in each group.
3) Filter out all groups whose count is smaller than the number of tags (3 in our example).
4) Join or filter the Article table with the previous result.
Fortunately, we do not have to access the TagArticle table directly in Django. The code is then:
from django.db.models import Count
...
tags = ['A', 'B', 'C']
articleQS = (Tag.objects.filter(name__in=tags).values('article')
             .annotate(tagCount=Count('article'))
             .filter(tagCount=len(tags)).values('article'))
articles = Article.objects.filter(id__in=articleQS)
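For reference, the four steps correspond roughly to the SQL below, run against SQLite with only the standard library. `tag_article` plays the role of the hypothetical TagArticle table; Django's auto-generated join table has a different name. `COUNT(DISTINCT ...)` is a defensive choice in case a pair appears twice; Django's auto-created M2M table enforces unique pairs, so a plain count should also do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tag_article (tag_id INTEGER, article_id INTEGER);
    INSERT INTO article VALUES (1, 'A B C'), (2, 'A B'), (3, 'A D'), (4, 'A B C D');
    INSERT INTO tag VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO tag_article VALUES
        (1, 1), (2, 1), (3, 1),
        (1, 2), (2, 2),
        (1, 3), (4, 3),
        (1, 4), (2, 4), (3, 4), (4, 4);
""")


def articles_with_all_tags(conn, names):
    # Roughly equivalent ORM query (not run here; assumes an M2M field
    # named `tags` on Article):
    #   Article.objects.filter(tags__name__in=names).annotate(
    #       n=Count('tags', distinct=True)).filter(n=len(names))
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT a.id FROM article a
        WHERE a.id IN (                          -- step 4: filter Article
            SELECT ta.article_id
            FROM tag_article ta JOIN tag t ON t.id = ta.tag_id
            WHERE t.name IN ({placeholders})     -- step 1: only the wanted tags
            GROUP BY ta.article_id               -- step 2: group by article
            HAVING COUNT(DISTINCT t.id) = ?      -- step 3: all tags present
        )
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(sql, [*names, len(names)])]


print(articles_with_all_tags(conn, ["A", "B", "C"]))  # [1, 4]
```

Article 4 has an extra tag D and is still included, which matches the "at least these tags" requirement in the question.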
Let's say Tag is:
class Tag(models.Model):
    article = models.ForeignKey('Article', on_delete=models.CASCADE)
    name = models.CharField(max_length=2)
Then I think you can do:
a_ids = Tag.objects.filter(name='A').values_list('article_id')
b_ids = Tag.objects.filter(name='B').values_list('article_id')
Article.objects.filter(id__in=a_ids).filter(id__in=b_ids)
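The same idea extends to any number of tags by adding one `id IN (subquery)` condition per name. A stdlib SQLite sketch of the resulting SQL, with hypothetical `article` and `tag` tables shaped like the models above; it assumes `names` is non-empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, article_id INTEGER, name TEXT);
    INSERT INTO article VALUES (1), (2), (3);
    INSERT INTO tag (article_id, name) VALUES
        (1, 'A'), (1, 'B'), (1, 'C'),
        (2, 'A'), (2, 'D'),
        (3, 'B');
""")


def articles_with_all(conn, names):
    # One "id IN (subquery)" condition per name. Roughly equivalent ORM loop
    # (not run here):
    #   qs = Article.objects.all()
    #   for name in names:
    #       qs = qs.filter(id__in=Tag.objects.filter(name=name).values('article_id'))
    conditions = " AND ".join(
        "id IN (SELECT article_id FROM tag WHERE name = ?)" for _ in names
    )
    sql = f"SELECT id FROM article WHERE {conditions} ORDER BY id"
    return [row[0] for row in conn.execute(sql, names)]


print(articles_with_all(conn, ["A", "B"]))  # [1]
```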

Django: ManyToMany filter matching on ALL items in a list

I have such a Book model:
class Book(models.Model):
    authors = models.ManyToManyField(Author, ...)
    ...
In short:
I'd like to retrieve the books whose authors are strictly equal to a given set of authors. I'm not sure if there is a single query that does it, but any suggestions will be helpful.
In long:
Here is what I tried (it fails with an AttributeError):
# A sample set of authors
target_authors = set((author_1, author_2))
# To reduce the search space,
# first retrieve those books with just 2 authors.
candidate_books = Book.objects.annotate(c=Count('authors')).filter(c=len(target_authors))
final_books = QuerySet()
for author in target_authors:
    temp_books = candidate_books.filter(authors__in=[author])
    final_books = final_books and temp_books
... and here is what I got:
AttributeError: 'NoneType' object has no attribute '_meta'
In general, how should I query a model with the constraint that its ManyToMany field contains a set of given objects as in my case?
PS: I found some relevant SO questions but couldn't get a clear answer. Any good pointers would be helpful as well. Thanks.
Similar to goliney's approach, I found a solution, although I think its efficiency could be improved.
# A sample set of authors
target_authors = set((author_1, author_2))
# To reduce the search space, first retrieve those books with just 2 authors.
candidate_books = Book.objects.annotate(c=Count('authors')).filter(c=len(target_authors))
# In each iteration, we filter out those books which don't contain one of the
# required authors - the instance on the iteration.
for author in target_authors:
    candidate_books = candidate_books.filter(authors=author)
final_books = candidate_books
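A single-query alternative, not taken from the thread: count each book's authors and, in the same GROUP BY, count how many of them belong to the target set; a book matches exactly when both counts equal the size of that set. The SQLite sketch below (standard library only; `book_authors` is a hypothetical stand-in for Django's join table, with author ids instead of Author objects) shows the SQL shape. The ORM version in the comment relies on the `filter` argument of aggregates, available since Django 2.0, and is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY);
    CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);
    INSERT INTO book VALUES (1), (2), (3), (4), (5);
    INSERT INTO book_authors VALUES
        (1, 1), (1, 2),
        (2, 1), (2, 2), (2, 3),
        (3, 1),
        (4, 2), (4, 1),
        (5, 1), (5, 3);
""")


def books_with_exact_authors(conn, author_ids):
    # ORM sketch (not run here):
    #   n = len(target_authors)
    #   Book.objects.annotate(
    #       total=Count('authors', distinct=True),
    #       matched=Count('authors', filter=Q(authors__in=target_authors),
    #                     distinct=True),
    #   ).filter(total=n, matched=n)
    ids = sorted(set(author_ids))
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT book_id FROM book_authors
        GROUP BY book_id
        HAVING COUNT(DISTINCT author_id) = ?                  -- total authors
           AND COUNT(DISTINCT CASE WHEN author_id IN ({placeholders})
                                   THEN author_id END) = ?    -- matched authors
        ORDER BY book_id
    """
    return [row[0] for row in conn.execute(sql, [len(ids), *ids, len(ids)])]


print(books_with_exact_authors(conn, [1, 2]))  # [1, 4]
```

Book 2 (an extra author) and book 5 (a different second author) are rejected, so the result is the "strictly equal" set the question asks for, without one join per author.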
You can use complex lookups with Q objects:
from django.db.models import Count, Q
...
target_authors = set((author_1, author_2))
q = Q()
for author in target_authors:
    q &= Q(authors=author)
Book.objects.annotate(c=Count('authors')).filter(c=len(target_authors)).filter(q)
Q() & Q() is not equivalent to .filter().filter(); they produce different SQL. With Q objects combined by &, the SQL just adds conditions on a single join, like WHERE "book"."author" = "author_1" AND "book"."author" = "author_2", so it returns an empty result.
The solution is to chain filter() calls, which produces SQL with one inner join on the same table per filter: ... ON ("author"."id" = "author_book"."author_id") INNER JOIN "author_book" T4 ON ("author"."id" = T4."author_id") WHERE ("author_book"."author_id" = "author_1" AND T4."author_id" = "author_2")
I came across the same problem and came to the same conclusion as iuysal, until I had to do a medium-sized search (with 1000 records and 150 filters, my request would time out).
In my particular case the search would return no records, since the chance that a single record matches ALL 150 filters is very small. You can get around the performance issues by checking that there are still records in the QuerySet before applying more filters:
# In each iteration, we filter out those books which don't contain one of the
# required authors - the instance on the iteration.
for author in target_authors:
    if not candidate_books.exists():
        break  # no books left, so skip the remaining filters
    candidate_books = candidate_books.filter(authors=author)
Django does not skip filters on a QuerySet that is already empty; it keeps adding them (and their joins) to the query.
For proper optimization, however, a well-prepared QuerySet and correctly applied indexes are still necessary.

How to search for objects without certain tags?

I have a queryset containing some objects. Depending on the case, I now want to exclude all the objects that have certain tags (_tags is the name of the TagField on my model):
self.queryset=self.queryset.exclude(_tags__id__in=avoid)
But this just leaves me with an error:
Caught FieldError while rendering:
Join on field '_tags' not permitted.
Did you misspell 'id' for the lookup type?
As I'm pretty sure I did not misspell 'id', I searched for how to use tagging for something like this. The docs say a lot about custom Managers, but I just can't figure out how to use them to get what I want.
edit:
corrected the code above to
self.queryset=self.queryset.exclude(_tags__in=avoid)
where avoid is a list of integers. That leaves me with the problem that the TagField of django-tagging is just a special CharField (or TextField?), which of course won't filter out anything if I query it against a list of integers. I could try to solve it like this:
for tag in avoid:
    self.queryset = self.queryset.exclude(_tags__contains=tag.name)
which is not only ugly, but also leaves me with the problem of tags made of multiple words or matching parts of other tags.
I suspect this could be solved much more elegantly by someone who understands how django-tagging works.
How are your models defined? Is _tags a ForeignKey field?
If not, remove the __id part:
self.queryset=self.queryset.exclude(_tags__in=avoid)
Unfortunately, no, there's no prettier way. In fact, the actual solution is even uglier, but when all the tags are stored in a single text field, there's no other way:
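from django.db.models import Q
for tag in tags_to_avoid:  # hypothetical name: the Tag objects to exclude
    exact_tag = Q(_tags=tag.name)  # the field holds only this one tag
    startswith_tag = Q(_tags__startswith=tag.name + ' ')
    contains_tag = Q(_tags__contains=' ' + tag.name + ' ')
    endswith_tag = Q(_tags__endswith=' ' + tag.name)
    self.queryset = self.queryset.exclude(exact_tag | startswith_tag | contains_tag | endswith_tag)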
The code above assumes that tags are delimited with spaces. If not, you'll have to modify the code to match how they are delimited. The idea is that you use the delimiter as part of the search to ensure that it's the actual tag and not just part of another tag.
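The delimiter logic is easy to check in plain Python before turning it into Q objects. This sketch assumes space-separated tags, as the answer does, and includes an equality check for a field that holds exactly one tag, a case the prefix, infix and suffix conditions miss on their own. `has_tag` is a hypothetical helper name.

```python
def has_tag(tag_string, name, sep=" "):
    """Return True if `name` appears as a whole tag in `tag_string`.

    A match must be bounded on both sides by `sep` or by the string's edge,
    so "python" does not match inside "pythonic".
    """
    return (
        tag_string == name                      # the only tag
        or tag_string.startswith(name + sep)    # first tag
        or (sep + name + sep) in tag_string     # a middle tag
        or tag_string.endswith(sep + name)      # last tag
    )


print(has_tag("django python web", "python"))  # True
print(has_tag("django pythonic", "python"))    # False
```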
If you don't want to do it this way, I'd suggest switching to another tag system that doesn't dump them all into a single text field, django-taggit for instance.
As described in the comment on Chris' answer, django-tagging does not return the tag string when accessing model._tags. In the end I had no other solution than to run the query and filter out the items (loops) containing a certain tag afterwards:
itemlist = list(queryset)
avoid = some_list_of_tag_ids
# keep only the loops that have NONE of the avoid tags
# (build a new list instead of calling remove() inside the loop,
# which would skip the element after each removed one)
itemlist = [
    item for item in itemlist
    if not (item.tags and any(tag.id in avoid for tag in item.tags))
]
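Calling `list.remove()` on a list while a `for` loop is iterating over it skips the element that follows each removal, so adjacent items that should be dropped can slip through. A minimal illustration, with integers standing in for items and hypothetical function names:

```python
def remove_while_iterating(items, avoid):
    # Anti-pattern: mutating the list inside a for loop over that same list.
    for item in items:
        if item in avoid:
            items.remove(item)
    return items


def filter_into_new_list(items, avoid):
    # Build a new list instead; nothing is mutated while iterating.
    return [item for item in items if item not in avoid]


print(remove_while_iterating([1, 2, 3, 4], {2, 3}))  # [1, 3, 4]
print(filter_into_new_list([1, 2, 3, 4], {2, 3}))    # [1, 4]
```

After 2 is removed, 3 shifts into the slot the loop has already visited, so it is never checked.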
For completeness, the model looks like this:
class Item(models.Model):
    _tags = TagField(blank=True, null=True)

    def _get_tags(self):
        return Tag.objects.get_for_object(self)

    def _set_tags(self, tags):
        Tag.objects.update_tags(self, tags)

    tags = property(_get_tags, _set_tags)
Although I tried for quite a while, I found no way of chaining a query against django-tagging tags into an object's query chain. For this project I'm stuck with django-tagging, but this is a real drawback...

How to filter/exclude inactive comments from my annotated Django query?

I'm using the object_list generic view to quickly list a set of Articles. Each Article has comments attached to it. The query uses an annotation to Count() the number of comments and then order_by() that annotated number.
'queryset': Article.objects.annotate(comment_count=Count('comments')).order_by('-comment_count'),
The comments are part of the django.contrib.comments framework and are attached to the model via a Generic Relationship. I've added an explicit reverse lookup to my Article model:
class Article(models.Model):
    ...
    comments = generic.GenericRelation(Comment, content_type_field='content_type', object_id_field='object_pk')
The problem is, this counts "inactive" comments; ones that have is_public=False or is_removed=True. How can I exclude any inactive comments from being counted?
The documentation for aggregations explains how to do this. You need to use a filter clause, making sure you put it before the annotate clause; a filter placed after annotate() only filters the articles and does not change what gets counted:
Article.objects.filter(
    comments__is_public=True, comments__is_removed=False
).annotate(comment_count=Count('comments')).order_by('-comment_count')
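A filter on the comment fields, whether it sits before or after `annotate()`, also drops articles with no matching comments from the listing. If those should appear with a count of 0, conditional aggregation keeps them. The sketch below runs roughly equivalent SQL against SQLite with a simplified, hypothetical `comment` table that has a plain `article_id` column; the real django.contrib.comments table links to articles through a generic relation (content type plus `object_pk`). The ORM version in the comment assumes Django 2.0+ for the `filter` argument and is not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY, article_id INTEGER,
        is_public INTEGER, is_removed INTEGER
    );
    INSERT INTO article VALUES (1, 'busy'), (2, 'quiet'), (3, 'moderated');
    INSERT INTO comment (article_id, is_public, is_removed) VALUES
        (1, 1, 0), (1, 1, 0), (1, 0, 0),
        (3, 1, 1), (3, 0, 0);
""")

# Count only active comments but keep articles that have none. ORM sketch:
#   active = Q(comments__is_public=True, comments__is_removed=False)
#   Article.objects.annotate(
#       comment_count=Count('comments', filter=active)
#   ).order_by('-comment_count')
ACTIVE_COUNTS = """
    SELECT a.id,
           COUNT(CASE WHEN c.is_public = 1 AND c.is_removed = 0 THEN c.id END)
               AS comment_count
    FROM article a LEFT JOIN comment c ON c.article_id = a.id
    GROUP BY a.id
    ORDER BY comment_count DESC, a.id
"""

rows = conn.execute(ACTIVE_COUNTS).fetchall()
print(rows)  # [(1, 2), (2, 0), (3, 0)]
```

The LEFT JOIN keeps article 2, which has no comments at all, and the CASE expression leaves inactive comments uncounted, so article 3 also shows 0.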