Sorting by many to many relationship - django

In the simplified version of my problem I have a model Document that has a many-to-many relationship to Tag. I would like a query that, given a list of tags, sorts the Documents by how many of those tags they match, i.e. the documents that match more tags are displayed first and the documents that match fewer tags are displayed later. I know how to do this with a large plain SQL query, but I'm having difficulties getting it to work with querysets. Could anyone help?
from django.db import models
class Document(models.Model):
    title = models.CharField(max_length=20)
    content = models.TextField()
class Tag(models.Model):
    display_name = models.CharField(max_length=10)
    documents = models.ManyToManyField(Document, related_name="tags")
I would like to do something like the following:
documents = Document.objects.all().order_by(count(tags__in=["java", "python"]))
and get first the documents that match both "java" and "python", then the documents that match only one of them and finally the documents that don't match any.
Thanks in advance for your help.

Have a look at this: How to sort by annotated Count() in a related model in Django
Some documentation: https://docs.djangoproject.com/en/1.6/topics/db/aggregation/#order-by
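To make the requested ordering concrete, here is a minimal sketch that runs the underlying SQL idea (count the matching tags per document, then order by that count) with Python's built-in sqlite3 module. The table names (document, tag, tag_documents), the sample rows and the helper rank_documents are assumptions made for illustration, not the schema Django would generate. The Django expression in the comment is an untested suggestion that assumes Django 2.0 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, display_name TEXT);
    CREATE TABLE tag_documents (tag_id INTEGER, document_id INTEGER);
    INSERT INTO document VALUES (1, 'both'), (2, 'java only'), (3, 'neither');
    INSERT INTO tag VALUES (1, 'java'), (2, 'python'), (3, 'ruby');
    INSERT INTO tag_documents VALUES (1, 1), (2, 1), (1, 2), (3, 3);
""")


def rank_documents(conn, wanted):
    """Return (title, matches) rows, documents with most matching tags first."""
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT d.title, COUNT(t.id) AS matches
        FROM document d
        LEFT JOIN tag_documents td ON td.document_id = d.id
        LEFT JOIN tag t
               ON t.id = td.tag_id AND t.display_name IN ({placeholders})
        GROUP BY d.id
        ORDER BY matches DESC, d.id
    """
    return conn.execute(sql, list(wanted)).fetchall()


ranked = rank_documents(conn, ["java", "python"])

# A Django equivalent might look like this (not run here; assumes Django >= 2.0):
#   Document.objects.annotate(
#       matches=Count("tags", filter=Q(tags__display_name__in=wanted))
#   ).order_by("-matches")
```

The LEFT JOINs keep documents with no matching tag in the result, with a count of 0, so they sort last rather than disappearing.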

How do I annotate a django queryset with StringAgg or ArrayAgg concatenating one column from multiple children rows?

Documents is the parent table.
Paragraphs is the child table.
Users filter Documents based on various search criteria.
Then I wish to annotate Documents with certain Paragraphs filtered by a text query.
The same text query is used to filter Documents and rank them (SearchRank). This ranking makes it necessary to start from Documents and annotate them with Paragraphs, instead of starting from Paragraphs and grouping them by Document.
The postgresql way of concatenating one text field from multiple rows in Paragraphs would be the following:
SELECT array_to_string(
    ARRAY(
        SELECT paragraph.text
        FROM paragraph
        WHERE document_id = '...'
        ORDER BY paragraph.number),
    ', ');
I am trying to translate this into django coding.
I have tried numerous django approaches, to no avail.
I can annotate 1 Paragraph.
Query_sum is a Q() object built from user input.
results = Documents.objects.filter(Query_sum)
sub_paragraphs = Paragraphs.objects.filter(Query_sum).filter(document=OuterRef('id'))
results = results.annotate(paragraphs=Subquery(sub_paragraphs.values('text')[:1], output_field=TextField()))
Problems start when I get rid of slicing [:1].
results = results.annotate(paragraphs=Subquery(sub_paragraphs.values('text'), output_field=TextField()))
I then get the following error:
"more than one row returned by a subquery used as an expression".
To fix that, I tried to use ArrayAgg and StringAgg.
I made quite a mess ;-)
The Documents queryset (result) should be annotated either with a list of relevant Paragraphs (ArrayAgg), or a string of Paragraphs separated by any delimiter (StringAgg).
Any idea of how to proceed? I would be extremely grateful
We can annotate each document with the number of its paragraphs that match the query, and order by that count, using annotate with Sum, Case and When:
from django.db.models import Case, IntegerField, Sum, Value, When
documents = Document.objects.annotate(
    matches=Sum(Case(
        # The lookup name depends on the related name of the paragraph -> document relationship
        When(paragraphs__text__icontains=search_string, then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    ))
).order_by('-matches')
Then, to get all the paragraphs that match the query for each document, we can use prefetch_related with a Prefetch object that filters the prefetched rows:
from django.db.models import Prefetch
documents = documents.prefetch_related(Prefetch(
    'paragraphs',
    queryset=Paragraph.objects.filter(text__icontains=search_string),
    to_attr='matching_paragraphs',
))
You can then loop over the documents in ranked order; each one will have an attribute "matching_paragraphs" that contains all of its matching paragraphs.

NDB - querying repeated structured property for attribute

I have two models:
class Author(ndb.Model):
    email = ndb.StringProperty(indexed=True)
class Course(ndb.Model):
    student = ndb.StructuredProperty(Author, repeated=True)
I am trying to query Course to find where a student's email matches that of user.email_address. Is it possible to structure this as a single query?
You can filter on a sub-property of the structured property:
query = Course.query(Course.student.email == 'my#email.com')
This form is only correct when you filter on a single sub-property. To match several sub-properties within the same Author entry, the official documentation suggests passing an instance of the structured model as the filter value:
query = Course.query(Course.student == Author(email='my#email.com'))
See https://cloud.google.com/appengine/docs/standard/python/ndb/queries#filtering_structured_properties for more information.

Django ORM: dynamically add multiple conditions for ManyToManyField

Let's say that we have Django models defined as follows:
class Tag(models.Model):
    name = models.CharField(unique=True, max_length=50)
class Article(models.Model):
    title = models.CharField(max_length=100)
    text = models.TextField()
    tag = models.ManyToManyField(Tag)
And we have a list of tags:
tag_list = ['tag1','tag2','tag3']
The goal is to select articles that have all tags from tag_list. This question shows a way to achieve this with filter conditions added sequentially:
articles = Article.objects.filter(Q(tag__name=tag_list[0])).filter(Q(tag__name=tag_list[1])).filter(Q(tag__name=tag_list[2]))
But we need to add conditions dynamically. The query below doesn't work:
import operator
from functools import reduce
from django.db.models import Q
qlist = []
for tag in tag_list:
    qlist.append(Q(tag__name=tag))
articles = Article.objects.filter(reduce(operator.and_, qlist))
I ended up querying articles that have at least one tag from the list and then manually filtering query results:
qlist = []
qlist.append(Q(tag__name__in=tag_list))
articles = Article.objects.filter(reduce(operator.and_, qlist)).distinct()
for article in articles:
    article_tag_list = []
    for tag in article.tag.all():
        article_tag_list.append(tag.name)
    if set(tag_list).issubset(set(article_tag_list)):
        ...
Is there a way to add query conditions for ManyToManyField dynamically?
Try this:
q_objects = Q()
for tag in tag_list:
    q_objects &= Q(tag__name=tag)
articles = Article.objects.filter(q_objects)
I've used the pattern above many times, and it always worked for me.
UPDATE: While this method works in general, it doesn't apply to this problem. The code above requires all conditions to be true for a single related object. In this case that means there would have to be a single tag with three different names, which is clearly nonsensical.
This is the code that ended up working for OP:
articles = Article.objects.all()
for tag in tag_list:
    articles = articles.filter(tag__name=tag)
Django docs provide a detailed explanation of the difference between putting multiple rules in one filter() and having them in multiple filter()s (when it comes to querying multi-valued relationships).
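The difference is easier to see in the shape of the SQL joins. The sketch below uses Python's built-in sqlite3 module with hypothetical table names (article, tag, article_tag) and hand-written queries that only approximate what the ORM generates: one shared join for a single filter() with AND-ed conditions, and one join per condition for chained filter() calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article_tag (article_id INTEGER, tag_id INTEGER);
    INSERT INTO article VALUES (1, 'has both'), (2, 'has tag1 only');
    INSERT INTO tag VALUES (1, 'tag1'), (2, 'tag2');
    INSERT INTO article_tag VALUES (1, 1), (1, 2), (2, 1);
""")

# Shape of a single filter() with AND-ed Q objects: both conditions are
# checked against the same joined tag row, and no tag has two names.
single_join = conn.execute("""
    SELECT a.title FROM article a
    JOIN article_tag at1 ON at1.article_id = a.id
    JOIN tag t1 ON t1.id = at1.tag_id
    WHERE t1.name = 'tag1' AND t1.name = 'tag2'
""").fetchall()

# Shape of chained filter() calls: each condition gets its own join,
# so different tag rows can satisfy different conditions.
chained_joins = conn.execute("""
    SELECT a.title FROM article a
    JOIN article_tag at1 ON at1.article_id = a.id
    JOIN tag t1 ON t1.id = at1.tag_id
    JOIN article_tag at2 ON at2.article_id = a.id
    JOIN tag t2 ON t2.id = at2.tag_id
    WHERE t1.name = 'tag1' AND t2.name = 'tag2'
""").fetchall()
```

Here single_join comes back empty, while chained_joins contains only the article that carries both tags.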

Django: how to filter on subset of many-to-many field?

Let's say I have the classic Article and Tag models. How can I filter all Article models that have at least both tags A and B? So an Article with tags A, B, and C should be included, but not an article with tags A and D. Thanks!
This question might be old, but I have not found any built-in solution in Django (such as an __in-style lookup that requires all values to match).
To keep your example: we have an M2M relationship between Article and Tag and want to get all articles that have all of the given tags A, B and C.
There is no straightforward solution in Django, so we have to think back to SQL (do not worry, we can still do everything in the Django ORM).
An M2M relationship needs a common table that connects both relations. Let's call it TagArticle. A row in this table is just a tag id and an article id.
What we effectively have to do is this:
1) Filter the common TagArticle table to get only the rows with the A, B, or C tags.
2) Group the found rows by article and count the rows in each group.
3) Filter out all groups where the count is smaller than the number of tags (3 in our example).
4) Join or filter the Article table with the previous result.
Fortunately, we do not have to access the TagArticle table directly in Django. The pseudo code is then:
from django.db.models import Count
...
tags = ['A', 'B', 'C']
# Parentheses let the chained calls span several lines
articleQS = (
    Tag.objects.filter(name__in=tags)
    .values('article')
    .annotate(tagCount=Count('article'))
    .filter(tagCount=len(tags))
    .values('article')
)
articles = Article.objects.filter(id__in=articleQS)
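The four steps above can also be sketched directly in SQL. The example below uses Python's built-in sqlite3 module. The table names (article, tag, article_tag), the sample rows and the helper articles_with_all_tags are assumptions for illustration, and the sketch presumes tag names are unique and that no article is linked to the same tag twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE article_tag (article_id INTEGER, tag_id INTEGER);
    INSERT INTO article VALUES (1, 'tags A, B, C'), (2, 'tags A, D');
    INSERT INTO tag VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO article_tag VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4);
""")


def articles_with_all_tags(conn, names):
    """Return titles of articles that carry every tag in names."""
    names = sorted(set(names))  # duplicates would inflate the expected count
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT a.title FROM article a
        WHERE a.id IN (                          -- step 4
            SELECT lnk.article_id
            FROM article_tag lnk
            JOIN tag t ON t.id = lnk.tag_id
            WHERE t.name IN ({placeholders})     -- step 1
            GROUP BY lnk.article_id              -- step 2
            HAVING COUNT(*) = ?                  -- step 3
        )
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(sql, [*names, len(names)])]


both_a_and_b = articles_with_all_tags(conn, ["A", "B"])
```

With the sample rows, only the article tagged A, B and C is returned for ["A", "B"]; the article tagged A and D is dropped because its group contains a single matching row.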
Let's say Tag is:
class Tag(models.Model):
    article = models.ForeignKey('Article', on_delete=models.CASCADE)
    name = models.CharField(max_length=2)
Then I think you can do:
a_ids = Tag.objects.filter(name='A').values_list('article_id')
b_ids = Tag.objects.filter(name='B').values_list('article_id')
Article.objects.filter(id__in=a_ids).filter(id__in=b_ids)

Django Order_by blank

I have a simple model
title = models.CharField(max_length=250)
url = models.CharField(max_length=250)
title_french = models.CharField(max_length=250)
I want to order it by title_french; however, when it is ordered A-Z this way, all the blank values are at the top. In the case of blank values I display the English title.
So I get A-Z for the French titles, but at the top there is a load of unordered English titles.
Any advice?
For your case, I think you should do the sorting in your Python code (as it stands, the sorting is done in the database). It is not possible, IMHO, to do what you want in the database, at least not without writing some SQL by hand.
So the idea would be to do something like this in your view:
your_objects = list(YourObject.objects.filter(....))
your_objects.sort(key=lambda ob: ob.title_french if ob.title_french else ob.title)
As long as the lists you sort are small, this should not cause a noticeable performance problem.
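As a minimal, Django-free sketch of that sort key: the Page dataclass and the sample titles below are made up for illustration and stand in for model instances. The casefold() call is an addition not present in the answer above, so that upper and lower case titles interleave.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a model instance."""
    title: str
    title_french: str = ""


def display_title(page):
    # Fall back to the English title when the French one is blank
    return page.title_french or page.title


pages = [Page("Zebra"), Page("Apple", "Pomme"), Page("Book"), Page("Cat", "Chat")]

# Sort by the title that will actually be displayed, ignoring case
ordered = sorted(pages, key=lambda p: display_title(p).casefold())
titles = [display_title(p) for p in ordered]
```

Because the key is the displayed title, the English fallbacks are interleaved alphabetically with the French titles instead of being grouped at the top.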
Have you tried ordering by multiple fields (doc)?
ordering = ('title_french', 'title')
Specify both columns, title_french and title, in order_by:
queryset.order_by('title_french', 'title')
title_french is given first preference, and if two entries have the same title_french, those two entries are sorted by their title.
Here is a way to order blank values last while only using the ORM:
from django.db.models import Case, When, Value
...
title_french_blank_last = Case(
    When(title_french="", then=Value(1)),
    default=Value(0)
)
...
queryset.order_by(title_french_blank_last, "title_french", "title")
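Django has the option to order with nulls_first and nulls_last; for details see the docs.
In your case it would be something like this (not tested):
from django.db.models.functions import Coalesce
MyModel.objects.order_by(Coalesce('title_french', 'title').asc(nulls_last=True))
Note that Coalesce only substitutes NULL values; an empty string is not NULL, so a blank title_french would not fall back to title. You would still have to do some logic in Python to display the title when the French title is None.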
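If the blank French titles are stored as empty strings rather than NULL, one possible database-side variant is to turn empty strings into NULL first and then coalesce. The sketch below shows the idea in plain SQL through Python's built-in sqlite3 module. The page table and its rows are hypothetical. Django exposes Coalesce and NullIf in django.db.models.functions, but the ORM form in the comment is an untested assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (title TEXT, title_french TEXT);
    INSERT INTO page VALUES
        ('Zebra', ''), ('Apple', 'Pomme'), ('Book', NULL), ('Cat', 'Chat');
""")

# NULLIF maps '' to NULL, then COALESCE falls back to the English title
rows = conn.execute("""
    SELECT COALESCE(NULLIF(title_french, ''), title) AS shown
    FROM page
    ORDER BY shown
""").fetchall()
shown = [row[0] for row in rows]

# A possible ORM form (not run here):
#   MyModel.objects.order_by(
#       Coalesce(NullIf("title_french", Value("")), "title")
#   )
```

This orders by the title that would be displayed, so both empty and NULL French titles are sorted by their English fallback.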