Highlight search terms on a Django/PostgreSQL search results page - django

How can I create a search results page in Django 1.11, using PostgreSQL full text search, where the terms searched for are highlighted?

Django 1.11 has no built-in wrapper for PostgreSQL's ts_headline function, but you can define one yourself as a Func expression and use it to annotate a QuerySet.
Here is a sample for ts_headline, adapted from another source that the original answer linked to.
Headline function sample:
from django.db import models
from django.db.models import Func, Value


class Headline(Func):
    # Wraps PostgreSQL's ts_headline([config,] document, query [, options]).
    function = 'ts_headline'

    def __init__(self, field, query, config=None, options=None, **extra):
        expressions = [field, query]
        if config:
            # The optional text search configuration goes first.
            expressions.insert(0, Value(config))
        if options:
            # The optional options string (e.g. 'StartSel=<b>, StopSel=</b>') goes last.
            expressions.append(Value(options))
        extra.setdefault('output_field', models.TextField())
        super().__init__(*expressions, **extra)
With this function defined, you can annotate a QuerySet with the highlighted text.
Example model definition:
import uuid
from django.db import models

class Video(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    title = models.CharField(max_length=128, verbose_name="Title")
Steps for getting highlighted search results on the model's title:
1. Filter to get the QuerySet you want to annotate
2. Annotate it using the Headline function
3. Get the values of your document
Filtering Objects
Video.objects.filter(filter_query)
filter_query is a Q() over title
filter_query = Q(title__contains=term)
Annotation with Headline data
Video.objects.filter(filter_query).annotate(title_highlight=Headline(F('title'), text_search_query))
ts_headline takes its input directly from the document rather than from a tsvector, so we have to tell it which field to read and which SearchQuery to apply to it.
text_search_query is a SearchQuery object built from the same term as filter_query
text_search_query = SearchQuery(term)
After annotation, every object in this queryset will include an extra field called title_highlight, containing the result you wanted, like:
these <b>loans</b> not being repaired
Get the values from the annotation field
Using values_list over the QuerySet, you can get the values from these annotated fields.
Final code:
Video.objects.filter(filter_query).annotate(title_highlight=Headline(F('title'), text_search_query)).values_list('title', 'title_highlight')
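For intuition about the shape of the title_highlight value, here is a minimal Python-side sketch that wraps matches in <b> tags. It is only an approximation and not what ts_headline does internally: it matches the literal term case-insensitively, with no stemming or text search configuration. The highlight function and its parameters are made up for this illustration. It also HTML-escapes the surrounding text, which may be useful if the result is rendered unescaped in a template.

```python
import html
import re


def highlight(text, term, start="<b>", stop="</b>"):
    """Wrap case-insensitive literal matches of `term` in start/stop markers.

    Hypothetical helper: a rough stand-in for the output shape of ts_headline,
    without stemming, ranking or fragment selection.
    """
    if not term:
        return html.escape(text)
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches so only our own markers are raw HTML.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(start + html.escape(match.group(0)) + stop)
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight("these loans not being repaired", "loans"))
# these <b>loans</b> not being repaired
```

Unlike this sketch, ts_headline works on lexemes, so a search for "loan" could also highlight "loans" depending on the configuration.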

In Django 3.1, there is now a SearchHeadline class which makes this task much simpler.

The question asks about Django 1.11. Things have changed, as there is a SearchHeadline class in Django 3.1.
I've not noticed much code on this in Stack Overflow, so consider the following:
Assume that models.py contains an Article model. It has two TextFields ('headline'/'content') and a SearchVectorField for the content:
from django.contrib.postgres.search import SearchHeadline, SearchVector, SearchVectorField
from django.db import models
from django.db.models import F, Q

class Article(models.Model):
    headline = models.TextField()
    content = models.TextField()
    content_vector = SearchVectorField(null=True)
In your console/terminal, the following code will work:
query = "book"
(
    Article.objects
    .annotate(v_head=SearchHeadline(F("content"), query))
    .filter(content_vector=query)
)
There are two parts to the above - the annotation using SearchHeadline to annotate a v_head 'column', then the filter itself against the query for "book".
Assuming that the text was "Lorem ipsum book lorem ipsum", the output will be:
Lorem ipsum <b>book</b> lorem ipsum.
You can see other similar code on Github.

Related

Filtering models based off of the concatenation of a field on a one to many relationship in django

Say I had some models defined as such:
class Document(models.Model):
    pass

class Paragraph(models.Model):
    text = models.CharField(max_length=2000)
    # on_delete is required from Django 2.0 onward
    document = models.ForeignKey(Document, related_name="paragraphs", on_delete=models.CASCADE)
And I wanted to find all of the Documents that have the word "foo" contained in any of their paragraphs text fields.
Something like:
Document.objects.annotate(text=[Concatenation of all paragraphs text]).filter(text__icontains='foo')
How would I go about this in a Django way, without writing direct SQL queries?
How about this:
queryset = Paragraph.objects.filter(text__icontains='foo')
text = ''.join(obj.text for obj in queryset)
As @ReinstateMonica posted in her answer, starting by filtering the Paragraph objects seems like the easiest approach. The result can then be used to filter the documents. If you have only one relevant text field, the approach could look something like:
results = Document.objects.filter(
    pk__in=Paragraph.objects.filter(
        text__icontains='foo'
    ).values_list('document', flat=True)
)
If your Paragraph model contains multiple text fields you can first concat the fields and then use the same approach, so:
from django.db.models.functions import Concat

qs = Document.objects.filter(
    pk__in=Paragraph.objects.annotate(
        conc_text=Concat('text', 'text2')  # all relevant text fields
    ).filter(
        conc_text__icontains='foo'
    ).values_list(
        'document', flat=True
    )
)
The Django ORM can perform queries that span relationships:
Document.objects.filter(paragraphs__text__icontains='foo')
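One thing to watch with a filter that spans a to-many relationship: it is implemented as a JOIN, so a Document with several matching paragraphs can appear more than once, and adding .distinct() to the queryset removes the repeats. The sketch below reproduces that behaviour with the standard-library sqlite3 module rather than the ORM. The table and column names are assumptions chosen to resemble a Django schema, not the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY);
    CREATE TABLE paragraph (
        id INTEGER PRIMARY KEY,
        document_id INTEGER REFERENCES document(id),
        text TEXT
    );
    INSERT INTO document (id) VALUES (1), (2);
    INSERT INTO paragraph (document_id, text) VALUES
        (1, 'foo one'), (1, 'more foo'), (2, 'bar');
""")

# Same join, with and without DISTINCT.
join_sql = """
    SELECT {select} d.id
    FROM document d
    JOIN paragraph p ON p.document_id = d.id
    WHERE p.text LIKE '%foo%'
    ORDER BY d.id
"""
joined = [row[0] for row in conn.execute(join_sql.format(select=""))]
unique = [row[0] for row in conn.execute(join_sql.format(select="DISTINCT"))]

print(joined)  # [1, 1]  document 1 has two matching paragraphs
print(unique)  # [1]
```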

Prefetch or annotate Django model with the foreign key of a related object

Let's say we have the following models:
class Author(Model):
    ...

class Serie(Model):
    ...

class Book(Model):
    authors = ManyToManyField(Author, related_name="books")
    serie = ForeignKey(Serie)
    ...
How can I get the list of authors, each with their series?
I tried different combinations of annotate and prefetch:
list_authors = Author.objects.prefetch_related(Prefetch("books__series", queryset=Serie.objects.all(), to_attr="series"))
Trying to use list_authors[0].series throws an exception because Author has no series field
list_authors = Author.objects.annotate(series=FilteredRelation("books__series", condition=Q(...)))
Trying to use list_authors[0].series throws an exception because Author has no series field
list_authors = Author.objects.annotate(series=F('books__series'))
returns all possible combinations of (author, serie) that have a book in common
As I'm using PostgreSQL for my database, I tried:
from django.contrib.postgres.aggregates import ArrayAgg
...
list_authors = Author.objects.annotate(series=ArrayAgg('books__serie', distinct=True, filter=Q(...)))
It works fine, but returns only the id of the related objects.
list_authors = Author.objects.annotate(series=ArrayAgg(
    Subquery(
        Serie.objects.filter(
            book__authors=OuterRef('pk'),
            ...
        ).prefetch_related(...)
    )
))
fails because it needs an output_field, and a Model is not a valid value for output_field
BUT
I can get the number of series for an author, so why not the actual list of them:
list_authors = Author.objects.annotate(nb_series=Count("books__series", filter=Q(...), distinct=True))
list_authors[0].nb_series
>>> 2
Thus I assume that what I try to do is possible, but I am at a loss regarding the "How"...
I don't think you can do this with an annotation on the Author queryset - as you've already found you can do F('books__series') but that will not return distinct results. Annotations generally only make sense if the result is a single value per row.
What you could do instead is have a method on the Author model that fetches all the series for that author with a relatively simple query. This will mean one additional query per author, but I can't see any alternative. Something like this:
class Author(models.Model):
    def get_series(self):
        return Serie.objects.filter(book__authors=self).distinct()
Then you just do:
list_authors = Author.objects.all()
list_authors[0].get_series()
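If one query per author becomes too costly, one possible alternative is to fetch (author id, serie id) pairs in a single query, for example with something like Author.objects.values_list('pk', 'books__serie'), and group them in Python. The sketch below shows only the grouping step on hard-coded pairs. The data and the group_series helper are hypothetical.

```python
from collections import defaultdict


def group_series(pairs):
    """Group (author_id, serie_id) pairs into {author_id: sorted serie ids}.

    Hypothetical helper; a serie_id of None stands for an author with no books.
    """
    grouped = defaultdict(set)
    for author_id, serie_id in pairs:
        series = grouped[author_id]  # creates the entry even if nothing is added
        if serie_id is not None:
            series.add(serie_id)
    return {author_id: sorted(ids) for author_id, ids in grouped.items()}


# Stand-in for the rows a values_list() call might return.
pairs = [(1, 10), (1, 10), (1, 11), (2, 10), (3, None)]
series_by_author = group_series(pairs)
print(series_by_author)  # {1: [10, 11], 2: [10], 3: []}
```

The ids could then be resolved to Serie objects with one more query, which keeps the total number of queries constant regardless of the number of authors.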

django queryset F get last

django 2.0.2 python 3.4
models.py
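class Post(models.Model):
    Id = models.AutoField(primary_key=True)
    content = models.TextField()

class Reply(models.Model):
    Id = models.AutoField(primary_key=True)
    PostId = models.ForeignKey(Post, on_delete=models.CASCADE)
    content = models.TextField()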
view.py
Post.objects.all().annotate(lastreply=F("Reply__content__last"))
Can a "last" lookup be used in F()?
As far as I know, latest cannot be used with F().
One possible solution is including a timestamp in the reply class
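class Post(models.Model):
    Id = models.AutoField(primary_key=True)
    content = models.TextField()

class Reply(models.Model):
    Id = models.AutoField(primary_key=True)
    PostId = models.ForeignKey(Post, on_delete=models.CASCADE)
    content = models.TextField()
    timestamp = models.DateTimeField(auto_now_add=True)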
Then you can use a query of this format to get the latest reply for each post.
Reply.objects.annotate(max_time=Max('PostId__reply__timestamp')).filter(timestamp=F('max_time'))
Please note that this can be very slow for a large number of records.
If you are using a PostgreSQL database, you can use distinct() with field names instead:
Reply.objects.order_by('PostId', '-timestamp').distinct('PostId')
An F expression has no way to do that, but Django offers another way to handle it: Subquery expressions.
https://docs.djangoproject.com/en/2.0/ref/models/expressions/#subquery-expressions
For this problem, the code below works:
from django.db.models import OuterRef, Subquery

sub_qs = Reply.objects.filter(
    PostId=OuterRef('pk')
).order_by('-timestamp')  # newest reply first
qs = Post.objects.annotate(
    last_reply_content=Subquery(
        sub_qs.values('content')[:1]))
How does it work?
sub_qs is the queryset on the related model, from which we want only the last reply for each post. OuterRef restricts it to the replies related to the current post, and order_by('-timestamp') sorts them in descending order, so the first is the most recent and the last is the oldest.
sub_qs = Reply.objects.filter(
    PostId=OuterRef('pk')
).order_by('-timestamp')
The second part is the Post queryset with an annotation: we want to expose the result of sub_qs as an extra field, and Subquery lets us embed another queryset inside annotate.
We use .values('content') to select only the content field, and slice sub_qs with [:1] to keep only the first row.
qs = Post.objects.annotate(
    last_reply_content=Subquery(
        sub_qs.values('content')[:1]))
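To see what the annotation amounts to, the sketch below runs a comparable correlated subquery with the standard-library sqlite3 module. The SQL is a hand-written approximation, not the statement Django actually generates, and the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE reply (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES post(id),
        content TEXT,
        timestamp TEXT
    );
    INSERT INTO post (id, content) VALUES (1, 'first post'), (2, 'second post');
    INSERT INTO reply (post_id, content, timestamp) VALUES
        (1, 'old reply', '2018-01-01 10:00:00'),
        (1, 'new reply', '2018-01-02 10:00:00');
""")

# For each post, the inner query picks the newest related reply
# (descending timestamp, first row only), like sub_qs...[:1].
rows = conn.execute("""
    SELECT p.id,
           (SELECT r.content
              FROM reply r
             WHERE r.post_id = p.id
             ORDER BY r.timestamp DESC
             LIMIT 1) AS last_reply_content
      FROM post p
     ORDER BY p.id
""").fetchall()

print(rows)  # [(1, 'new reply'), (2, None)]
```

A post without replies gets NULL (None) for the extra column, which is also what a Subquery annotation yields when the inner queryset is empty.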

How to filter multiple fields with list of objects

I want to build a web app like Quora or Medium, where a user can follow other users or topics.
eg: userA is following (userB, userC, tag-Health, tag-Finance).
These are the models:
class Relationship(models.Model):
    user = AutoOneToOneField('auth.user')
    follows_user = models.ManyToManyField('Relationship', related_name='followed_by')
    follows_tag = models.ManyToManyField(Tag)

class Activity(models.Model):
    actor_type = models.ForeignKey(ContentType, related_name='actor_type_activities')
    actor_id = models.PositiveIntegerField()
    actor = GenericForeignKey('actor_type', 'actor_id')
    verb = models.CharField(max_length=10)
    target_type = models.ForeignKey(ContentType, related_name='target_type_activities')
    target_id = models.PositiveIntegerField()
    target = GenericForeignKey('target_type', 'target_id')
    tags = models.ManyToManyField(Tag)
Now, this would give the following list:
following_user = userA.relationship.follows_user.all()
following_user
[<Relationship: userB>, <Relationship: userC>]
following_tag = userA.relationship.follows_tag.all()
following_tag
[<Tag: tag-job>, <Tag: tag-finance>]
To filter I tried this way:
Activity.objects.filter(Q(actor__in=following_user) | Q(tags__in=following_tag))
But since actor is a GenericForeignKey I am getting an error:
FieldError: Field 'actor' does not generate an automatic reverse relation and therefore cannot be used for reverse querying. If it is a GenericForeignKey, consider adding a GenericRelation.
How can I filter the activities that will be unique, with the list of users and list of tags that the user is following? To be specific, how will I filter GenericForeignKey with the list of the objects to get the activities of the following users.
You should just filter by ids.
First get ids of objects you want to filter on
following_user = userA.relationship.follows_user.all().values_list('id', flat=True)
following_tag = userA.relationship.follows_tag.all()
Also you will need to filter on actor_type. It can be done like this for example.
actor_type = ContentType.objects.get_for_model(userA.__class__)
Or, as @Todor suggested in the comments, pass the instance directly, because get_for_model accepts both a model class and a model instance:
actor_type = ContentType.objects.get_for_model(userA)
And then you can just filter like this.
Activity.objects.filter(Q(actor_id__in=following_user, actor_type=actor_type) | Q(tags__in=following_tag))
What the docs are suggesting is not a bad thing.
The problem is that when you are creating Activities you are using auth.User as an actor, therefore you can't add GenericRelation to auth.User (well maybe you can by monkey-patching it, but that's not a good idea).
So what can you do?
@Sardorbek Imomaliev's solution is very good, and you can make it even better by putting all this logic into a custom QuerySet class. (The idea is to achieve DRY-ness and reusability.)
class ActivityQuerySet(models.QuerySet):
    def for_user(self, user):
        return self.filter(
            models.Q(
                actor_type=ContentType.objects.get_for_model(user),
                actor_id__in=user.relationship.follows_user.values_list('id', flat=True)
            ) | models.Q(
                tags__in=user.relationship.follows_tag.all()
            )
        )

class Activity(models.Model):
    # ...
    objects = ActivityQuerySet.as_manager()

# usage
user_feed = Activity.objects.for_user(request.user)
But is there anything else?
1. Do you really need a GenericForeignKey for actor? I don't know your business logic, so you probably do, but using a regular FK for actor (just like for the tags) would make it possible to do stuff like actor__in=users_following.
2. Did you check whether there is already an app for that? One example of a package that solves your problem is django-activity-stream; check it out.
3. If you don't use auth.User as an actor, you can do exactly what the docs suggest: add a GenericRelation field. In fact, your Relationship class is suitable for this purpose, but I would rename it to something like UserProfile or at least UserRelation. Suppose we have renamed Relationship to UserProfile and we create new Activities using userprofile instead. The idea is:
class UserProfile(models.Model):
    user = AutoOneToOneField('auth.user')
    follows_user = models.ManyToManyField('UserProfile', related_name='followed_by')
    follows_tag = models.ManyToManyField(Tag)
    activities_as_actor = GenericRelation('Activity',
        content_type_field='actor_type',
        object_id_field='actor_id',
        related_query_name='userprofile'
    )

class ActivityQuerySet(models.QuerySet):
    def for_userprofile(self, userprofile):
        return self.filter(
            models.Q(
                userprofile__in=userprofile.follows_user.all()
            ) | models.Q(
                # userprofile is already the profile, so no .relationship hop
                tags__in=userprofile.follows_tag.all()
            )
        )

class Activity(models.Model):
    # ...
    objects = ActivityQuerySet.as_manager()

# usage
# 1st: when you create an activity, use UserProfile
Activity.objects.create(actor=request.user.userprofile, ...)
# 2nd: when you fetch,
# check how `for_userprofile` is implemented this time
Activity.objects.for_userprofile(request.user.userprofile)
As stated in the documentation:
Due to the way GenericForeignKey is implemented, you cannot use such fields directly with filters (filter() and exclude(), for example) via the database API. Because a GenericForeignKey isn’t a normal field object, these examples will not work:
You could follow what the error message is telling you, I think you'll have to add a GenericRelation relation to do that. I do not have experience doing that, and I'd have to study it but...
Personally, I think this solution is too complex for what you're trying to achieve. If only the user model can follow tags or authors, why not put a ManyToManyField on it? It would be something like this:
class Person(models.Model):
    user = models.ForeignKey(User)
    follow_tag = models.ManyToManyField('Tag')
    follow_author = models.ManyToManyField('Author')
You could query all followed tag activities per Person like this:
Activity.objects.filter(tags__in=person.follow_tag.all())
And you could search 'persons' following a tag like this:
Person.objects.filter(follow_tag__in=[<tag_ids>])
The same would apply to authors and you could use querysets to do OR, AND, etc.. on your queries.
If you want more models to be able to follow a tag or author, say a System, you could create a Following model that does what Person does here, and then add a ForeignKey to Following in both Person and System.
Note that I'm using this Person model to meet this recommendation.
You can query separately for users and tags and then combine the two querysets to get what you are looking for. Try something like the code below and let me know if it works.
usrs = Activity.objects.filter(actor__in=following_user)
tags = Activity.objects.filter(tags__in=following_tag)
result = usrs | tags
You can use annotate to join the two primary keys as a single string then use that to filter your queryset.
from django.db.models import Q, TextField, Value
from django.db.models.functions import Concat

following_actor = [
    # (actor_type, actor_id)
    (1, 100),
    (2, 102),
]
searchable_keys = [str(at) + "__" + str(actor) for at, actor in following_actor]
result = Activity.objects.annotate(
    key=Concat('actor_type', Value('__'), 'actor_id', output_field=TextField())
).filter(Q(key__in=searchable_keys) | Q(tags__in=following_tag))
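The reason for combining both columns into one key is that an id alone is ambiguous across content types. Here is a small pure-Python sketch of the same matching logic, using made-up activity rows and a hypothetical make_key helper:

```python
def make_key(actor_type, actor_id):
    """Build the same 'type__id' string that the Concat annotation produces."""
    return f"{actor_type}__{actor_id}"


following_actor = [(1, 100), (2, 102)]
searchable_keys = {make_key(at, actor) for at, actor in following_actor}

# Made-up rows standing in for Activity records.
activities = [
    {"id": 1, "actor_type": 1, "actor_id": 100},
    {"id": 2, "actor_type": 2, "actor_id": 100},  # same id, different type
    {"id": 3, "actor_type": 2, "actor_id": 102},
]

matched = [
    a["id"] for a in activities
    if make_key(a["actor_type"], a["actor_id"]) in searchable_keys
]
by_id_only = [a["id"] for a in activities if a["actor_id"] in {100, 102}]

print(matched)     # [1, 3]
print(by_id_only)  # [1, 2, 3]  activity 2 is a false positive
```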

Elasticsearch and auto_query

The objects in the database are named "news" and "news test".
class ItemIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True)
    name = indexes.CharField(model_attr='name')
    name_alt = indexes.CharField(model_attr='name_alt')

    def get_model(self):
        return Serial
>>> from haystack.query import SearchQuerySet
>>> sqs = SearchQuerySet().all()
>>> sqs.count()
4
>>> SearchQuerySet().auto_query('new') # does not work: returns no results
[]
With haystack.backends.simple_backend.SimpleEngine it works.
Django==1.5.1
Elasticsearch==0.90
django-haystack==master (2.0)
Why?
It doesn't look like you're populating the all-important document field.
Your SearchIndex class has these fields:
text = indexes.CharField(document=True)
name = indexes.CharField(model_attr='name')
name_alt = indexes.CharField(model_attr='name_alt')
You've defined the data source for name and name_alt but not for text. The output from your command line search shows that that field is empty in the search index. You have several options:
Populate that field from a model attribute
Use a prepare_FOO method to prepare the content for that field
Use a template, using the use_template argument for the text field and include any and all content in that template
Now the follow up question is why did auto_query fail but a basic curl query work? Because auto_query is searching the content - the document - and that's missing.