Elasticsearch and auto_query - django

The objects in the database are named "news" and "news test".
class ItemIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True)
    name = indexes.CharField(model_attr='name')
    name_alt = indexes.CharField(model_attr='name_alt')
    def get_model(self):
        return Serial
>>> from haystack.query import SearchQuerySet
>>> sqs = SearchQuerySet().all()
>>> sqs.count()
4
>>> SearchQuerySet().auto_query('new') # returns nothing, whatever the query!
[]
If I use haystack.backends.simple_backend.SimpleEngine, it works.
Django==1.5.1
Elasticsearch==0.90
django-haystack==master (2.0)
Why????

It doesn't look like you're populating the all-important document field.
Your SearchIndex class has these fields:
text = indexes.CharField(document=True)
name = indexes.CharField(model_attr='name')
name_alt = indexes.CharField(model_attr='name_alt')
You've defined the data source for name and name_alt but not for text. The output from your command line search shows that that field is empty in the search index. You have several options:
Populate that field from a model attribute
Use a prepare_FOO method to prepare the content for that field
Use a template, using the use_template argument for the text field and include any and all content in that template
Now the follow-up question: why did auto_query fail when a basic curl query worked? Because auto_query searches the content (the document field), and that is what's missing.
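As a rough illustration of the second option, the text placed in the document field could simply be the searchable attributes joined together. The helper below is hypothetical (build_document_text is not part of Haystack). In a real index its body would presumably live in a prepare_text(self, obj) method on ItemIndex, which is the prepare_FOO hook mentioned above. A plain stand-in object is used so the sketch runs without Django.

```python
from types import SimpleNamespace


def build_document_text(obj):
    """Join the searchable attributes into one block of text.

    Hypothetical helper: in a Haystack SearchIndex this logic would sit in
    a prepare_text(self, obj) method so the `text` document field is filled.
    """
    parts = [obj.name, obj.name_alt]
    # Skip empty or missing values so the document has no blank lines.
    return "\n".join(part for part in parts if part)


# Stand-in for a model instance, using the names from the question.
item = SimpleNamespace(name="news", name_alt="news test")
print(build_document_text(item))
```

After changing the index definition, the index has to be rebuilt before the document field contains anything to search.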

Related

Highlight search terms on a Django/PostgreSQL search results page

How can I create a search results page in Django 1.11, using PostgreSQL full text search, where the terms searched for are highlighted?
Although Django 1.11 does not expose PostgreSQL's ts_headline function, you can wrap it in a Func expression yourself and use it to annotate a QuerySet:
We need additional function to operate with django ORM. Here is a sample for ts_headline. [original_source for this sample function is linked here]
Headline function sample:
from django.db import models
from django.db.models import Func, Value
class Headline(Func):
    function = 'ts_headline'
    def __init__(self, field, query, config=None, options=None, **extra):
        expressions = [field, query]
        if config:
            expressions.insert(0, Value(config))
        if options:
            expressions.append(Value(options))
        extra.setdefault('output_field', models.TextField())
        super().__init__(*expressions, **extra)
With this function defined, you can use it to annotate a QuerySet.
Example model definition:
import uuid
from django.db import models
class Video(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    title = models.CharField(max_length=128, verbose_name="Title")
Steps for getting highlighted search results on the model's title:
Filter to get the QuerySet that needs to be annotated
Annotate it using the Headline function
Get the values of the annotated field
Filtering Objects
Video.objects.filter(filter_query)
filter_query is a Q() over title
filter_query = Q(title__contains=term)
Annotation with Headline data
Video.objects.filter(filter_query).annotate(title_highlight=Headline(F('title'), text_search_query))
ts_headline takes its input directly from the document text rather than from a tsvector, so we have to tell it which field to read and which SearchQuery to apply to it.
text_search_query is a SearchQuery object built from the same term as filter_query:
text_search_query = SearchQuery(term)
After annotation, every object in the queryset will include an extra field called title_highlight containing the highlighted result, for example:
these <b>loans</b> not being repaired
Get the values from the annotation field
Using values_list on the QuerySet, you can get the values of the annotated fields.
Final code:
Video.objects.filter(filter_query).annotate(title_highlight=Headline(F('title'), text_search_query)).values_list('title', 'title_highlight')
In Django 3.1, there is now a SearchHeadline class which makes this task much simpler.
The question asks about Django 1.11. Things have changed, as there is a SearchHeadline class in Django 3.1.
I've not noticed much code on this in Stack Overflow, so consider the following:
Assume that models.py contains an Article model. It has two TextFields ('headline'/'content') and a SearchVectorField for the content:
from django.contrib.postgres.search import SearchVector, SearchVectorField, SearchHeadline
from django.db import models
from django.db.models import F, Q
class Article(models.Model):
    headline = models.TextField()
    content = models.TextField()
    content_vector = SearchVectorField(null=True)
In your console/terminal, the following code will work:
query = "book"
Article.objects.annotate(
    v_head=SearchHeadline(F("content"), query)
).filter(content_vector=query)
There are two parts to the above - the annotation using SearchHeadline to annotate a v_head 'column', then the filter itself against the query for "book".
Assuming that the text was "Lorem ipsum book lorem ipsum", the output will be:
Lorem ipsum <b>book</b> lorem ipsum.
You can see other similar code on Github.

How do I query for empty MultiValueField results in Django Haystack

Using Django 1.4.2, Haystack 2.0beta, and ElasticSearch 0.19, how do I query for results which have an empty set [] for a MultiValueField?
I'd create an integer field named num_<field> and query against it.
In this example 'emails' is the MultiValueField, so we'll create 'num_emails':
class PersonIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr='name')
    emails = indexes.MultiValueField(null=True)
    num_emails = indexes.IntegerField()
    def prepare_num_emails(self, obj):
        return len(obj.emails)
Now, in your searches you can use
SearchQuerySet().filter(num_emails=0)
You can also change the prepare_ method of your MultiValueField so that it stores a sentinel value when the list is empty:
def prepare_emails(self, obj):
    emails = [e for e in obj.emails]
    return emails if emails else ['None']
Then you can filter on that sentinel string:
SearchQuerySet().filter(emails='None')

how to show tags related to a particular tag in django taggit ?

I want to show a list of tags that is related to a particular tag(in an optimised way).
I wonder why django-taggit does not provide an inbuilt functionality for this common task.
The solution I have to offer does a bit more than what you're asking for because it allows to find related tags for a set of given tags, not for only a single given tag. In reality, this is probably what you want to do though. I'm not sure if it's really optimal in term of performance since it uses a subquery, but it works and I find it easy to understand.
First, here is the test case:
from django.test import TestCase
from .models import Item, get_related_tags
class RelatedTagsTest(TestCase):
    def setUp(self):
        article1 = Item.objects.create(title='Python vs. COBOL')
        article1.tags.add('programming', 'python', 'cobol')
        article2 = Item.objects.create(title='Python vs. Boa Constrictor')
        article2.tags.add('zoology', 'python', 'boa')
        article3 = Item.objects.create(title='COBOL vs. FORTRAN')
        article3.tags.add('cobol', 'fortran', 'programming')
    def test_unique_tag(self):
        self.assertEqual(get_related_tags('programming'),
                         ['cobol', 'fortran', 'python'])
        self.assertEqual(get_related_tags('python'),
                         ['boa', 'cobol', 'programming', 'zoology'])
    def test_multiple_tags(self):
        self.assertEqual(get_related_tags('boa', 'fortran'),
                         ['cobol', 'programming', 'python', 'zoology'])
As you can see, by "related tags" we mean the set of tags which are associated with items which are tagged with a set of given tags.
And here is our model with a function to get related tags:
from django.db import models
from taggit.managers import TaggableManager
from taggit.models import Tag
class Item(models.Model):
    title = models.CharField(max_length=100)
    tags = TaggableManager()
def get_related_tags(*tags):
    # Get a QuerySet of related items
    related_items = Item.objects.filter(tags__name__in=tags)
    # Get tags for those related items (I found the name of the lookup field by
    # reading taggit's source code)
    qs = Tag.objects.filter(taggit_taggeditem_items__item__in=related_items)
    # Exclude the tags we already have
    qs = qs.exclude(name__in=tags)
    # Order by name and remove duplicates
    qs = qs.order_by('name').distinct()
    # Return tag names to simplify test code, real code would probably return
    # Tag objects
    return [t.name for t in qs]
Note that you can easily add the number of items per tag using qs.annotate(count=Count('name')). It will be available as a count attribute on each Tag object.
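To see the three steps of get_related_tags without the ORM, here is a plain-Python sketch of the same logic. The related_tags function and the items dictionary are hypothetical stand-ins for the Item and Tag tables, filled with the data from the test case above.

```python
def related_tags(items, *tags):
    """Return sorted names of tags that co-occur with any of the given tags.

    `items` maps an item title to the set of tag names on that item; it is a
    hypothetical stand-in for the Item/Tag tables.
    """
    wanted = set(tags)
    # Step 1: items carrying at least one of the given tags.
    related_items = [item_tags for item_tags in items.values() if item_tags & wanted]
    # Step 2: every tag on those items, minus the tags we started from.
    found = set().union(*related_items) - wanted
    # Step 3: order by name (the set has already removed duplicates).
    return sorted(found)


items = {
    "Python vs. COBOL": {"programming", "python", "cobol"},
    "Python vs. Boa Constrictor": {"zoology", "python", "boa"},
    "COBOL vs. FORTRAN": {"cobol", "fortran", "programming"},
}
print(related_tags(items, "programming"))
print(related_tags(items, "boa", "fortran"))
```

The ORM version does the same work in the database, which is what you want once the tables no longer fit comfortably in memory.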

Q objects and the '&' operator in django

I have a curious problem.
I have 3 objects. All the same
class Articles(models.Model):
    owner = models.ForeignKey(Author)
    tags = models.ManyToManyField('Tag')
class Tag(models.Model):
    name = models.CharField(max_length=255)
and so I have 3 Articles. With all the same tags: 'tag1' and 'tag2'
And I have queries
actionsAll = Articles.objects.filter((Q(tags__name__exact="tag1") | Q(tags__name__exact="tag2"))).distinct()
This gives me all my articles. It will return 6 articles w/o distinct() since it will collect each article 2x since they have both tags.
However with this query:
actionsAll = Articles.objects.filter((Q(tags__name__exact="tag1") & Q(tags__name__exact="tag2"))).distinct()
This gives me no articles.
Since the articles contain both tags, it should return them all, shouldn't it?
If you look at the SQL it generates, you'll see that it checks to see if the same tag has both names. What you need is an IN query or an EXISTS query that traverses the relation.
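The difference can be reproduced outside Django with an in-memory SQLite database. The table and column names below are simplified assumptions, not the exact SQL Django generates. The first query mirrors the single-join shape, the second uses one EXISTS per tag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article_tags (article_id INTEGER, tag_id INTEGER);
    INSERT INTO article (id) VALUES (1), (2), (3);
    INSERT INTO tag (id, name) VALUES (1, 'tag1'), (2, 'tag2');
    -- every article carries both tags
    INSERT INTO article_tags (article_id, tag_id)
        VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2);
""")

# One join: both conditions are tested against the same tag row,
# and no single tag is named 'tag1' and 'tag2' at once.
same_row = conn.execute("""
    SELECT DISTINCT a.id
    FROM article a
    JOIN article_tags l ON l.article_id = a.id
    JOIN tag t ON t.id = l.tag_id
    WHERE t.name = 'tag1' AND t.name = 'tag2'
""").fetchall()

# One EXISTS per tag: each condition may be satisfied by a different row.
has_tag = """
    EXISTS (
        SELECT 1 FROM article_tags l JOIN tag t ON t.id = l.tag_id
        WHERE l.article_id = a.id AND t.name = ?
    )
"""
both_tags = conn.execute(
    f"SELECT a.id FROM article a WHERE {has_tag} AND {has_tag} ORDER BY a.id",
    ("tag1", "tag2"),
).fetchall()

print(same_row)   # no article matches
print(both_tags)  # all three articles match
```

In the ORM, one way to get a separate join per condition is to chain the calls, as in Articles.objects.filter(tags__name="tag1").filter(tags__name="tag2"), since each filter() call on a multi-valued relation is evaluated against its own related row.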
from functools import reduce
import operator
from django.db.models import Q
from .models import SuperUser, NameUser
# The usernames are not known in advance, so build one Q object per super user
super_users = SuperUser.objects.all()
q_expressions = [Q(username=user.username) for user in super_users]
# OR the Q objects together to find NameUser rows matching any super user
name_superheroes_qs = NameUser.objects.filter(reduce(operator.or_, q_expressions))

How to obtain and/or save the queryset criteria to the DB?

I would like to save a queryset criteria to the DB for reuse.
So, if I have a queryset like:
Client.objects.filter(state='AL')
# I'm simplifying the problem for readability. In reality I could have
# a very complex queryset, with multiple filters, excludes and even Q() objects.
I would like to save to the DB not the results of the queryset (i.e. the individual client records that have a state field matching 'AL'); but the queryset itself (i.e. the criteria used in filtering the Client model).
The ultimate goal is to have a "saved filter" that can be read from the DB and used by multiple django applications.
At first I thought I could serialize the queryset and save that. But serializing a queryset actually executes the query - and then I end up with a static list of clients in Alabama at the time of serialization. I want the list to be dynamic (i.e. each time I read the queryset from the DB it should execute and retrieve the most current list of clients in Alabama).
Edit: Alternatively, is it possible to obtain a list of filters applied to a queryset?
Something like:
qs = Client.objects.filter(state='AL')
filters = qs.getFilters()
print filters
{ 'state': 'AL' }
You can do as jcd says, storing the sql.
You can also store the conditions.
In [44]: q=Q( Q(content_type__model="User") | Q(content_type__model="Group"),content_type__app_label="auth")
In [45]: c={'name__startswith':'Can add'}
In [46]: Permission.objects.filter(q).filter(**c)
Out[46]: [<Permission: auth | group | Can add group>, <Permission: auth | user | Can add user>]
In [48]: q2=Q( Q(content_type__model="User") | Q(content_type__model="Group"),content_type__app_label="auth", name__startswith='Can add')
In [49]: Permission.objects.filter(q2)
Out[49]: [<Permission: auth | group | Can add group>, <Permission: auth | user | Can add user>]
In that example you see that the conditions are the objects c and q (although they can be joined in one object, q2). You can then serialize these objects and store them on the database as strings.
--edit--
If you need to have all the conditions on a single database record, you can store them in a dictionary
{'filter_conditions': (cond_1, cond_2, cond_3), 'exclude_conditions': (cond_4, cond_5)}
and then serialize the dictionary.
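A minimal sketch of that idea, assuming the conditions are plain keyword lookups (Q objects are not JSON-serializable, so they would need pickling or a custom encoding). The field names and the apply_conditions helper are hypothetical.

```python
import json

# Hypothetical saved filter: each entry is a dict of field lookups that
# would later be passed to .filter(**cond) or .exclude(**cond).
saved_filter = {
    "filter_conditions": [{"state": "AL"}, {"name__startswith": "Can add"}],
    "exclude_conditions": [{"is_active": False}],
}

# Serialize to a string that fits in a text column.
stored = json.dumps(saved_filter, sort_keys=True)

# Later, possibly in another application: load the conditions back.
restored = json.loads(stored)


def apply_conditions(queryset, conditions):
    """Re-apply stored lookups to any object with filter()/exclude() methods."""
    for cond in conditions["filter_conditions"]:
        queryset = queryset.filter(**cond)
    for cond in conditions["exclude_conditions"]:
        queryset = queryset.exclude(**cond)
    return queryset
```

With a real model this would be called as apply_conditions(Client.objects.all(), restored). Because only the criteria are stored, the query runs against current data each time.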
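You can store the SQL generated by the query using the queryset's _as_sql() method. This is a private, undocumented method, so check that it exists in your Django version. It takes a database connection as an argument, so you'd do:
from datetime import datetime
from django.db import connection
from app.models import MyModel
qs = MyModel.objects.filter(pk__gt=56, published_date__lt=datetime.now())
# store_query() stands for your own persistence code
store_query(qs._as_sql(connection))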
You can use http://github.com/denz/django-stored-queryset for that
You can pickle the Query object (not the QuerySet):
>>> import pickle
>>> query = pickle.loads(s) # Assuming 's' is the pickled string.
>>> qs = MyModel.objects.all()
>>> qs.query = query # Restore the original 'query'.
Docs: https://docs.djangoproject.com/en/dev/ref/models/querysets/#pickling-querysets
But: You can’t share pickles between versions
You can create your own model to store your queries.
The first field can contain a foreign key to ContentType.
The second field can be a plain text field holding your query, and so on.
After that, you can use a Q object to build the queryset for your model.
The current answer was unclear to me as I don't have much experience with pickle. In 2022, I've found that turning a dict into JSON worked well. I'll show you what I did below. I believe pickling still works, so at the end I will show some more thoughts there.
models.py - example database structure
from django.db import models
class Transaction(models.Model):
    id = models.CharField(max_length=24, primary_key=True)
    date = models.DateField(null=False)
    amount = models.IntegerField(null=False)
    info = models.CharField(max_length=255)
    # Account and Category are defined below, so refer to them by name
    account = models.ForeignKey("Account", on_delete=models.SET_NULL, null=True)
    category = models.ForeignKey("Category", on_delete=models.SET_NULL, null=True, blank=False, default=None)
class Account(models.Model):
    name = models.CharField(max_length=255)
    email = models.EmailField()
class Category(models.Model):
    name = models.CharField(max_length=255, unique=True)
class Rule(models.Model):
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, blank=False, null=True, default=None)
    criteria = models.JSONField(default=dict) # this will hold our query
My models store financial transactions, the category the transaction fits into (e.g., salaried income, 1099 income, office expenses, labor expenses, etc...), and a rule to save a query to automatically categorize future transactions without having to remember the query every year when doing taxes.
I know, for example, that all my transactions with my consulting clients should be marked as 1099 income. So I want to create a rule for clients that will grab each monthly transaction and mark it as 1099 income.
Making the query the old-fashioned way
>>> from transactions.models import Category, Rule, Transaction
>>>
>>> client1_transactions = Transaction.objects.filter(account__name="Client One")
>>> client1_transactions
<QuerySet [<Transaction: Transaction object (1111111)>, <Transaction: Transaction object (1111112)>, <Transaction: Transaction object (1111113)...]>
>>> client1_transactions.count()
12
Twelve transactions, one for each month. Beautiful.
But how do we save this to the database?
Save query to database in JSONField
We now have Django 4.0 and a bunch of support for JSONField.
I've been able to grab the filtering values out of a form POST request, then add them in view logic.
urls.py
from django.urls import path
from transactions import views
app_name = "transactions"
urlpatterns = [
    path("categorize", views.categorize, name="categorize"),
    path("", views.list, name="list"),
]
transactions/list.html
<form action="{% url 'transactions:categorize' %}" method="POST">
{% csrf_token %}
<label for="info">Info field contains...</label>
<input id="info" type="text" name="info">
<label for="account">Account name contains...</label>
<input id="account" type="text" name="account">
<label for="category">New category should be...</label>
<input id="category" type="text" name="category">
<button type="submit">Make a Rule</button>
</form>
views.py
from django.shortcuts import render
from .models import Category, Rule, Transaction
def categorize(request):
    # get POST data from our form
    info = request.POST.get("info", "")
    account = request.POST.get("account", "")
    category = request.POST.get("category", "")
    # set up query
    query = {}
    if info:
        query["info__icontains"] = info
    if account:
        query["account__name__icontains"] = account
    # update the database
    category_obj, _ = Category.objects.get_or_create(name=category)
    transactions = Transaction.objects.filter(**query).order_by("-date")
    Rule.objects.get_or_create(category=category_obj, criteria=query)
    transactions.update(category=category_obj)
    # render the template
    return render(
        request,
        "transactions/list.html",
        {
            "transactions": transactions.select_related("account"),
        },
    )
That's pretty much it!
My example here is a little contrived, so please forgive any errors.
How to do it with pickle
I actually lied before. I have a little experience with pickle and I do like it, but I am not sure on how to save it to the database. My guess is that you'd then save the pickled string to a BinaryField.
Perhaps something like this:
>>> # imports
>>> import pickle # standard library
>>> from transactions.models import Category, Rule, Transaction # my own stuff
>>>
>>> # create the query
>>> qs_to_save = Transaction.objects.filter(account__name="Client 1")
>>> qs_to_save.count()
12
>>>
>>> # create the pickle
>>> saved_pickle = pickle.dumps(qs_to_save.query)
>>> type(saved_pickle)
<class 'bytes'>
>>>
>>> # save to database
>>> # make sure `criteria = models.BinaryField()` above in models.py
>>> # I'm unsure about this
>>> test_category, _ = Category.objects.get_or_create(name="Test Category")
>>> test_rule = Rule.objects.create(category=test_category, criteria=saved_pickle)
>>>
>>> # remake queryset at a later date
>>> new_qs = Transaction.objects.all()
>>> new_qs.query = pickle.loads(test_rule.criteria)
>>> new_qs.count()
12
Going even further beyond
I found a way to make this all work with my htmx live search, allowing me to see the results of my query on the front end of my site before saving.
This answer is already too long, so here's a link to a post if you care about that: Saving a Django Query to the Database.