Unable to search part of a term with Django Haystack and Elasticsearch - django

I have a project based on
Django==1.9.2
django-haystack==2.4.1
elasticsearch==2.2.0
A very simple search view:
def search_view(request):
    query = request.GET.get('q', '')
    sqs = SearchQuerySet().filter(content=query)
    params = {
        'results': sqs,
        'query': query,
    }
    return render_to_response('results.html', params,
                              context_instance=RequestContext(request))
My search index is as simple as:
class CategoryIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr='name')
    def get_model(self):
        return Category
    def index_queryset(self, using=None):
        return self.get_model().objects.filter(published=True)
The category_text.txt file is just:
{{ object.name }}
In my database I have a few items:
Acqua
Acquario
Aceto
Accento
When I search with my view, I have strange behaviours.
Searching with the query "ac", I receive no results! I was expecting to get all my items. I have tried changing the query to .filter(content__contains=query) (I know it is the default!) but nothing changed.
Searching with query "acqua" I receive 1 result (correct) with the result object, but when I try to print it, the result.object field is None (the other fields contain the correct information).
What am I doing wrong?
Thank you.
UPDATE
I have found a solution to my problem number 2. The latest Haystack release on PyPI is not compatible with Django 1.9.x.
I have just added -e git+https://github.com/django-haystack/django-haystack.git#egg=django-haystack to my requirements.txt file and the issue is fixed. More info about that on GitHub: https://github.com/django-haystack/django-haystack/issues/1291
The other issue is still open and I cannot find any solution to it.

It sounds like you may be running into a minimum-number-of-characters issue for #1. Take a look at the Haystack documentation on autocomplete, which shows an approach using EdgeNgramField instead of the typical CharField.
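To see why an edge n-gram field helps here, the following is a minimal pure-Python sketch of edge n-gram tokenization. It does not call Haystack or Elasticsearch; the names edge_ngrams, build_index and lookup, as well as the min_gram and max_gram defaults, are assumptions chosen for illustration, and the analyzer settings of your backend may differ.

```python
def edge_ngrams(term, min_gram=2, max_gram=15):
    """Return the prefixes of `term` that are min_gram..max_gram characters long."""
    term = term.lower()
    longest = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, longest + 1)]


def build_index(names, **kwargs):
    """Map every edge n-gram to the set of names that produced it."""
    index = {}
    for name in names:
        for gram in edge_ngrams(name, **kwargs):
            index.setdefault(gram, set()).add(name)
    return index


def lookup(index, query):
    """Exact lookup of the lowercased query among the indexed grams."""
    return sorted(index.get(query.lower(), set()))


names = ["Acqua", "Acquario", "Aceto", "Accento"]
index = build_index(names)

print(lookup(index, "ac"))     # all four names share the prefix "ac"
print(lookup(index, "acqua"))  # only the names starting with "acqua"
```

In this sketch a partial query finds a document only because the prefix itself was stored at indexing time. An index that stores only whole words has no "ac" entry to match, which would be consistent with the empty result described in the question.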

Related

Partial matching search in Wagtail with Postgres

I've got a Wagtail site powered by Postgres and would like to implement a fuzzy search on all documents. However, according to the Wagtail docs, "SearchField(partial_match=True) is not handled." Does anyone know of a way I can implement my own partial matching search?
I'm leaving this question intentionally open-ended because I'm open to pretty much any solution that works well and is fairly scalable.
We’re currently rebuilding the Wagtail search API in order to make autocomplete usable roughly the same way across backends.
For now, you can directly use the IndexEntry model that stores search data. Unfortunately, django.contrib.postgres.search does not contain a way to do an autocomplete query, so we have to do it ourselves for now. Here is how to do that:
from django.contrib.postgres.search import SearchQuery
from wagtail.contrib.postgres_search.models import IndexEntry
class SearchAutocomplete(SearchQuery):
    def as_sql(self, compiler, connection):
        return "to_tsquery(''%s':*')", [self.value]
query = SearchAutocomplete('postg')
print(IndexEntry.objects.filter(body_search=query).rank(query))
# All results containing words starting with “postg”
# should be displayed, sorted by relevance.
It doesn't seem to be documented yet, but the gist of autocomplete filtering with Postgres, using a request object, is something like
from django.conf import settings
# BadRequestError is assumed to come from the Wagtail API utilities
from wagtail.api.v2.utils import BadRequestError
from wagtail.search.backends import get_search_backend
from wagtail.search.backends.base import FilterFieldError, OrderByFieldError
def filter_queryset(queryset, request):
    search_query = request.GET.get("search", "").strip()
    search_enabled = getattr(settings, 'WAGTAILAPI_SEARCH_ENABLED', True)
    if 'search' in request.GET and search_query:
        if not search_enabled:
            raise BadRequestError("search is disabled")
        search_operator = request.GET.get('search_operator', None)
        order_by_relevance = 'order' not in request.GET
        sb = get_search_backend()
        try:
            queryset = sb.autocomplete(search_query, queryset, operator=search_operator, order_by_relevance=order_by_relevance)
        except FilterFieldError as e:
            raise BadRequestError("cannot filter by '{}' while searching (field is not indexed)".format(e.field_name))
        except OrderByFieldError as e:
            raise BadRequestError("cannot order by '{}' while searching (field is not indexed)".format(e.field_name))
    # Return the (possibly filtered) queryset to the caller
    return queryset
The line to note is the call to sb.autocomplete.
If you want to use custom fields with autocomplete, you'll also need to add them into search_fields as an AutocompleteField in addition to a SearchField -- for example
search_fields = Page.search_fields + [
    index.SearchField("field_to_index", partial_match=True),
    index.AutocompleteField("field_to_index", partial_match=True),
...
This solution works for Wagtail 2.3. If you are using an older version, it is unlikely to work, and if you are using a later version, hopefully the details will have been incorporated into the official documentation, which currently states that autocomplete with Postgres is NOT possible. Thankfully, that has turned out not to be true, due to the work of Bertrand Bordage in the time since he wrote the other answer.

I don't understand the results that's returning from elasticsearch/haystack

The results that are being returned from haystack, using an elasticsearch backend seem erroneous to me. My search index is as follows:
from haystack import indexes
from .models import IosVideo
class VideoIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    absolute_url = indexes.CharField(model_attr='get_absolute_url')
    # content_auto = indexes.EdgeNgramField(model_attr='title')
    description = indexes.CharField(model_attr='description')
    # thumbnail = indexes.CharField(model_attr='thumbnail_url', null=True)
    def get_model(self):
        return IosVideo
    def index_queryset(self, using=None):
        return self.get_model().objects.filter(private=False)
My text document looks like:
{{ object.title }}
{{ object.text }}
{{ object.description }}
My query is
SearchQuerySet().models(IosVideo).filter(content="darby")[0]
The result that makes me think this is not working is a video object with the following characteristics:
title: u'Cindy Daniels'
description: u'',
text: u'Cindy Daniels\n\n\n',
absolute_url: u'/videos/testimonial/cindy-daniels/'
Why in the world would the query return such a result? I'm very confused.
My current theory is that it's tokenizing every subset of the characters in the query and using each of them as a partial match. Is there a way to decrease this tolerance so that only closer matches are returned?
My pip info is
elasticsearch==1.2.0
django-haystack==2.3.1
And the elasticsearch version number is 1.3.1
Additionally when I hit the local server with
http://localhost:9200/haystack/_search/?q=darby&pretty
It returns 10 results.
SearchQuerySet().filter(content="darby")
Returns 4k results.
Does anyone know what would cause this type of behavior?
There is a problem with the filter() method on CharField indexes in django-haystack 2.1.0. You can change them to NgramField instead, for example text = indexes.NgramField(document=True, use_template=True).
The problem is that when you use this combination, only the first character of the query is matched. So it returns all the documents that have a 'd' in their text index field.

Django Haystack custom SearchView for pretty urls

I'm trying to setup Django Haystack to search based on some pretty urls. Here is my urlpatterns.
urlpatterns += patterns('',
    url(r'^search/$', SearchView(),
        name='search_all',
    ),
    url(r'^search/(?P<category>\w+)/$', CategorySearchView(
            form_class=SearchForm,
        ),
        name='search_category',
    ),
)
My custom SearchView class looks like this:
class CategorySearchView(SearchView):
    def __name__(self):
        return "CategorySearchView"
    def __call__(self, request, category):
        self.category = category
        return super(CategorySearchView, self).__call__(request)
    def build_form(self, form_kwargs=None):
        data = None
        kwargs = {
            'load_all': self.load_all,
        }
        if form_kwargs:
            kwargs.update(form_kwargs)
        if len(self.request.GET):
            data = self.request.GET
        kwargs['searchqueryset'] = SearchQuerySet().models(self.category)
        return self.form_class(data, **kwargs)
I keep getting this error running the Django dev web server if I try and visit /search/Vendor/q=Microsoft
UserWarning: The model u'Vendor' is not registered for search.
warnings.warn('The model %r is not registered for search.' % model)
And this on my page
The model being added to the query must derive from Model.
If I visit /search/q=Microsoft, it works fine. Is there another way to accomplish this?
Thanks for any pointers
-Jay
There are a couple of things going on here. In your __call__ method you're assigning a category based on a string in the URL. In this error:
UserWarning: The model u'Vendor' is not registered for search
Note the unicode string. If you got an error like The model <class 'mymodel.Model'> is not registered for search, then you'd know that you haven't properly created an index for that model. However, this is a string, not a model! The models method on the SearchQuerySet class requires a model class, not a string.
The first thing you could do is use that string to look up a model by content type. This is probably not a good idea! Even if you don't have models indexed which you'd like to keep away from prying eyes, you could at least generate some unnecessary errors.
Better to use a lookup in your view to route the query to the correct model index, using conditionals or perhaps a dictionary. In your __call__ method:
self.category = category.lower()
And if you have several models:
my_querysets = {
    'model1': SearchQuerySet().models(Model1),
    'model2': SearchQuerySet().models(Model2),
    'model3': SearchQuerySet().models(Model3),
}
# Default queryset then searches everything
kwargs['searchqueryset'] = my_querysets.get(self.category, SearchQuerySet())
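As a self-contained illustration of that routing, here is a small sketch that uses stand-in classes instead of real models and SearchQuerySet objects, so it runs without Django or Haystack. Vendor, Product, FakeQuerySet, CATEGORY_MODELS and queryset_for are hypothetical names used only for this example.

```python
class Vendor:
    """Stand-in for a Django model."""


class Product:
    """Stand-in for a Django model."""


class FakeQuerySet:
    """Stand-in for SearchQuerySet: only records which models it is limited to."""

    def __init__(self, models=()):
        self.restricted_to = tuple(models)

    def models(self, *models):
        return FakeQuerySet(models)


# Whitelist of URL slugs; anything not listed falls back to searching everything.
CATEGORY_MODELS = {
    "vendor": Vendor,
    "product": Product,
}


def queryset_for(category):
    """Resolve a URL slug to a queryset limited to one model, or to all models."""
    model = CATEGORY_MODELS.get(category.lower())
    if model is None:
        return FakeQuerySet()
    return FakeQuerySet().models(model)


print(queryset_for("Vendor").restricted_to)   # limited to the Vendor class
print(queryset_for("unknown").restricted_to)  # empty tuple: no restriction
```

The point of the whitelist is that the string from the URL is never handed to models() directly: it is only used as a dictionary key, and an unrecognised slug quietly falls back to the unrestricted search.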

Autocomplete with Django Haystack

I am having a difficult time getting autocomplete to work with Haystack and Solr in a search form. Following the instructions here (Auto-complete), I was able to create my index in the following way.
class PersonIndex(indexes.RealTimeSearchIndex, indexes.Indexable):
    text = CharField(document=True, use_template=True)
    first_name = CharField(model_attr='first_name')
    last_name = CharField(model_attr='last_name')
    first_name_auto = indexes.EdgeNgramField(model_attr='first_name')
    def index_queryset(self):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all().order_by('first_name')
    def get_model(self):
        return Person
With the way my URL route is set up, I don't have a view function of my own that requests get directed to, yet the search works.
url(r'^search/person/', search_view_factory(
    view_class=SearchView,
    template='index.html',
    form_class=ModelSearchForm
), name='haystack_search'),
The instructions say that we can perform the query in this fashion
from haystack.query import SearchQuerySet
sqs = SearchQuerySet().filter(content_auto=request.GET.get('q', ''))
But where do we put this SearchQuerySet? I am not sure what to override or how to modify my URL to route correctly. My search currently works out of the box this way, but I want to try autocomplete with EdgeNgramField.
You'll need to define your own custom search form and tell it how to generate the SearchQuerySet it returns to the view, and then tell your search_view_factory to use that form instead of the ModelSearchForm.
Specify the way you want to generate the SearchQuerySet used by your view by overriding the ModelSearchForm search method:
from haystack.forms import ModelSearchForm
class AutocompleteModelSearchForm(ModelSearchForm):
    def search(self):
        if not self.is_valid():
            return self.no_query_found()
        if not self.cleaned_data.get('q'):
            return self.no_query_found()
        # Filter on the EdgeNgramField instead of running auto_query on the text field
        sqs = self.searchqueryset.filter(first_name_auto=self.cleaned_data['q'])
        if self.load_all:
            sqs = sqs.load_all()
        return sqs
This will now perform a filter on the form's SearchQuerySet on the first_name_auto field rather than the auto_query that it would usually do on the text field (see haystack/forms.py to see what the original search function looks like).
You specify that you want to use this form in the argument list to your search_view_factory
from path.to.your.forms import AutocompleteModelSearchForm
url(r'^search/person/', search_view_factory(
    view_class=SearchView,
    template='index.html',
    form_class=AutocompleteModelSearchForm
), name='haystack_search'),

Django search functionality - bug with search query of length 2

As an impressed reader of Stack Overflow, I want to ask my first question here. I have encountered a problem with a snippet, and I do not know whether I made a mistake or whether it's a bug in the code I'm using.
I adapted this code for my own site:
http://blog.tkbe.org/archive/django-admin-search-functionality/
It works fine and it's really a great snippet.
But if my search query has length 2, I think that the results are not correct.
So for example if I search for "re" in first name and last name, I get the following results:
Mr. Tom Krem
Ms. Su Ker
Which is pretty strange. For queries with length > 2 I do not encounter this problem.
Maybe someone reading this post uses the snippet above and can tell me whether they encounter the same problem.
If nobody else encounters the problem, I will at least know that I have a bug somewhere in my code. Maybe it is in the form I'm using, or something is messed up in the request context.
How can I solve this problem?
Edit 1:
The inclusion tag:
from django import template
from crm.views import SEARCH_VAR
def my_search_form(context):
    return {
        'context': context,
        'search_var': SEARCH_VAR
    }
register = template.Library()
register.inclusion_tag('custom_utilities/my_search_form.html')(my_search_form)
The my_search_form.html:
<div id="toolbar"><form
id="changelist-search"
action=""
method="get">
<div><!-- DIV needed for valid HTML -->
<label
for="searchbar"><img src="{{ context.media_url }}/crm/img/search.png"
class="icon"
alt="Search" /></label>
<input
type="text"
size="40"
name="{{ search_var }}"
value="{{ context.query }}"
id="searchbar" />
<input type="submit" value="Search" />
</div>
</form>
</div>
<script
type="text/javascript">document.getElementById("searchbar").focus();
</script>
The view:
#login_required
def crm_contacts(request):
    query = request.GET.get('q', '')
    #pass additional params to the SortHeaders function
    #the additional params will be part of the header <a href...>
    #e.g. use it for pagination / use it to provide the query string
    additional_params_dict = {'q': query}
    foundContacts = search_contact(request,query)
    sort_headers = SortHeaders(request, LIST_HEADERS, default_order_field=1, additional_params=additional_params_dict)
    if foundContacts is not None:
        contact_list = foundContacts.order_by(sort_headers.get_order_by())
    else:
        contact_list = Contact.objects.order_by(sort_headers.get_order_by())
    context = {
        'contact_list' : contact_list,
        'headers': list(sort_headers.headers()),
        'query' : query,
    }
    return render_to_response("crm/contact_list.html", context,
                              context_instance=RequestContext(request))
The contact search form:
#models
from crm.models import Contact
from django.db.models import Q
'''
A search form from
http://blog.tkbe.org/archive/django-admin-search-functionality/
adapted to search for contacts.
'''
def search_contact(request,terms=None):
    if terms is None:
        return Contact.objects.all()
    query = Contact.objects
    for term in terms:
        query = query.filter(
            Q(first_name__icontains=term)
            | Q(last_name__icontains=term))
    return query
Another edit:
I'm using this snippet to sort the table. Probably one should know this in order to understand the code posted above.
Since I can not post links (spam protection) I will try to explain where to find it. Go to Google. Type in: django snippet table sort
Then it should be the second hit. Sort table headers. snippet nr. 308.
Edit: Add the SortHeaders() function
ORDER_VAR = 'o'
ORDER_TYPE_VAR = 'ot'
class SortHeaders:
    """
    Handles generation of an argument for the Django ORM's
    ``order_by`` method and generation of table headers which reflect
    the currently selected sort, based on defined table headers with
    matching sort criteria.
    Based in part on the Django Admin application's ``ChangeList``
    functionality.
    """
    def __init__(self, request, headers, default_order_field=None,
                 default_order_type='asc', additional_params=None):
        """
        request
            The request currently being processed - the current sort
            order field and type are determined based on GET
            parameters.
        headers
            A list of two-tuples of header text and matching ordering
            criteria for use with the Django ORM's ``order_by``
            method. A criterion of ``None`` indicates that a header
            is not sortable.
        default_order_field
            The index of the header definition to be used for default
            ordering and when an invalid or non-sortable header is
            specified in GET parameters. If not specified, the index
            of the first sortable header will be used.
        default_order_type
            The default type of ordering used - must be one of
            ``'asc'`` or ``'desc'``.
        additional_params
            Query parameters which should always appear in sort links,
            specified as a dictionary mapping parameter names to
            values. For example, this might contain the current page
            number if you're sorting a paginated list of items.
        """
        if default_order_field is None:
            for i, (header, query_lookup) in enumerate(headers):
                if query_lookup is not None:
                    default_order_field = i
                    break
        if default_order_field is None:
            raise AttributeError('No default_order_field was specified and none of the header definitions given were sortable.')
        if default_order_type not in ('asc', 'desc'):
            raise AttributeError('If given, default_order_type must be one of \'asc\' or \'desc\'.')
        if additional_params is None: additional_params = {}
        self.header_defs = headers
        self.additional_params = additional_params
        self.order_field, self.order_type = default_order_field, default_order_type
        # Determine order field and order type for the current request
        params = dict(request.GET.items())
        if ORDER_VAR in params:
            try:
                new_order_field = int(params[ORDER_VAR])
                if headers[new_order_field][1] is not None:
                    self.order_field = new_order_field
            except (IndexError, ValueError):
                pass # Use the default
        if ORDER_TYPE_VAR in params and params[ORDER_TYPE_VAR] in ('asc', 'desc'):
            self.order_type = params[ORDER_TYPE_VAR]
    def headers(self):
        """
        Generates dicts containing header and sort link details for
        all defined headers.
        """
        for i, (header, order_criterion) in enumerate(self.header_defs):
            th_classes = []
            new_order_type = 'asc'
            if i == self.order_field:
                th_classes.append('sorted %sending' % self.order_type)
                new_order_type = {'asc': 'desc', 'desc': 'asc'}[self.order_type]
            yield {
                'text': header,
                'sortable': order_criterion is not None,
                'url': self.get_query_string({ORDER_VAR: i, ORDER_TYPE_VAR: new_order_type}),
                'class_attr': (th_classes and ' class="%s"' % ' '.join(th_classes) or ''),
            }
    def get_query_string(self, params):
        """
        Creates a query string from the given dictionary of
        parameters, including any additional parameters which should
        always be present.
        """
        params.update(self.additional_params)
        return '?%s' % '&'.join(['%s=%s' % (param, value) \
                                 for param, value in params.items()])
    def get_order_by(self):
        """
        Creates an ordering criterion based on the current order
        field and order type, for use with the Django ORM's
        ``order_by`` method.
        """
        return '%s%s' % (
            self.order_type == 'desc' and '-' or '',
            self.header_defs[self.order_field][1],
        )
If you run manage.py shell and then:
>>> from crm.models import Contact
>>> from django.db.models import Q
>>> list=Contact.objects.filter(Q(first_name__icontains='re')|Q(last_name__icontains='re'))
>>> print list
What is the output?
Edit: Right, so if you try:
>>> list=Contact.objects.filter(Q(first_name__icontains='mot')|Q(last_name__icontains='mot'))
>>> print list
(I'm trying to narrow down the terms that are giving you problems, and I saw your last comment.)
What is the output?
Edit: If both of the above queries work in the shell, something else is modifying your queryset somewhere and adding some additional criteria...
Are you sure sort_headers() is not modifying the queryset with more than just an order by clause? Could you post sort_headers() to your question?
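One more thing worth checking, assuming the query reaches search_contact as a plain string, as it does in the view above: iterating over a Python string yields single characters, so the loop "for term in terms" would add one filter per letter rather than one per search word. The sketch below reproduces that effect without Django; the contacts list and both helper functions are made up for illustration.

```python
contacts = [
    ("Tom", "Krem"),
    ("Su", "Ker"),
    ("Ann", "Lopez"),
]


def matches(contact, term):
    """Case-insensitive containment on first or last name, like __icontains."""
    first, last = contact
    return term.lower() in first.lower() or term.lower() in last.lower()


def search_per_item(terms):
    """Keep contacts that match every item produced by iterating over `terms`."""
    return [c for c in contacts if all(matches(c, t) for t in terms)]


# A string iterates character by character: "re" becomes "r" and "e".
print(search_per_item("re"))

# Splitting first turns the query into a list of whole words.
print(search_per_item("re".split()))
```

In this sketch the unsplit query keeps every contact whose name contains both an "r" and an "e" somewhere, which would explain results such as "Krem" and "Ker" for the query "re". If that is what is happening in your code, splitting the query into words before the loop (for example with query.split()) should restore whole-term matching.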