Django haystack elasticsearch problems with autocomplete (and queries with Capital letters) - django

I've got a basic django haystack elasticsearch installation running, that seems to be working.. until I hit an autocomplete problem:
It doesn't return autocompletion just the full field. another problem is with data that has CAPS, that isn't normalized (such as usernames..)
MY installation:
django 1.6.4
haystack 2.1.0
elasticsearch 1.3.1
py-elasticsearch 0.6.1
class SocialProfileIndex(indexes.SearchIndex, indexes.Indexable):
text = indexes.CharField(document=True, use_template=True)
username = indexes.CharField(model_attr='username')
first_name = indexes.CharField(model_attr='first_name')
last_name = indexes.CharField(model_attr='last_name')
# Auto-complete
username_auto = indexes.EdgeNgramField(model_attr='username')
first_name_auto = indexes.EdgeNgramField(model_attr='first_name')
last_name_auto = indexes.EdgeNgramField(model_attr='last_name')
def get_model(self):
return SocialProfile
def index_queryset(self, using=None):
return self.get_model().objects.all()
Were in the view I return:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto=q)
so when indexing a SocialProfile:
username=alonisser
when q (the query) is 'alonisser' I get the correct reply, But when I try 'alon' or similiar I don't get any results.
When I access elasticsearch directly through py-elasticsearch (without haystack):
es = Elasticsearch('http://elasticsearch.url:9200')
es.search('username_auto:alon', index='haystack')
I do get the correct result, so the is stored there and the problem is probably doing something wrong with haystack..
Similiar but different problems is when the searched item has Caps :like 'Alonisser' so searching for 'alonisser' doesn't return any result, but searching for 'Alonisser' does.
What am I doing wrong? Thanks for the help..

I think you already got an answer in the haystack forums but just to bring it out here also.
One way to get rid of the Caps problem is to use a custom prepare method in your index class although my haystack somehow handles it by default :S.
def prepare_username_auto(self, obj):
return obj.username.lower()
This will convert all usernames to lowercase when you run 'update_index'. Then you can also turn you user inserted search term to lower also which should produce correct results.
To search for a part of the word you need to use:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto__startswith=q)

Related

Django - Search matches with all objects - even if they don't actually match

This is the model that has to be searched:
class BlockQuote(models.Model):
debate = models.ForeignKey(Debate, related_name='quotes')
speaker = models.ForeignKey(Speaker, related_name='quotes')
text = models.TextField()
I have around a thousand instances on the database on my laptop (with around 50000 on the production server)
I am creating a 'manage.py' function that will search through the database and returns all 'BlockQuote' objects whose textfield contains the keyword.
I am doing this with the Django's (1.11) Postgres search options in order to use the 'rank' attribute, which sounds like something that would come in handy. I used the official Django fulltext-search documentation for the code below
Yet when I run this code, it matches with all objects, regardless if BlockQuote.text actually contains the queryfield.
def handle(self, *args, **options):
vector = SearchVector('text')
query = options['query'][0]
Search_Instance = Search_Instance.objects.create(query=query)
set = BlockQuote.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
for result in set:
match = QueryMatch.objects.create(quote=result, query=Search_Instance)
match.save()
Does anyone have an idea of what I am doing wrong?
I don't see you actually filtering ever.
BlockQuote.objects.annotate(...).filter(rank__gte=0.5)

ElasticSearch - bulk indexing for a completion suggester in python

I am trying to add a completion suggester to enable search-as-you-type for a search field in my Django app (using Elastic Search 5.2.x and elasticseach-dsl). After trying to figure this out for a long time, I am not able to figure yet how to bulk index the suggester. Here's my code:
class SchoolIndex(DocType):
name = Text()
school_type = Keyword()
name_suggest = Completion()
Bulk indexing as follows:
def bulk_indexing():
SchoolIndex.init(index="school_index")
es = Elasticsearch()
bulk(client=es, actions=(a.indexing() for a in models.School.objects.all().iterator()))
And have defined an indexing method in models.py:
def indexing(self):
obj = SchoolIndex(
meta = {'id': self.pk},
name = self.name,
school_type = self.school_type,
name_suggest = {'input': self.name } <--- # what goes in here?
)
obj.save(index="school_index")
return obj.to_dict(include_meta=True)
As per the ES docs, suggestions are indexed like any other field. So I could just put a few terms in the name_suggest = statement above in my code which will match the corresponding field, when searched. But my question is how to do that with a ton of records? I was guessing there would be a standard way for ES to automatically come up with a few terms that could be used as suggestions. For example: using each word in the phrase as a term. I could come up something like that on my own (by breaking each phrase into words) but it seems counter-intuitive to do that on my own since I'd guess there would already be a default way that the user could further tweak if needed. But couldn't find anything like that on SO/blogs/ES docs/elasticsearch-dsl docs after searching for quite sometime. (This post by Adam Wattis was very helpful in getting me started though). Will appreciate any pointers.
I think I figured it out (..phew)
In the indexing function, I need to use the following to enable to the prefix completion suggester:
name_suggest = self.name
instead of:
name_suggest = {'input': something.here }
which seems to be used for more custom cases.
Thanks to this video that helped!

Using django haystack search with global search bar in template

I have a django project that needs to search 2 different models and one of the models has 3 types that I need to filter based on. I have haystack installed and working in a basic sense (using the default url conf and SearchView for my model and the template from the getting started documentation is returning results fine).
The problem is that I'm only able to get results by using the search form in the basic search.html template and I'm trying to make a global search bar work with haystack but I can't seem to get it right and I'm not having a lot of luck with the haystack documentation. I found another question on here that led me to the following method in my search app.
my urls.py directs "/search" to this view in my search.views:
def search_posts(request):
post_type = str(request.GET.get('type')).lower()
sqs = SearchQuerySet().all().filter(type=post_type)
view = search_view_factory(
view_class=SearchView,
template='search/search.html',
searchqueryset=sqs,
form_class=HighlightedSearchForm
)
return view(request)
The url string that comes in looks something like:
http://example.com/search/?q=test&type=blog
This will get the query string from my global search bar but returns no results, however if I remove the .filter(type=post_type) part from the sqs line I will get search results again (albeit not filtered by post type). Any ideas? I think I'm missing something fairly obvious but I can't seem to figure this out.
Thanks,
-Sean
EDIT:
It turns out that I am just an idiot. The reason why my filtering on the SQS by type was returning no results was because I didn't have the type field included in my PostIndex class. I changed my PostIndex to:
class PostIndex(indexes.SearchIndex, indexes.Indexable):
...
type = indexes.CharField(model_attr='type')
and rebuilt and it all works now.
Thanks for the response though!
def search_posts(request):
post_type = str(request.GET.get('type')).lower()
sqs = SearchQuerySet().filter(type=post_type)
clean_query = sqs.query.clean(post_type)
result = sqs.filter(content=clean_query)
view = search_view_factory(
view_class=SearchView,
template='search/search.html',
searchqueryset=result,
form_class=HighlightedSearchForm
)
return view(request)

Django-haystack (xapian) autocomplete giving incomplete results

I have a django site running django-haystack with xapian as a back end. I got my autocomplete working, but it's giving back weird results. The results coming back from the searchqueryset are incomplete.
For example, I have the following data...
['test', 'test 1', 'test 2']
And if I type in 't', 'te', or 'tes' I get nothing back. However, if I type in 'test' I get back all of the results, as would be expected.
I have something looking like this...
results = SearchQuerySet().autocomplete(auto=q).values('auto')
And my search index looks like this...
class FacilityIndex(SearchIndex):
text = CharField(document=True, use_template=True)
created = DateTimeField(model_attr='created')
auto = EdgeNgramField(model_attr='name')
def get_model(self):
return Facility
def index_queryset(self):
return self.get_model().objects.filter(created__lte=datetime.datetime.now())
Any tips are appreciated. Thanks.
A bit late, but you need to check the min ngram size that is being indexed. It is most likely 4 chars, so it won't match on anything with fewer chars than that. I am not a Xapian user though, so I don't know how to change this configuration option for that backend.

How do I do a partial field match using Haystack?

I needed a simple search tool for my django-powered web site, so I went with Haystack and Solr. I have set everything up correctly and can find the correct search results when I type in the exact phrase, but I can't get any results when typing in a partial phrase.
For example: "John" returns "John Doe" but "Joh" doesn't return anything.
Model:
class Person(models.Model):
first_name = models.CharField(max_length=50)
last_name = models.CharField(max_length=50)
Search Index:
class PersonIndex(SearchIndex):
text = CharField(document=True, use_template=True)
first_name = CharField(model_attr = 'first_name')
last_name = CharField(model_attr = 'last_name')
site.register(Person, PersonIndex)
I'm guessing there's some setting I'm missing that enables partial field matching. I've seen people talking about EdgeNGramFilterFactory() in some forums, and I've Googled it, but I'm not quite sure of its implementation. Plus, I was hoping there was a haystack-specific way of doing it in case I ever switch out the search backend.
You can achieve that behavior by making your index's text field an EdgeNgramField:
class PersonIndex(SearchIndex):
text = EdgeNgramField(document=True, use_template=True)
first_name = CharField(model_attr = 'first_name')
last_name = CharField(model_attr = 'last_name')
In addition to the EdgeNgramField hint that others mentioned in this page (and of course NgramField, if you work with Asian languages), I think it is worth to mention that in Django_haystack you can run raw queries on Solr via following command:
from haystack.query import SearchQuerySet
from haystack.inputs import Raw
SearchQuerySet().filter(text=Raw(query))
where text is the field you want to search, and the query can be anything based on Query Parser Syntax (version 3.6, or 4.6) of Lucene.
In this way you can easily set the query to ABC* or ABC~ or anything else which fits to the syntax.
I had a similar issue while searching for non english words, for instance:
ABC
ABCD
If I want to search for keywords ABC, I will expect the above two results. I was able to achieve the following by converting the keyword to lowercase and using startswith:
keywords = 'ABC'
results.filter(code__startswith=keywords.lower())
I had the same problem and the only way to get the results I wanted was to modify the solr configuration file to include ngram filtering as the default tokenizer is based on white space. So use NGramTokenizer instead. I'd love to know if there was a haystack way of doing the same thing.
I'm not at my machine right now but this should do the trick.
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
#riz I can't comment yet or I would and I know it's an old comment but in case anyone else runs past this: Make sure to manage.py update_index
Blockquote #Liarez how did you get this to work? I'm using haystack/elastic search and I wasn't able to get it to work.