ElasticSearch - bulk indexing for a completion suggester in python


I am trying to add a completion suggester to enable search-as-you-type for a search field in my Django app (using Elasticsearch 5.2.x and elasticsearch-dsl). After trying for a long time, I still have not figured out how to bulk index the suggester. Here's my code:
class SchoolIndex(DocType):
    name = Text()
    school_type = Keyword()
    name_suggest = Completion()
Bulk indexing as follows:
def bulk_indexing():
    SchoolIndex.init(index="school_index")
    es = Elasticsearch()
    bulk(client=es, actions=(a.indexing() for a in models.School.objects.all().iterator()))
And I have defined an indexing method in models.py:
def indexing(self):
    obj = SchoolIndex(
        meta={'id': self.pk},
        name=self.name,
        school_type=self.school_type,
        name_suggest={'input': self.name},  # <--- what goes in here?
    )
    obj.save(index="school_index")
    return obj.to_dict(include_meta=True)
As per the ES docs, suggestions are indexed like any other field. So I could just put a few terms in the name_suggest = statement above, and those terms would match the corresponding field when searched. But my question is how to do that with a ton of records. I was guessing there would be a standard way for ES to automatically come up with a few terms that could be used as suggestions, for example by using each word in the phrase as a term. I could come up with something like that on my own (by breaking each phrase into words), but that seems counter-intuitive, since I'd guess there would already be a default way that the user could further tweak if needed. I couldn't find anything like that on SO, blogs, the ES docs, or the elasticsearch-dsl docs after searching for quite some time. (This post by Adam Wattis was very helpful in getting me started, though.) I will appreciate any pointers.

I think I figured it out (..phew)
In the indexing function, I need to use the following to enable the prefix completion suggester:
name_suggest = self.name
instead of:
name_suggest = {'input': something.here }
which seems to be used for more custom cases.
Thanks to this video that helped!
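For anyone who does want suggestions to match on any word of the name, and not only on the start of the full string, one option is to generate the inputs yourself, as the question hints. The following is a minimal sketch under that assumption. `suggest_inputs` and `school_action` are hypothetical helpers (not part of elasticsearch-dsl), the `doc_type` value is a placeholder, and the dict layout follows the `_index` / `_type` / `_id` / `_source` action format that `elasticsearch.helpers.bulk` accepts.

```python
def suggest_inputs(name):
    """Return the full name plus every trailing word sequence.

    "Green Valley School" -> ["Green Valley School", "Valley School", "School"]
    so a prefix suggester can match on any word, not only the first one.
    """
    words = name.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def school_action(pk, name, school_type, index="school_index", doc_type="doc"):
    """Build one bulk action dict; nothing is sent to Elasticsearch here."""
    return {
        "_index": index,
        "_type": doc_type,  # assumption: replace with the mapping type actually in use
        "_id": pk,
        "_source": {
            "name": name,
            "school_type": school_type,
            # The completion field accepts a list of inputs
            "name_suggest": {"input": suggest_inputs(name)},
        },
    }


action = school_action(1, "Green Valley School", "public")
print(action["_source"]["name_suggest"]["input"])
```

A generator of such dicts can be passed as `actions` to `bulk()`. In that setup the `obj.save()` call inside `indexing()` should not be needed, since the bulk request already writes each document.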

Related

Django - Search matches with all objects - even if they don't actually match

This is the model that has to be searched:
class BlockQuote(models.Model):
    debate = models.ForeignKey(Debate, related_name='quotes')
    speaker = models.ForeignKey(Speaker, related_name='quotes')
    text = models.TextField()
I have around a thousand instances in the database on my laptop (and around 50,000 on the production server).
I am creating a manage.py command that searches the database and returns all BlockQuote objects whose text field contains the keyword.
I am doing this with Django's (1.11) Postgres search features in order to use the rank attribute, which sounds like something that would come in handy. I based the code below on the official Django full-text search documentation.
Yet when I run this code, it matches all objects, regardless of whether BlockQuote.text actually contains the query.
def handle(self, *args, **options):
    vector = SearchVector('text')
    query = options['query'][0]
    # Distinct local name, so the Search_Instance model class is not shadowed
    search_instance = Search_Instance.objects.create(query=query)
    # Named `quotes` rather than `set`, which would shadow the built-in
    quotes = BlockQuote.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
    for result in quotes:
        match = QueryMatch.objects.create(quote=result, query=search_instance)
        match.save()
Does anyone have an idea of what I am doing wrong?

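I don't see you filtering anywhere: annotate() only attaches a rank to every row, it does not exclude any of them. Filter on the rank, for example: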
BlockQuote.objects.annotate(...).filter(rank__gte=0.5)

Accessing instance attributes that start with a certain string

In my view, I'm trying to blank/delete a number of fields that start with real_.
I can do something like:
plan = get_object_or_404(Plan, pk=self.kwargs['plan_id'])
plan.real_time = None
plan.real_date = None
plan.real_comments = None
plan.real_whatever = None
....
plan.save()
However, I guess there must be a way to do this programmatically. All I'd need to do is access the names of the fields, check whether each one starts with real_, and then update that field.
I'm using get_fields() (as per the documentation), but I'm not sure how to do the last part.
Following is the code of my view:
plan = get_object_or_404(Plan, pk=self.kwargs['plan_id'])
plan_fields = plan._meta.get_fields()
for field in plan_fields:
    if field.name[:5] == "real_":
        plan.<not sure what to do here> = None
plan.save()
I guess I must be overlooking something small. Any pointer?
Using Django 1.9.

    if field.name[:5] == "real_":
        setattr(plan, field.name, None)
See the Python documentation for setattr().

I would recommend something nice and neat like this:
plan = get_object_or_404(Plan, pk=self.kwargs['plan_id'])
real_fields = [field for field in plan._meta.get_fields() if field.name.startswith('real_')]
for field in real_fields:
    # setattr() needs the attribute name (a string), not the field object
    setattr(plan, field.name, None)
plan.save()
This is partially opinion-based, but I feel that the list comprehension and .startswith() are slightly more Pythonic.
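The same pattern can be tried outside Django. Below is a minimal sketch in which `Plan` is a hypothetical plain-Python stand-in for the model and `clear_prefixed` is an illustrative helper, neither of which comes from the original posts. With a real model, the names would come from `[f.name for f in plan._meta.get_fields()]` rather than `vars()`.

```python
class Plan:
    """Hypothetical stand-in for the Django model, for illustration only."""

    def __init__(self):
        self.title = "Quarterly review"
        self.real_time = "10:30"
        self.real_date = "2016-03-01"
        self.real_comments = "went fine"


def clear_prefixed(obj, names, prefix="real_"):
    """Set every attribute whose name starts with `prefix` to None.

    Returns the list of attribute names that were cleared.
    """
    cleared = [name for name in names if name.startswith(prefix)]
    for name in cleared:
        setattr(obj, name, None)
    return cleared


plan = Plan()
cleared = clear_prefixed(plan, list(vars(plan)))
print(cleared)
```

With a Django model, the returned list could also be passed as `plan.save(update_fields=cleared)` so that only those columns are written.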

Django haystack elasticsearch problems with autocomplete (and queries with Capital letters)

I've got a basic Django Haystack + Elasticsearch installation running, and it seemed to be working until I hit an autocomplete problem:
It doesn't return autocompletions, only matches on the full field. Another problem is with data that contains capital letters and isn't normalized (such as usernames).
My installation:
django 1.6.4
haystack 2.1.0
elasticsearch 1.3.1
py-elasticsearch 0.6.1
class SocialProfileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    username = indexes.CharField(model_attr='username')
    first_name = indexes.CharField(model_attr='first_name')
    last_name = indexes.CharField(model_attr='last_name')
    # Auto-complete
    username_auto = indexes.EdgeNgramField(model_attr='username')
    first_name_auto = indexes.EdgeNgramField(model_attr='first_name')
    last_name_auto = indexes.EdgeNgramField(model_attr='last_name')

    def get_model(self):
        return SocialProfile

    def index_queryset(self, using=None):
        return self.get_model().objects.all()
In the view I return:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto=q)
So when indexing a SocialProfile with:
username=alonisser
when q (the query) is 'alonisser' I get the correct reply, but when I try 'alon' or similar I don't get any results.
When I access elasticsearch directly through py-elasticsearch (without haystack):
es = Elasticsearch('http://elasticsearch.url:9200')
es.search('username_auto:alon', index='haystack')
I do get the correct result, so the data is stored there and I am probably doing something wrong with Haystack.
A similar but different problem occurs when the indexed item has capital letters, like 'Alonisser': searching for 'alonisser' doesn't return any result, but searching for 'Alonisser' does.
What am I doing wrong? Thanks for the help..

I think you already got an answer in the Haystack forums, but just to bring it out here as well.
One way to get rid of the capital-letters problem is to use a custom prepare method in your index class, although my Haystack install somehow handles it by default.
    def prepare_username_auto(self, obj):
        return obj.username.lower()
This will convert all usernames to lowercase when you run 'update_index'. You can then also convert the user-entered search term to lowercase, which should produce correct results.
To search for a part of the word you need to use:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto__startswith=q)

Django: How to use django.forms.ModelChoiceField with a Raw SQL query?

I'm trying to render a form with a combo that shows related entities. Therefore I'm using a ModelChoiceField.
This approach works well, until I needed to limit which entities to show. If I use a simple query expression it also works well, but things break if I use a raw SQL query.
So my code that works, sets the queryset to a filter expression.
class ReservationForm(forms.Form):
    location_time_slot = ModelChoiceField(queryset=LocationTimeSlot.objects.all(), empty_label="Select your preferred time")

    def __init__(self, *args, **kwargs):
        city_id = kwargs.pop("city_id")  # city_id is the parameter passed from views.py
        super(ReservationForm, self).__init__(*args, **kwargs)
        # TODO: move this to a manager
        self.fields['location_time_slot'].queryset = LocationTimeSlot.objects.filter(city__id=city_id)
BUT, if I change that to a raw query I start having problems. Code that does not work:
class ReservationForm(forms.Form):
    location_time_slot = ModelChoiceField(queryset=LocationTimeSlot.objects.all(), empty_label="Select your preferred time")

    def __init__(self, *args, **kwargs):
        city_id = kwargs.pop("city_id")  # city_id is the parameter passed from views.py
        super(ReservationForm, self).__init__(*args, **kwargs)
        # TODO: move this to a manager
        query = """SELECT ts.id, ts.datetime_to, ts.datetime_from, ts.available_reserves, l.name, l.'order'
                   FROM reservations_locationtimeslot AS ts
                   INNER JOIN reservations_location AS l ON l.id = ts.location_id
                   WHERE l.city_id = %s
                   AND ts.available_reserves > 0
                   AND ts.datetime_from > datetime() """
        time_slots = LocationTimeSlot.objects.raw(query, [city_id])
        self.fields['location_time_slot'].queryset = time_slots
The first error I get when trying to render the widget is: 'RawQuerySet' object has no attribute 'all'
I could solve that one, thanks to one of the comments on a related question, by doing:
time_slots.all = time_slots.__iter__ # Dummy fix to allow default form rendering with raw SQL
But now I'm getting something similar when posting the form:
'RawQuerySet' object has no attribute 'get'
Is there a proper way to prepare a RawQuerySet to be used by ModelChoiceField?
Thanks!

Are you sure you actually need a raw query there? Just looking at that query, I can't see any reason you can't do it with filter(location__city=city_id, available_reserves__gt=0, datetime_from__gt=datetime.datetime.now()). (Note __gt rather than __gte for available_reserves, to match the > 0 in the SQL.)
Raw query sets are missing a number of methods that are defined on conventional query sets, so just dropping them in place isn't likely to work without writing your own definitions for all those methods.

I temporarily fixed the problem adding the missing methods.
For the way I'm currently using the ModelChoiceField, I only needed to add the all() and get() methods, but in other scenarios you might need to add some other methods as well. This is also not a perfect solution because:
1) Defining the get method this way might produce incorrect results. I think get() is used to validate that the selected option is within the options returned by all(). The way I temporarily implemented it only validates that the id exists in the table.
2) I guess the get method is less performant when specified this way.
If anyone can think of a better solution, please let me know.
So my temporary solution:
class LocationTimeSlotManager(models.Manager):
    def availableSlots(self, city_id):
        query = """SELECT ts.id, ts.datetime_to, ts.datetime_from, ts.available_reserves, l.name, l.'order'
                   FROM reservations_locationtimeslot AS ts
                   .....
                   .....
                   MORE SQL """
        time_slots = LocationTimeSlot.objects.raw(query, [city_id])
        # Dummy fix to allow default form rendering with raw SQL
        time_slots.all = time_slots.__iter__
        time_slots.get = LocationTimeSlot.objects.get
        return time_slots

How do I do a partial field match using Haystack?

I needed a simple search tool for my django-powered web site, so I went with Haystack and Solr. I have set everything up correctly and can find the correct search results when I type in the exact phrase, but I can't get any results when typing in a partial phrase.
For example: "John" returns "John Doe" but "Joh" doesn't return anything.
Model:
class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
Search Index:
class PersonIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    first_name = CharField(model_attr='first_name')
    last_name = CharField(model_attr='last_name')

site.register(Person, PersonIndex)
I'm guessing there's some setting I'm missing that enables partial field matching. I've seen people talking about EdgeNGramFilterFactory() in some forums, and I've Googled it, but I'm not quite sure of its implementation. Plus, I was hoping there was a haystack-specific way of doing it in case I ever switch out the search backend.

You can achieve that behavior by making your index's text field an EdgeNgramField:
class PersonIndex(SearchIndex):
    text = EdgeNgramField(document=True, use_template=True)
    first_name = CharField(model_attr='first_name')
    last_name = CharField(model_attr='last_name')
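To see why an edge n-gram field makes "Joh" match "John Doe", it can help to look at the terms such a field would index. The sketch below is only a plain-Python illustration of the idea, not Haystack's or Solr's implementation. The `min_gram` / `max_gram` defaults, the lowercasing step, and the `edge_ngrams` and `matches` helpers are all assumptions made for this example.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Return the lowercase prefixes of each whitespace-separated word."""
    grams = []
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        for size in range(min_gram, longest + 1):
            grams.append(word[:size])
    return grams


def matches(query, indexed_terms):
    """A query matches when its lowercased form is one of the indexed terms."""
    return query.lower() in indexed_terms


indexed = set(edge_ngrams("John Doe"))
print(sorted(indexed))        # ['do', 'doe', 'jo', 'joh', 'john']
print(matches("Joh", indexed))  # True: "joh" is a stored prefix
print(matches("ohn", indexed))  # False: only prefixes are stored
```

A plain text field stores only whole words ("john", "doe"), which is why the partial query found nothing before the change.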

In addition to the EdgeNgramField hint that others mentioned on this page (and of course NgramField, if you work with Asian languages), I think it is worth mentioning that in django-haystack you can run raw queries on Solr as follows:
from haystack.query import SearchQuerySet
from haystack.inputs import Raw
SearchQuerySet().filter(text=Raw(query))
where text is the field you want to search, and the query can be anything based on Query Parser Syntax (version 3.6, or 4.6) of Lucene.
In this way you can easily set the query to ABC* or ABC~ or anything else which fits to the syntax.

I had a similar issue while searching for non-English words, for instance:
ABC
ABCD
If I search for the keyword ABC, I expect both of the above results. I was able to achieve this by converting the keyword to lowercase and using startswith:
keywords = 'ABC'
results.filter(code__startswith=keywords.lower())

I had the same problem and the only way to get the results I wanted was to modify the solr configuration file to include ngram filtering as the default tokenizer is based on white space. So use NGramTokenizer instead. I'd love to know if there was a haystack way of doing the same thing.
I'm not at my machine right now but this should do the trick.
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />

@riz I can't comment yet or I would, and I know it's an old comment, but in case anyone else comes across this: make sure to run manage.py update_index.
@Liarez how did you get this to work? I'm using Haystack with Elasticsearch and I wasn't able to get it to work.
