I needed a simple search tool for my django-powered web site, so I went with Haystack and Solr. I have set everything up correctly and can find the correct search results when I type in the exact phrase, but I can't get any results when typing in a partial phrase.
For example: "John" returns "John Doe" but "Joh" doesn't return anything.
Model:
class Person(models.Model):
first_name = models.CharField(max_length=50)
last_name = models.CharField(max_length=50)
Search Index:
class PersonIndex(SearchIndex):
text = CharField(document=True, use_template=True)
first_name = CharField(model_attr = 'first_name')
last_name = CharField(model_attr = 'last_name')
site.register(Person, PersonIndex)
I'm guessing there's some setting I'm missing that enables partial field matching. I've seen people talking about EdgeNGramFilterFactory() in some forums, and I've Googled it, but I'm not quite sure of its implementation. Plus, I was hoping there was a haystack-specific way of doing it in case I ever switch out the search backend.
You can achieve that behavior by making your index's text field an EdgeNgramField:
class PersonIndex(SearchIndex):
text = EdgeNgramField(document=True, use_template=True)
first_name = CharField(model_attr = 'first_name')
last_name = CharField(model_attr = 'last_name')
In addition to the EdgeNgramField hint that others mentioned in this page (and of course NgramField, if you work with Asian languages), I think it is worth to mention that in Django_haystack you can run raw queries on Solr via following command:
from haystack.query import SearchQuerySet
from haystack.inputs import Raw
SearchQuerySet().filter(text=Raw(query))
where text is the field you want to search, and the query can be anything based on Query Parser Syntax (version 3.6, or 4.6) of Lucene.
In this way you can easily set the query to ABC* or ABC~ or anything else which fits to the syntax.
I had a similar issue while searching for non english words, for instance:
ABC
ABCD
If I want to search for keywords ABC, I will expect the above two results. I was able to achieve the following by converting the keyword to lowercase and using startswith:
keywords = 'ABC'
results.filter(code__startswith=keywords.lower())
I had the same problem and the only way to get the results I wanted was to modify the solr configuration file to include ngram filtering as the default tokenizer is based on white space. So use NGramTokenizer instead. I'd love to know if there was a haystack way of doing the same thing.
I'm not at my machine right now but this should do the trick.
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
#riz I can't comment yet or I would and I know it's an old comment but in case anyone else runs past this: Make sure to manage.py update_index
Blockquote #Liarez how did you get this to work? I'm using haystack/elastic search and I wasn't able to get it to work.
Related
This is the model that has to be searched:
class BlockQuote(models.Model):
debate = models.ForeignKey(Debate, related_name='quotes')
speaker = models.ForeignKey(Speaker, related_name='quotes')
text = models.TextField()
I have around a thousand instances on the database on my laptop (with around 50000 on the production server)
I am creating a 'manage.py' function that will search through the database and returns all 'BlockQuote' objects whose textfield contains the keyword.
I am doing this with the Django's (1.11) Postgres search options in order to use the 'rank' attribute, which sounds like something that would come in handy. I used the official Django fulltext-search documentation for the code below
Yet when I run this code, it matches with all objects, regardless if BlockQuote.text actually contains the queryfield.
def handle(self, *args, **options):
vector = SearchVector('text')
query = options['query'][0]
Search_Instance = Search_Instance.objects.create(query=query)
set = BlockQuote.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
for result in set:
match = QueryMatch.objects.create(quote=result, query=Search_Instance)
match.save()
Does anyone have an idea of what I am doing wrong?
I don't see you actually filtering ever.
BlockQuote.objects.annotate(...).filter(rank__gte=0.5)
I am trying to add a completion suggester to enable search-as-you-type for a search field in my Django app (using Elastic Search 5.2.x and elasticseach-dsl). After trying to figure this out for a long time, I am not able to figure yet how to bulk index the suggester. Here's my code:
class SchoolIndex(DocType):
name = Text()
school_type = Keyword()
name_suggest = Completion()
Bulk indexing as follows:
def bulk_indexing():
SchoolIndex.init(index="school_index")
es = Elasticsearch()
bulk(client=es, actions=(a.indexing() for a in models.School.objects.all().iterator()))
And have defined an indexing method in models.py:
def indexing(self):
obj = SchoolIndex(
meta = {'id': self.pk},
name = self.name,
school_type = self.school_type,
name_suggest = {'input': self.name } <--- # what goes in here?
)
obj.save(index="school_index")
return obj.to_dict(include_meta=True)
As per the ES docs, suggestions are indexed like any other field. So I could just put a few terms in the name_suggest = statement above in my code which will match the corresponding field, when searched. But my question is how to do that with a ton of records? I was guessing there would be a standard way for ES to automatically come up with a few terms that could be used as suggestions. For example: using each word in the phrase as a term. I could come up something like that on my own (by breaking each phrase into words) but it seems counter-intuitive to do that on my own since I'd guess there would already be a default way that the user could further tweak if needed. But couldn't find anything like that on SO/blogs/ES docs/elasticsearch-dsl docs after searching for quite sometime. (This post by Adam Wattis was very helpful in getting me started though). Will appreciate any pointers.
I think I figured it out (..phew)
In the indexing function, I need to use the following to enable to the prefix completion suggester:
name_suggest = self.name
instead of:
name_suggest = {'input': something.here }
which seems to be used for more custom cases.
Thanks to this video that helped!
I've got a basic django haystack elasticsearch installation running, that seems to be working.. until I hit an autocomplete problem:
It doesn't return autocompletion just the full field. another problem is with data that has CAPS, that isn't normalized (such as usernames..)
MY installation:
django 1.6.4
haystack 2.1.0
elasticsearch 1.3.1
py-elasticsearch 0.6.1
class SocialProfileIndex(indexes.SearchIndex, indexes.Indexable):
text = indexes.CharField(document=True, use_template=True)
username = indexes.CharField(model_attr='username')
first_name = indexes.CharField(model_attr='first_name')
last_name = indexes.CharField(model_attr='last_name')
# Auto-complete
username_auto = indexes.EdgeNgramField(model_attr='username')
first_name_auto = indexes.EdgeNgramField(model_attr='first_name')
last_name_auto = indexes.EdgeNgramField(model_attr='last_name')
def get_model(self):
return SocialProfile
def index_queryset(self, using=None):
return self.get_model().objects.all()
Were in the view I return:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto=q)
so when indexing a SocialProfile:
username=alonisser
when q (the query) is 'alonisser' I get the correct reply, But when I try 'alon' or similiar I don't get any results.
When I access elasticsearch directly through py-elasticsearch (without haystack):
es = Elasticsearch('http://elasticsearch.url:9200')
es.search('username_auto:alon', index='haystack')
I do get the correct result, so the is stored there and the problem is probably doing something wrong with haystack..
Similiar but different problems is when the searched item has Caps :like 'Alonisser' so searching for 'alonisser' doesn't return any result, but searching for 'Alonisser' does.
What am I doing wrong? Thanks for the help..
I think you already got an answer in the haystack forums but just to bring it out here also.
One way to get rid of the Caps problem is to use a custom prepare method in your index class although my haystack somehow handles it by default :S.
def prepare_username_auto(self, obj):
return obj.username.lower()
This will convert all usernames to lowercase when you run 'update_index'. Then you can also turn you user inserted search term to lower also which should produce correct results.
To search for a part of the word you need to use:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto__startswith=q)
I have a django site running django-haystack with xapian as a back end. I got my autocomplete working, but it's giving back weird results. The results coming back from the searchqueryset are incomplete.
For example, I have the following data...
['test', 'test 1', 'test 2']
And if I type in 't', 'te', or 'tes' I get nothing back. However, if I type in 'test' I get back all of the results, as would be expected.
I have something looking like this...
results = SearchQuerySet().autocomplete(auto=q).values('auto')
And my search index looks like this...
class FacilityIndex(SearchIndex):
text = CharField(document=True, use_template=True)
created = DateTimeField(model_attr='created')
auto = EdgeNgramField(model_attr='name')
def get_model(self):
return Facility
def index_queryset(self):
return self.get_model().objects.filter(created__lte=datetime.datetime.now())
Any tips are appreciated. Thanks.
A bit late, but you need to check the min ngram size that is being indexed. It is most likely 4 chars, so it won't match on anything with fewer chars than that. I am not a Xapian user though, so I don't know how to change this configuration option for that backend.
Following is my model:
class Story(models.Model):
author = models.ForeignKey(User, related_name='author_id', default=1)
source = models.ForeignKey(User, related_name='source_id', default=2)
User is the django provided model.
Now I need to find out number of articles authored and sourced by each user.
I thought of using story_set with user object:
res = User.objects.annotate(Count('story_set'))
However, there are two columns in story referencing User. Hence, Django won't know which one to use?
Can anyone help me on this?
story_set doesn't exist one way or another. That would've been the default related_name if you hadn't provided one. Since you did, you have to use those.
res = User.objects.annotate(Count('author_id'))
OR
res = User.objects.annotate(Count('source_id'))
So, that's how Django knows the difference.
FYI: if you had used the default (so you accessed stories via .story_set.all(), you don't use the "_set" part in queries. It would just be Count('story').
the reason django makes you specify a related_name is exactly for this reason. you want Count('author_id') instead of Count('story_set') (and you should probably give these better names, e.g. author_set)
res = User.objects.annotate(num_authored=Count('author_id')).annotate(num_sourced=Count('source_id'))
res[0].num_authored
res[0].num_sourced
If you want both the number of authored and number of sourced articles in one query. You might want more appropriate related names, like "authored" and "sourced".