Django + Haystack + Elasticsearch search a word with a single quote

I have a setup of Django + Haystack + Elasticsearch. I was searching with q="Çakıcı'nın İlk Kurşunu" and no results were appearing. When I looked at the text field for this object in the Elasticsearch index, I saw the text stored as "Çakıcı&#39;nın İlk Kurşunu": the single quote had been converted to its HTML-safe entity.
So I assume that when I build the index, my DB objects are HTML-escaped before Elasticsearch stores them.
To work around this, I changed the q field by overriding the clean_q method of the form class, using django.utils.html.escape to escape the single quote the same way it is stored in the index.
Here is my form code.
from django.utils.html import escape
from haystack.forms import SearchForm


class BstBookSearchForm(SearchForm):
    def no_query_found(self):
        return self.searchqueryset.all()

    def search(self):
        # print self.cleaned_data.get('q')
        # First, store the SearchQuerySet received from the other processing
        # (the main work is run internally by Haystack here).
        sqs = super(BstBookSearchForm, self).search()
        # If something went wrong, fall back to returning everything.
        if not self.is_valid():
            return self.no_query_found()
        # You can then adjust the search results, e.g. order them by title:
        # sqs = sqs.order_by('book_name')
        return sqs

    def clean_q(self):
        return escape(self.cleaned_data['q'])
Is there a built-in way to do this conversion, in Django, Haystack, or Elasticsearch?
Would you consider my solution bad practice?
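For reference, this is roughly what django.utils.html.escape does to such a query (the exact entity used for the apostrophe depends on the Django version, so treat this snippet as illustrative):
from django.utils.html import escape

print(escape("Çakıcı'nın İlk Kurşunu"))
# Older Django versions emit "Çakıcı&#39;nın İlk Kurşunu"; newer ones use &#x27;.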

Related

Endpoint to accept multiple values, including query strings

I have a view which should accept an endpoint with a query parameter as well as without one.
http://localhost:8001/v1/subjects?owner_ids=62,144
and
http://localhost:8001/v1/subjects
Here's my view file...
class SubjectPagination(JsonApiPageNumberPagination):
    """
    Required for frontend to retrieve full list.
    """
    max_page_size = 5000


class SubjectViewSet(Subject.get_viewset()):
    pagination_class = SubjectPagination

    def get_queryset(self):
        import pdb; pdb.set_trace()
        queryset = Subject.objects.all()
        if self.request.GET['owner_ids']:
            owner_id_list = self.request.GET['owner_ids'].split(',')
            owner_id_list_integer = []
            for i in owner_id_list:
                owner_id_list_integer.append(int(i))
            return queryset.filter(organization__in=owner_id_list_integer)
        else:
            return queryset


SubjectUserRoleViewSet = Subject.get_by_user_role_viewset(
    SubjectViewSet, GroupRoleMap, Role)
I am trying to figure out how to handle both endpoints. Please advise what needs to be done in the view to handle a URI with or without query strings.
Here's the urls.py
router.register(r'subjects', views.SubjectViewSet)
First of all, it is good practice to send the parameters URL-form-encoded instead of as a comma-separated string; in this case, to send a list you could pass the ids as:
?owner_ids[]=62&owner_ids[]=144
The QueryDict is then going to look like this:
<QueryDict: {'owner_ids[]': ['62', '144']}>
and you can process it easily, like this:
self.request.GET.getlist('owner_ids[]', [])
Remember to use the get and getlist methods on request.GET and request.POST to avoid dict key errors.
Second, split() already returns a list, so the for loop building the owner id list is unnecessary, and the __in queryset lookup accepts a list of strings. If you actually want to convert all the items to integers, use a list comprehension. For example, to convert all items in a list to integers:
owner_ids = [int(i) for i in owner_ids]
This is faster and more Pythonic.
And last, all URLs should end in a slash; Django even has a setting for that, APPEND_SLASH.
That is what I can tell from this somewhat ambiguous question; next time, please write the question more precisely so that people can help you.
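Putting these suggestions together, here is a rough sketch of how get_queryset could look (assuming owner_ids still filters on the organization field, as in your original code):
class SubjectViewSet(Subject.get_viewset()):
    pagination_class = SubjectPagination

    def get_queryset(self):
        queryset = Subject.objects.all()
        # getlist() returns [] when the parameter is absent, so both
        # /v1/subjects and /v1/subjects?owner_ids[]=62&owner_ids[]=144 work.
        owner_ids = self.request.GET.getlist('owner_ids[]', [])
        if owner_ids:
            owner_ids = [int(i) for i in owner_ids]
            queryset = queryset.filter(organization__in=owner_ids)
        return queryset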

Django - Search matches with all objects - even if they don't actually match

This is the model that has to be searched:
class BlockQuote(models.Model):
    debate = models.ForeignKey(Debate, related_name='quotes')
    speaker = models.ForeignKey(Speaker, related_name='quotes')
    text = models.TextField()
I have around a thousand instances in the database on my laptop (and around 50,000 on the production server).
I am creating a manage.py command that searches the database and returns all BlockQuote objects whose text field contains the keyword.
I am doing this with Django's (1.11) Postgres full-text search in order to use the rank attribute, which sounds like something that would come in handy. I used the official Django full-text search documentation for the code below.
Yet when I run this code, it matches all objects, regardless of whether BlockQuote.text actually contains the query.
def handle(self, *args, **options):
    vector = SearchVector('text')
    query = options['query'][0]
    search_instance = Search_Instance.objects.create(query=query)
    results = BlockQuote.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
    for result in results:
        match = QueryMatch.objects.create(quote=result, query=search_instance)
        match.save()
Does anyone have an idea of what I am doing wrong?
I don't see you actually filtering anywhere. You need something like:
BlockQuote.objects.annotate(...).filter(rank__gte=0.5)
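For example, here is a rough sketch with a rank filter applied (the 0.3 threshold is arbitrary and should be tuned; SearchQuery is used so the keyword is treated as a parsed query rather than a raw string):
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

vector = SearchVector('text')
search_query = SearchQuery(query)

# Only keep quotes whose rank against the query clears a (tunable) threshold.
results = (
    BlockQuote.objects
    .annotate(rank=SearchRank(vector, search_query))
    .filter(rank__gte=0.3)
    .order_by('-rank')
)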

ElasticSearch - bulk indexing for a completion suggester in python

I am trying to add a completion suggester to enable search-as-you-type for a search field in my Django app (using Elasticsearch 5.2.x and elasticsearch-dsl). After trying to figure this out for a long time, I am not yet able to work out how to bulk index the suggester. Here's my code:
from elasticsearch_dsl import Completion, DocType, Keyword, Text

class SchoolIndex(DocType):
    name = Text()
    school_type = Keyword()
    name_suggest = Completion()
Bulk indexing as follows:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

def bulk_indexing():
    SchoolIndex.init(index="school_index")
    es = Elasticsearch()
    bulk(client=es, actions=(a.indexing() for a in models.School.objects.all().iterator()))
And I have defined an indexing method in models.py:
def indexing(self):
    obj = SchoolIndex(
        meta={'id': self.pk},
        name=self.name,
        school_type=self.school_type,
        name_suggest={'input': self.name}  # <--- what goes in here?
    )
    obj.save(index="school_index")
    return obj.to_dict(include_meta=True)
As per the ES docs, suggestions are indexed like any other field. So I could just put a few terms in the name_suggest statement above, which would match the corresponding field when searched. But my question is how to do that with a ton of records. I was guessing there would be a standard way for ES to automatically come up with a few terms that could be used as suggestions, for example using each word in the phrase as a term. I could build something like that myself (by breaking each phrase into words), but it seems counter-intuitive to do so, since I'd guess there is already a default that the user can further tweak if needed. I couldn't find anything like that on SO/blogs/the ES docs/the elasticsearch-dsl docs after searching for quite some time. (This post by Adam Wattis was very helpful in getting me started though.) I will appreciate any pointers.
I think I figured it out (...phew).
In the indexing function, I need to use the following to enable the prefix completion suggester:
name_suggest = self.name
instead of:
name_suggest = {'input': something.here }
which seems to be used for more custom cases.
Thanks to this video that helped!
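For completeness, here is a sketch of the indexing method with that change applied (same names and index as in the question; this assumes nothing else needs to change):
def indexing(self):
    obj = SchoolIndex(
        meta={'id': self.pk},
        name=self.name,
        school_type=self.school_type,
        # Plain text input enables the default prefix completion suggester;
        # the {'input': ...} form is only needed for more custom cases.
        name_suggest=self.name,
    )
    obj.save(index="school_index")
    return obj.to_dict(include_meta=True)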

Django haystack elasticsearch problems with autocomplete (and queries with Capital letters)

I've got a basic Django + Haystack + Elasticsearch installation running that seems to be working... until I hit an autocomplete problem:
It doesn't return autocompletions, just matches on the full field. Another problem is with data that has CAPS and isn't normalized (such as usernames).
My installation:
django 1.6.4
haystack 2.1.0
elasticsearch 1.3.1
py-elasticsearch 0.6.1
class SocialProfileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    username = indexes.CharField(model_attr='username')
    first_name = indexes.CharField(model_attr='first_name')
    last_name = indexes.CharField(model_attr='last_name')

    # Auto-complete
    username_auto = indexes.EdgeNgramField(model_attr='username')
    first_name_auto = indexes.EdgeNgramField(model_attr='first_name')
    last_name_auto = indexes.EdgeNgramField(model_attr='last_name')

    def get_model(self):
        return SocialProfile

    def index_queryset(self, using=None):
        return self.get_model().objects.all()
Where in the view I return:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto=q)
so when indexing a SocialProfile:
username=alonisser
When q (the query) is 'alonisser' I get the correct reply, but when I try 'alon' or similar I don't get any results.
When I access elasticsearch directly through py-elasticsearch (without haystack):
es = Elasticsearch('http://elasticsearch.url:9200')
es.search('username_auto:alon', index='haystack')
I do get the correct result, so the data is stored there, and the problem is probably that I am doing something wrong with Haystack.
A similar but different problem occurs when the indexed item has caps, like 'Alonisser': searching for 'alonisser' doesn't return any result, but searching for 'Alonisser' does.
What am I doing wrong? Thanks for the help.
I think you already got an answer in the Haystack forums, but just to bring it out here as well.
One way to get rid of the caps problem is to use a custom prepare method in your index class (although my Haystack somehow handles it by default):
def prepare_username_auto(self, obj):
    return obj.username.lower()
This will convert all usernames to lowercase when you run update_index. Then you can also lowercase the user-entered search term, which should produce correct results.
To search for a part of the word you need to use:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto__startswith=q)
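Putting both suggestions together, a rough sketch of the view-side lookup could look like this (assuming q comes straight from the request):
from haystack.query import SearchQuerySet

# Lowercase the user input to match the lowercased index field, and use
# __startswith so the EdgeNgram field matches partial words like 'alon'.
q = request.GET.get('q', '').lower()
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto__startswith=q)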

Hierarchical cache in Django

What I want to do is mark some values in the cache as related so I can delete them all at once. For example, when I insert a new entry into the database, I want to delete everything in the cache that was based on the old database values.
I could always use cache.clear(), but that seems too brutal to me. Or I could store related values together in a dictionary and cache that dictionary. Or I could maintain some kind of index in an extra cache field. But everything seems too complicated to me (and possibly slow?).
What do you think? Is there an existing solution? Or is my approach wrong? Thanks for any answers.
Are you using the cache API? It sounds like it.
This post, which pointed me to these slides, helped me create a nice generational caching system that let me build the hierarchy I wanted.
In short, you store a generation key (such as a group) in your cache and incorporate its stored value into your key-creation function, so that you can invalidate a whole set of keys at once.
With this basic concept you could create highly complex hierarchies or just a simple group system.
For example:
from hashlib import md5

from django.core.cache import cache


class Cache(object):
    def generate_cache_key(self, key, group=None):
        """
        Generate a cache key, relating keys via an outside source (group).
        Generates a key such as 'group-1:KEY-your-key-here'.
        Note: consider this pseudo code and definitely incomplete code.
        """
        key_fragments = [('key', key)]
        if group:
            key_fragments.append((group, cache.get(group, '1')))
        combined_key = ":".join(['%s-%s' % (name, value) for name, value in key_fragments])
        hashed_key = md5(combined_key.encode('utf-8')).hexdigest()
        return hashed_key

    def increment_group(self, group):
        """
        Invalidate an entire group.
        """
        cache.incr(group)

    def set(self, key, value, group=None):
        key = self.generate_cache_key(key, group)
        cache.set(key, value)

    def get(self, key, group=None):
        key = self.generate_cache_key(key, group)
        return cache.get(key)
# Example:
>>> cache = Cache()
>>> cache.set('key', 'value', 'somehow_related')
>>> cache.set('key2', 'value2', 'somehow_related')
>>> cache.increment_group('somehow_related')
>>> cache.get('key')   # both invalidated
>>> cache.get('key2')  # both invalidated
Caching a dict or something serialised (with JSON or the like) sounds good to me. The cache backends are key-value stores like memcached; they aren't hierarchical.
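For example, a minimal sketch of that serialised-dict approach using Django's low-level cache API (the key name and timeout here are just illustrative):
import json
from django.core.cache import cache

# Store related values together under one key so they can be replaced or
# deleted in a single operation.
related = {'key': 'value', 'key2': 'value2'}
cache.set('somehow_related', json.dumps(related), 60 * 15)

cached = cache.get('somehow_related')
related = json.loads(cached) if cached else {}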