fuzzy search in django postgresql without using Elasticsearch - django

I am trying to add fuzzy search to a Django project without using Elasticsearch.
1. I am using PostgreSQL, so I first tried levenshtein, but it did not work for my purpose.
from django.db.models import F, Func

class Levenshtein(Func):
    template = "%(function)s(%(expressions)s, '%(search_term)s')"
    function = "levenshtein"

    def __init__(self, expression, search_term, **extras):
        super(Levenshtein, self).__init__(
            expression,
            search_term=search_term,
            **extras
        )

items = Product.objects.annotate(lev_dist=Levenshtein(F('sort_name'), searchterm)).filter(
    lev_dist__lte=2
)
Searching for "glyoxl" did not pick up "4-Methylphenylglyoxal hydrate", because levenshtein treated "Methylphenylglyoxal" as a single word and compared the whole of it with my search term "glyoxl".
2. trigram_similar gave weird results and was slow.
items = Product.objects.filter(sort_name__trigram_similar=searchterm)
"phnylglyoxal" did not pick up "4-Methylphenylglyoxal hydrate", but it did pick up some other similar terms: "4-Hydroxyphenylglyoxal hydrate" and "2,4,6-Trimethylphenylglyoxal hydrate".
"glyoxl" did not pick up any of the above terms.
3. The Python package fuzzywuzzy seems able to solve my problem, but I was not able to incorporate it into a query.
ratio = fuzz.partial_ratio('glyoxl', '4-Methylphenylglyoxal hydrate')
# ratio = 83
I tried to use fuzz.partial_ratio function in annotate, but it did not work.
items = Product.objects.annotate(ratio=fuzz.partial_ratio(searchterm, 'full_name')).filter(
    ratio__gte=75
)
Here is the error message:
QuerySet.annotate() received non-expression(s): 12.
According to this Stack Overflow post (1), annotate does not accept regular Python functions. The post also mentions that since Django 2.1 one can subclass Func to write a custom function, but it seems that Func can only wrap database functions such as levenshtein.
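Because annotate only accepts database expressions, one possible workaround is to score the names in Python after the query, provided the candidate set is small enough to load. The sketch below uses difflib from the standard library as a stand-in for fuzz.partial_ratio. partial_score and fuzzy_filter are hypothetical helper names, the scores are on a 0 to 1 scale and will not match fuzzywuzzy's numbers exactly, and the list of names stands in for something like Product.objects.values_list('full_name', flat=True).

```python
from difflib import SequenceMatcher


def partial_score(term, text):
    """Best similarity between term and any same-length window of text (0.0 to 1.0)."""
    term, text = term.lower(), text.lower()
    if not term or not text:
        return 0.0
    if len(term) >= len(text):
        return SequenceMatcher(None, term, text).ratio()
    width = len(term)
    # Slide a window of len(term) characters across text and keep the best ratio.
    return max(
        SequenceMatcher(None, term, text[i:i + width]).ratio()
        for i in range(len(text) - width + 1)
    )


def fuzzy_filter(term, names, threshold=0.75):
    """Return the names whose best window scores at least threshold."""
    return [name for name in names if partial_score(term, name) >= threshold]


# Stand-in data; in the project this would come from the database.
names = ["4-Methylphenylglyoxal hydrate", "Sodium chloride"]
matches = fuzzy_filter("glyoxl", names)
print(matches)
```

This trades database-side filtering for a full scan in Python, so it is only a reasonable fit when the table (or a pre-filtered subset of it) is small.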
Is there any way to solve these problems? Thanks!
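The trigram results in item 2 can be reasoned about with a rough pure-Python approximation of how pg_trgm scores two strings: it lowercases the text, splits it into words on non-alphanumeric characters, pads each word with two leading spaces and one trailing space, and compares the resulting sets of three-character substrings. The function names below are made up for illustration, and the sketch approximates the extension rather than reproducing it; 0.3 is the documented default similarity threshold.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction (illustrative only)."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


score = trigram_similarity("glyoxl", "4-Methylphenylglyoxal hydrate")
print(round(score, 3))
```

Under this approximation the short term shares only a few trigrams with the long name, so the score falls well below 0.3, which is consistent with "glyoxl" not matching. pg_trgm also provides word_similarity, which compares the search term against the best-matching part of the text; whether it fits this case has not been tested here.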

Related

filter queryset for multiple models in Django

I'm implementing a search feature that matches keywords against the description and also matches media: if the description matches and the media type is one of ['mp4','mkv','mov','avi'], the condition is satisfied.
I have tried many methods but haven't found an efficient way to do this without a for loop.
I want to use these two conditions together:
the description and the media type ['mp4','mkv','mov','avi']
postinlang_queryset = PostInLanguages.objects.filter(description__contains=search_name)
media_type_query_set = LanguageMedia.objects.filter(content_type__contains ['mp4','mkv','mov','avi'])
Yes, it's possible without a for loop. Just use the following:
postinlang_queryset = PostInLanguages.objects.filter(description__contains=search_name)
media_type_query_set = LanguageMedia.objects.filter(content_type__in=['mp4','mkv','mov','avi'])
N.B.: with content_type__in=['mp4','mkv','mov','avi'], passing an empty list does not raise an exception; it simply returns an empty queryset.

Unable to add a date time value in Cassandra DB? Invalid syntax error?

Problem:
I'm currently trying to insert a date time object into my Cassandra database using the following code:
dt_str = '2016-09-01 12:00:00.00000'
dt_thing = datetime.datetime.strptime(dt_str, '%Y-%m-%d %H:%M:%S.%f')
def insert_record(session):
    session.execute(
        """
        INSERT INTO record_table (last_modified)
        VALUES(dt_thing)
        """
    )
However, I'm receiving the following error:
cassandra.protocol.SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 3:17 no viable alternative at input ')' (...record_table (last_modified) VALUES([dt_thing])...)">
Background Info
I'm relatively new to Cassandra and I'm not sure how to go about this. What I'm basically trying to do is add a date time value to my database, since an earlier version of the code looks for one but it does not exist yet (which is why I'm adding one manually).
I'm using Python 2.7 and Cassandra 3.0.
Any input on how to go about this would be great!
I answered a similar question yesterday. The general idea is that you'll want to define a prepared statement and then bind your dt_thing variable to it, like this:
dt_str = '2016-09-01 12:00:00.00000'
dt_thing = datetime.datetime.strptime(dt_str, '%Y-%m-%d %H:%M:%S.%f')
def insert_record(session):
    preparedInsert = session.prepare(
        """
        INSERT INTO record_table (last_modified)
        VALUES (?)
        """
    )
    session.execute(preparedInsert, [dt_thing])
Also, I don't recommend using a timestamp as a lone PRIMARY KEY (which is the only model for which that INSERT would work).
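The failure in the question can be seen without a cluster: inside the quoted statement, dt_thing is just text, so Cassandra receives the name rather than the value. The sketch below only builds the statement text and the parameter list; the commented-out calls mirror the answer above and are not executed here.

```python
import datetime

dt_str = '2016-09-01 12:00:00.00000'
dt_thing = datetime.datetime.strptime(dt_str, '%Y-%m-%d %H:%M:%S.%f')

# The original statement: the Python variable is never interpolated,
# so the server sees the bare identifier dt_thing.
broken_cql = "INSERT INTO record_table (last_modified) VALUES(dt_thing)"

# The prepared form keeps a ? placeholder and ships the value separately.
prepared_cql = "INSERT INTO record_table (last_modified) VALUES (?)"
params = [dt_thing]

# With a live session (as in the answer above):
# prepared = session.prepare(prepared_cql)
# session.execute(prepared, params)
print(broken_cql)
print(prepared_cql, params)
```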

ElasticSearch - bulk indexing for a completion suggester in python

I am trying to add a completion suggester to enable search-as-you-type for a search field in my Django app (using Elasticsearch 5.2.x and elasticsearch-dsl). After trying to figure this out for a long time, I still have not worked out how to bulk index the suggester. Here's my code:
class SchoolIndex(DocType):
    name = Text()
    school_type = Keyword()
    name_suggest = Completion()
Bulk indexing as follows:
def bulk_indexing():
    SchoolIndex.init(index="school_index")
    es = Elasticsearch()
    bulk(client=es, actions=(a.indexing() for a in models.School.objects.all().iterator()))
And have defined an indexing method in models.py:
def indexing(self):
    obj = SchoolIndex(
        meta={'id': self.pk},
        name=self.name,
        school_type=self.school_type,
        name_suggest={'input': self.name}  # <--- what goes in here?
    )
    obj.save(index="school_index")
    return obj.to_dict(include_meta=True)
As per the ES docs, suggestions are indexed like any other field. So I could just put a few terms in the name_suggest = statement above, which would match the corresponding field when searched. But my question is how to do that with a ton of records. I was guessing there would be a standard way for ES to automatically come up with a few terms that could be used as suggestions, for example using each word in the phrase as a term. I could come up with something like that on my own (by breaking each phrase into words), but that seems counter-intuitive, since I'd guess there would already be a default way that the user could tweak further if needed.
I couldn't find anything like that on SO, blogs, the ES docs, or the elasticsearch-dsl docs after searching for quite some time. (This post by Adam Wattis was very helpful in getting me started, though.) I will appreciate any pointers.
I think I figured it out (phew).
In the indexing function, I need to use the following to enable the prefix completion suggester:
name_suggest = self.name
instead of:
name_suggest = {'input': something.here }
which seems to be used for more custom cases.
Thanks to this video that helped!
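For the more custom case, where a name should also be suggested when the user starts typing from a later word, the input list can be built in plain Python before indexing. This is a sketch of the word-splitting idea from the question; suggestion_inputs is a hypothetical helper, not something provided by Elasticsearch or elasticsearch-dsl, and the school name is made up.

```python
def suggestion_inputs(name):
    """Build a completion 'input' list: the full name plus each trailing phrase."""
    words = name.split()
    # "Green Valley High" -> ["Green Valley High", "Valley High", "High"]
    inputs = [" ".join(words[i:]) for i in range(len(words))]
    return {"input": inputs}


print(suggestion_inputs("Green Valley High School"))
```

The returned dict has the same shape as the {'input': ...} value in the indexing method, so it could be passed as name_suggest there if prefix matching on the full name alone turns out not to be enough.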

Using django haystack search with global search bar in template

I have a django project that needs to search 2 different models and one of the models has 3 types that I need to filter based on. I have haystack installed and working in a basic sense (using the default url conf and SearchView for my model and the template from the getting started documentation is returning results fine).
The problem is that I'm only able to get results by using the search form in the basic search.html template. I'm trying to make a global search bar work with haystack, but I can't seem to get it right, and I'm not having a lot of luck with the haystack documentation. I found another question on here that led me to the following method in my search app.
My urls.py directs "/search" to this view in my search.views:
def search_posts(request):
    post_type = str(request.GET.get('type')).lower()
    sqs = SearchQuerySet().all().filter(type=post_type)
    view = search_view_factory(
        view_class=SearchView,
        template='search/search.html',
        searchqueryset=sqs,
        form_class=HighlightedSearchForm
    )
    return view(request)
The url string that comes in looks something like:
http://example.com/search/?q=test&type=blog
This gets the query string from my global search bar but returns no results. However, if I remove the .filter(type=post_type) part from the sqs line, I get search results again (albeit not filtered by post type). Any ideas? I think I'm missing something fairly obvious, but I can't seem to figure this out.
Thanks,
-Sean
EDIT:
It turns out that I am just an idiot. The reason why my filtering on the SQS by type was returning no results was because I didn't have the type field included in my PostIndex class. I changed my PostIndex to:
class PostIndex(indexes.SearchIndex, indexes.Indexable):
    ...
    type = indexes.CharField(model_attr='type')
and rebuilt and it all works now.
Thanks for the response though!
def search_posts(request):
    post_type = str(request.GET.get('type')).lower()
    sqs = SearchQuerySet().filter(type=post_type)
    clean_query = sqs.query.clean(post_type)
    result = sqs.filter(content=clean_query)
    view = search_view_factory(
        view_class=SearchView,
        template='search/search.html',
        searchqueryset=result,
        form_class=HighlightedSearchForm
    )
    return view(request)
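One detail in both versions of the view: str(request.GET.get('type')).lower() turns a missing type parameter into the string 'none', which would then be used as a filter value. A small helper can make the missing case explicit. This is a sketch; normalized_post_type is a hypothetical name, and a plain dict stands in for request.GET.

```python
def normalized_post_type(params):
    """Return the lowercased 'type' parameter, or None when it is absent or blank."""
    raw = params.get("type")
    if raw is None:
        return None
    cleaned = raw.strip().lower()
    return cleaned or None


# What the original expression produces when the parameter is missing:
missing = str({}.get("type")).lower()
print(repr(missing))
print(normalized_post_type({"type": "Blog"}))
```

The view could then skip the .filter(type=...) call whenever the helper returns None, so a search without a type is not silently filtered on 'none'.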

email__iexact on django doesn't work on postgresql?

Calling UserModel.objects.filter(email__iexact=email) results in the following query
SELECT * FROM "accounts_person" WHERE "accounts_person"."email" = UPPER('my-email#mail.com')
This doesn't find anything because there's no EMAIL#MAIL.COM in the database, only email#mail.com. Shouldn't the query have been translated to
WHERE UPPER("accounts_person"."email") = UPPER('my-email#mail.com')?
Summary:
UserModel.objects.filter(email=email) # works
UserModel.objects.filter(email__exact=email) # works
UserModel.objects.filter(email__iexact=email) # doesn't work
Clash, you are right; I also faced the same situation with PostgreSQL.
If you go through this ticket, you will get some idea.
Perhaps an option could be passed to EmailField to state whether you want it to lowercase the value or not. It would save having to do something like this in the form validation:
def clean_email(self):
    return self.cleaned_data['email'].lower()
My bad. I had patched lookup_cast to be able to use the unaccent module on PostgreSQL and ended up not calling the original lookup_cast afterwards. The generated query now looks like this: WHERE UPPER("accounts_person"."email"::text) = UPPER('my-email#mail.com'). This is the default behavior in Django.
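The difference between the two WHERE clauses can be mimicked in plain Python. This is only an illustration of the comparison logic, with a list standing in for the table and hypothetical function names; it does not run any SQL.

```python
stored_emails = ["my-email#mail.com"]


def patched_lookup(rows, value):
    # Mirrors: WHERE email = UPPER(value); only the right-hand side is upper-cased.
    return [row for row in rows if row == value.upper()]


def default_iexact(rows, value):
    # Mirrors: WHERE UPPER(email) = UPPER(value); both sides are upper-cased.
    return [row for row in rows if row.upper() == value.upper()]


patched_result = patched_lookup(stored_emails, "my-email#mail.com")
default_result = default_iexact(stored_emails, "My-Email#Mail.com")
print(patched_result, default_result)
```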