Django Haystack similarity search - django

I'm a Django newbie doing a primitive website. I installed haystack and Whoosh as its search engine cause it was the simplest thing to do. It works fine, but there is a problem and I don't know how to Google it. I have some categories on my site and I have indexed their names to search. So, when a user enters "Computing" it finds the computing category and links to it. But there is a problem. If a user enters "Comp" into search field, it doesn't find "Computing" at all. Is this something that can be configured and how?
EDIT:
What else have I tried? Installing haystack 2.0, following this tutorial, installing solr instead of whoosh, trying Ngram fields, rebuilding indexes 10 times, rewriting search_indexes.py. Everything. Doesn't work. If I type in Comp, it doesn't find Computing. Is there anything else I could do? I have noticed that in the tutorial above, everything works like a charm instantly.

When you do the usual:
SearchQuerySet().filter(title='Computing')
in Haystack 1.x, it filters on everything exactly matching 'Computing'.
You can change that behaviour by using Haystack's Field Lookups, for example, using 'contains' will filter on anything containing the given string (Computing, Utingcomp, Comp):
SearchQuerySet().filter(title__contains='Comp')
In Haystack 2.x, the default filter is 'contains', so it should behave as you would expect it to "out-of-the-box"

Check out the documentation on autocomplete. You need to setup your indices to support Ngram's, but this should be exactly what you need.
from haystack.query import SearchQuerySet
SearchQuerySet().autocomplete(content_auto='old')
# Result match things like 'goldfish', 'cuckold' & 'older'.

So, if I'm understanding, what you're looking for is the equivalent of 'LIKE' in SQL.
The problem is search engines that back Haystack aren't like an RDBMS.
The low level implementation of this filter will involve using wildcard characters but most of the Haystack backends don't support a leading wildcard, something required for an icontains/endswith filter. However, since most backends support trailing wildcards, Haystack 2.x includes a startswith filter. The only case this doesn't handle is searching for the end of a word, which doesn't look to be possible.
So, if you have indexed:
"Look at our great discounts in Computer section"
Then the following Haystack query DO match:
SearchQuerySet().filter(title__startswith='comp')
# match!
Notice the difference between Django vs. Haystack startswith filters. Django startswith will match at the beginning of the complete sentence (i.e. a CharField), but the Haystack one will match at the beginning of a token (i.e. each word in a complete sentence).
Hope it helps!

Related

django haystack 2.0 contains query

Even though doc says
The contains filter became the new default filter as of Haystack v2.X (the default in Haystack v1.X was exact). This changed because exact caused problems and was unintuitive for new people trying to use Haystack. contains is a much more natural usage. http://django-haystack.readthedocs.org/en/latest/searchqueryset_api.html
I find my Solr returns exact match.
I tried subclassing SearchForm and overriding search function to use
filter(content__contains=...) as suggested by django haystack how do I find substrings in words?
But it doesn't work.
How do I do the contains search?
Using Ngram suggested in Django-Haystack with Solr contains search is the only way to do it?
I'm using solr 3.6 if that matters.

django haystack how do I find substrings in words?

In my field the content is "example".
I want to find not only the exact word "example", I also want to find "examp". How can I do that? Are there any options. Can't find anything.
If you just want to search for objects starting with some string, then just look at Haystack SearchQuerySet API documentation. It resembles the Django QuerySet API, so it is possible to write:
SearchQuerySet().filter(content__startswith='examp')
SearchQuerySet().filter(content__contains='examp')
or whatever you want.
But there is also something deeper in this question. I don't think you really need to. Because of the way search engines works - when someone searches for e.q. 'monitoring' it gets stemmed (it is process of getting something similar to root of the word - so we will have f.e. 'monitor' from 'monitoring') and that will be searched for in fact. Also everything in search indexes gets stemmed, so searching for monitor will return results containing f.e. 'monitors', 'monitoring', 'monitorize' etc.

How do you access/configure summaries/snippets in Django Haystack

I'm working on getting django-haystack set up on my site, and am trying to have snippets in my search results roughly like so:
Title of result one about Wikis ...this special thing about wiki values is that...I always use a wiki when I walk...snippet value three talks about wikis too...and here's another snippet value
about wikis.
I know there's a template tag that uses Haystack code to do the the highlighting, but the snippets it generates are pretty limited:
they always start with the query word
there's only one snippet value
they don't support asterisk queries
and other stuff?
Is there a way to use the Solr backend to generate proper snippets as shown above?
Bottom line is that the Solr highlighting can't really be used by Haystack in a flexible way. I spoke to the main developer for Haystack on IRC, and he said basically, if I want to have the kind of highlighting I'm looking for, the only way to get it is to extend the Solr backend that Haystack uses.
I dabbled in that for about half a day, but couldn't get Haystack to recognize my custom back end. Haystack has some magic backend loading code that just wasn't working with me.
Consequently, I've switched over to sunburnt, which provides a lighter-weight and more extensible wrapper around Solr. I'm hoping it will fare better.
from haystack.utils import Highlighter
my_text = 'This is a sample block that would be more meaningful in real life.'
my_query = 'block meaningful'
highlight = Highlighter(my_query)
highlight.highlight(my_text)
http://docs.haystacksearch.org/dev/highlighting.html

Inexact full-text search in PostgreSQL and Django

I'm new to PostgreSQL, and I'm not sure how to go about doing an inexact full-text search. Not that it matters too much, but I'm using Django. In other words, I'm looking for something like the following:
q = 'hello world'
queryset = Entry.objects.extra(
where=['body_tsv ## plainto_tsquery(%s)'],
params=[q])
for entry in queryset:
print entry.title
where I the list of entries should contain either exactly 'hello world', or something similar. The listings should then be ordered according to how far away their value is from the specified string. For instance, I would like the query to include entries containing "Hello World", "hEllo world", "helloworld", "hell world", etc., with some sort of ranking indicating how far away each item is from the perfect, unchanged query string.
How would you go about doing this?
Your best bet is to use Django raw querysets, I use it with MySQL to perform full text matching. If the data is all in the database and Postgres provides the matching capability then it makes sense to use it. Plus Postgres offers some really useful things in terms of stemming etc with full text queries.
Basically it lets you write the actual query you want yet returns models (as long as you are querying a model table obviously).
The advantage this gives you is that you can test the exact query you will be using first in Postgres, the documentation covers full text queries pretty well.
The main gotcha with raw querysets at the moment is they don't support count. So if you will be returning lots of data and have memory constraints on your application you might need to do something clever.
"Inexact" matching however isn't really part of the full text searching capabilities. Instead you want the postgres fuzzystrmatch contrib module. It's use is described here with indexes.
The best would be to use a search engine for this purpose. Django-haystack supports the integration of three different search engines.
In 2022, Django supports full text search with postgres. Full documentation here: https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/search/

Solr Query Syntax

I just got started looking at using Solr as my search web service. I don't know whether Solr supports these query types:
Startswith
Exact Match
Contain
Doesn't Contain
In the range
Could anyone guide me how to implement those features in Solr?
Cheers,
Samnang
Solr is capable of all those things but to adequately explain how to do each of time an answer would become a mini-manual for Solr.
I'd suggest you read the actual manual and tutorials linked from the Solr homepage.
In short though:
Startswith can be implemented using Lucene wildcards.
Exact matches will only be found if a field is not tokanized. I.e. the entire field is viewed as a single token.
Contain is the default search format. I.e. a search for "John" will find any document's whose search field contains the value "John". Prefixing with - (e.g. "-John" will only find documents that do not contain John).
Ranges (be they date or integer) are possible and quite powerful, example date:[* TO NOW] would find any document whose date is not in the future.