index analyzer vs query analyzer in haystack - elasticsearch? - django

Elasticsearch itself seems to support index-analyzer and query-analyzer,
but haystack's elasticsearch doesn't seem to differentiate them.
Am I Correct?
related question is,
Elasticsearch's DEFAULT_SETTING seems to have 'settings.analysis.anaylyzer' and 'index.analysys.anaylyzer'. (eg. http://www.wellfireinteractive.com/blog/custom-haystack-elasticsearch-backend/ has 'index') What's the difference between them?

With haystack, you want to set the mappings yourself.
I wrote about haystack as well earlier here: Django Haystack Distinct Value for Field
In the settings, you can define analyzers on a per field basis, they can be a default analyzer (which is what haystack defaults to and get's applied at both search and index time) a search time analyzer and a query time analyzer.
It's usually good practice to define both a search time analyzer and index time analyzer, even if they are the exact same.
Using snowball text analyses, you might want to apply this at both search and index time, but something like an autocomplete feature, you might not want that (which is what haystack does). You want the index analyzer to store (edge)ngrams and usually you want to apply a stricter search time analysis, like keyword.
You almost never want to let haystack define the mapping.
As for the second part, see here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html
Mid way down it says:
"Note you do not have to explicitly specify index section inside
settings section."
I just tried this myself as well, because I had never tested it.

Related

haystack query for solr to find items where attribute is unset or specified value

I'm trying to query solr through haystack for all objects that either does not have an attribute (it's Null) or the attribute is a specified value.
I can query solr directly with the snippet (brand:foo OR (*:* -brand:*)) and get what I want. But I can't find a way to formulate this or anything logically the same through haystack without really ugly hacks.
I did find this ugly hack:
SearchQuerySet().filter(brand=Raw('%s OR (*:* -brand:*)' % Clean('foo'))
But it chains really poorly with that OR in there without any parenthesis around it.
Ideally a solution using a pure filter would be best, but failing that a way to add a chainable filter using raw solr query language.
I'm using django-haystack 2.4.0
It's not a perfect match, but narrow helps me enough to let me do what I want
SearchQuerySet().narrow('(brand:%s OR (*:* -brand:*))' % Clean('foo'))

Django Haystack similarity search

I'm a Django newbie doing a primitive website. I installed haystack and Whoosh as its search engine cause it was the simplest thing to do. It works fine, but there is a problem and I don't know how to Google it. I have some categories on my site and I have indexed their names to search. So, when a user enters "Computing" it finds the computing category and links to it. But there is a problem. If a user enters "Comp" into search field, it doesn't find "Computing" at all. Is this something that can be configured and how?
EDIT:
What else have I tried? Installing haystack 2.0, following this tutorial, installing solr instead of whoosh, trying Ngram fields, rebuilding indexes 10 times, rewriting search_indexes.py. Everything. Doesn't work. If I type in Comp, it doesn't find Computing. Is there anything else I could do? I have noticed that in the tutorial above, everything works like a charm instantly.
When you do the usual:
SearchQuerySet().filter(title='Computing')
in Haystack 1.x, it filters on everything exactly matching 'Computing'.
You can change that behaviour by using Haystack's Field Lookups, for example, using 'contains' will filter on anything containing the given string (Computing, Utingcomp, Comp):
SearchQuerySet().filter(title__contains='Comp')
In Haystack 2.x, the default filter is 'contains', so it should behave as you would expect it to "out-of-the-box"
Check out the documentation on autocomplete. You need to setup your indices to support Ngram's, but this should be exactly what you need.
from haystack.query import SearchQuerySet
SearchQuerySet().autocomplete(content_auto='old')
# Result match things like 'goldfish', 'cuckold' & 'older'.
So, if I'm understanding, what you're looking for is the equivalent of 'LIKE' in SQL.
The problem is search engines that back Haystack aren't like an RDBMS.
The low level implementation of this filter will involve using wildcard characters but most of the Haystack backends don't support a leading wildcard, something required for an icontains/endswith filter. However, since most backends support trailing wildcards, Haystack 2.x includes a startswith filter. The only case this doesn't handle is searching for the end of a word, which doesn't look to be possible.
So, if you have indexed:
"Look at our great discounts in Computer section"
Then the following Haystack query DO match:
SearchQuerySet().filter(title__startswith='comp')
# match!
Notice the difference between Django vs. Haystack startswith filters. Django startswith will match at the beginning of the complete sentence (i.e. a CharField), but the Haystack one will match at the beginning of a token (i.e. each word in a complete sentence).
Hope it helps!

How do you access/configure summaries/snippets in Django Haystack

I'm working on getting django-haystack set up on my site, and am trying to have snippets in my search results roughly like so:
Title of result one about Wikis ...this special thing about wiki values is that...I always use a wiki when I walk...snippet value three talks about wikis too...and here's another snippet value
about wikis.
I know there's a template tag that uses Haystack code to do the the highlighting, but the snippets it generates are pretty limited:
they always start with the query word
there's only one snippet value
they don't support asterisk queries
and other stuff?
Is there a way to use the Solr backend to generate proper snippets as shown above?
Bottom line is that the Solr highlighting can't really be used by Haystack in a flexible way. I spoke to the main developer for Haystack on IRC, and he said basically, if I want to have the kind of highlighting I'm looking for, the only way to get it is to extend the Solr backend that Haystack uses.
I dabbled in that for about half a day, but couldn't get Haystack to recognize my custom back end. Haystack has some magic backend loading code that just wasn't working with me.
Consequently, I've switched over to sunburnt, which provides a lighter-weight and more extensible wrapper around Solr. I'm hoping it will fare better.
from haystack.utils import Highlighter
my_text = 'This is a sample block that would be more meaningful in real life.'
my_query = 'block meaningful'
highlight = Highlighter(my_query)
highlight.highlight(my_text)
http://docs.haystacksearch.org/dev/highlighting.html

Solr: List results by distance

I'd like to pass some parameters to Solr that should afflict the weighting of the results (I do not want to filter away results that do not match these criterias).
E.g. I'd like to have a language attribute, and if i pass the user's language to the search engine I'd like to have the results matching the language listed first. As a newbie to Solr I'd like to know if and how this is possible!
Yes, that's possible by using boost functions. See this FAQ entry or the description of boost functions for the DisMaxQueryPlugin (the dismax query parser is the default parser).

Solr Query Syntax

I just got started looking at using Solr as my search web service. I don't know whether Solr supports these query types:
Startswith
Exact Match
Contain
Doesn't Contain
In the range
Could anyone guide me how to implement those features in Solr?
Cheers,
Samnang
Solr is capable of all those things but to adequately explain how to do each of time an answer would become a mini-manual for Solr.
I'd suggest you read the actual manual and tutorials linked from the Solr homepage.
In short though:
Startswith can be implemented using Lucene wildcards.
Exact matches will only be found if a field is not tokanized. I.e. the entire field is viewed as a single token.
Contain is the default search format. I.e. a search for "John" will find any document's whose search field contains the value "John". Prefixing with - (e.g. "-John" will only find documents that do not contain John).
Ranges (be they date or integer) are possible and quite powerful, example date:[* TO NOW] would find any document whose date is not in the future.