Django Haystack: how do I find substrings in words?

In my field, the content is "example".
I want to find not only the exact word "example" but also "examp". How can I do that? Are there any options? I can't find anything.

If you just want to search for objects starting with some string, look at the Haystack SearchQuerySet API documentation. It resembles the Django QuerySet API, so it is possible to write:
from haystack.query import SearchQuerySet
SearchQuerySet().filter(content__startswith='examp')
SearchQuerySet().filter(content__contains='examp')
or use whichever other field lookup you need.
But there is also something deeper in this question: I don't think you really need to do this, because of the way search engines work. When someone searches for, e.g., 'monitoring', the query gets stemmed (stemming is the process of reducing a word to something close to its root, so 'monitoring' becomes 'monitor'), and the stem is what actually gets searched for. Everything in the search index is stemmed too, so searching for 'monitor' will return results containing, e.g., 'monitors', 'monitoring', 'monitorize', etc.
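To make the stemming argument concrete, here is a toy sketch in plain Python. The suffix list and the `toy_stem`, `build_index` and `search` helpers are hypothetical; real engines use a proper stemmer such as Porter or Snowball, so treat this only as an illustration of why different word forms end up matching each other.

```python
from collections import defaultdict

# Toy suffix-stripping stemmer: NOT Porter/Snowball, just enough to show
# how different word forms can collapse to the same index term.
SUFFIXES = ("ization", "ize", "ing", "ed", "s")


def toy_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when a reasonably long stem (4+ chars) remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stemmed token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Stem the query the same way the documents were stemmed, then look it up."""
    return index.get(toy_stem(query), set())


docs = {
    1: "monitors for sale",
    2: "server monitoring tools",
    3: "how to monitorize a cluster",
    4: "an example page",
}
index = build_index(docs)
print(sorted(search(index, "monitor")))     # [1, 2, 3]
print(sorted(search(index, "monitoring")))  # [1, 2, 3]
print(sorted(search(index, "examp")))       # []
```

Note that 'examp' is not a stem of anything here, so stemming alone still doesn't find 'example'; that case needs a prefix lookup such as startswith.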

Related

Django-dsl-drf Exclude phrase query

I am working on integrating Elasticsearch into my existing Django REST application. I am using the django-dsl-drf module provided in the link below:
https://django-elasticsearch-dsl-drf.readthedocs.io/
Their documentation provides an 'exclude' query param, but the query only works when we provide the full field value.
search-url?exclude=<field-value>
For example, if I have the value 'Stackoverflow' in the field 'name', I have to provide the query param
?name__exclude=Stackoverflow to exclude records with 'Stackoverflow' as their name from the result. I would like to implement the search so that when I provide 'over', those records are excluded too, similar to ?name__exclude=over
I checked the above tutorial, but I couldn't find a way. Is there any workaround that lets me exclude records whose fields contain a term, instead of providing the full field value, and that is also case-insensitive?
Thanks a lot.
Using the contains functional filter, you can target documents that have their name field value containing the characters over anywhere in their terms:
?name__contains=over
However, as far as I know, there is no way to negate that filter in django-dsl-drf. You can open an issue requesting that feature, though; odds are high that you're not the only one who needs it, since it's a pretty common way of searching.
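If you are willing to step outside django-dsl-drf's query params, one possible workaround is a `bool`/`must_not` query wrapped around a `wildcard` query. The sketch below only builds the raw Elasticsearch query body as a Python dict; the helper name `exclude_containing` is hypothetical, the `case_insensitive` flag assumes Elasticsearch 7.10 or later, and wiring it into your view (for example through a custom filter backend) is left to you. Leading wildcards such as `*over*` can be slow on large indexes.

```python
import json


def exclude_containing(field, term):
    """Build an Elasticsearch bool query that drops documents whose `field`
    contains `term` anywhere (hypothetical helper, raw query body only)."""
    # Escape the wildcard metacharacters (\, *, ?) so `term` is matched literally.
    escaped = term.replace("\\", "\\\\").replace("*", "\\*").replace("?", "\\?")
    return {
        "bool": {
            "must_not": [
                {
                    "wildcard": {
                        field: {
                            "value": f"*{escaped}*",
                            # Assumes Elasticsearch 7.10+ for this flag.
                            "case_insensitive": True,
                        }
                    }
                }
            ]
        }
    }


print(json.dumps(exclude_containing("name", "over"), indent=2))
```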

Django Haystack similarity search

I'm a Django newbie building a primitive website. I installed Haystack with Whoosh as its search engine because it was the simplest thing to do. It works fine, but there is a problem and I don't know how to Google it. I have some categories on my site and I have indexed their names for searching. So, when a user enters "Computing", it finds the computing category and links to it. But if a user enters "Comp" into the search field, it doesn't find "Computing" at all. Is this something that can be configured, and how?
EDIT:
What else have I tried? Installing Haystack 2.0, following this tutorial, installing Solr instead of Whoosh, trying Ngram fields, rebuilding indexes 10 times, rewriting search_indexes.py. Everything. Doesn't work. If I type in Comp, it doesn't find Computing. Is there anything else I could do? I have noticed that in the tutorial above, everything works like a charm instantly.
When you do the usual:
SearchQuerySet().filter(title='Computing')
in Haystack 1.x, it filters on everything exactly matching 'Computing'.
You can change that behaviour by using Haystack's Field Lookups, for example, using 'contains' will filter on anything containing the given string (Computing, Utingcomp, Comp):
SearchQuerySet().filter(title__contains='Comp')
In Haystack 2.x, the default filter is 'contains', so it should behave as you would expect "out of the box".
Check out the documentation on autocomplete. You need to set up your indexes to support n-grams (e.g. an EdgeNgramField or NgramField for content_auto), but this should be exactly what you need.
from haystack.query import SearchQuerySet
SearchQuerySet().autocomplete(content_auto='old')
# Results match things like 'goldfish', 'cuckold' & 'older'.
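To see why n-grams make partial input match, here is a plain-Python simulation (not Haystack code; the helper names and gram sizes are arbitrary assumptions). Plain n-grams index every substring of a word, so 'old' hits 'goldfish' and 'cuckold'; edge n-grams index only prefixes, which is usually what an as-you-type box like the "Comp" case wants.

```python
def ngrams(word, min_n=2, max_n=15):
    """All substrings of `word` with length min_n..max_n (like an n-gram tokenizer)."""
    word = word.lower()
    return {
        word[i : i + n]
        for n in range(min_n, min(max_n, len(word)) + 1)
        for i in range(len(word) - n + 1)
    }


def edge_ngrams(word, min_n=2, max_n=15):
    """Only the prefixes of `word` (like an edge n-gram tokenizer)."""
    word = word.lower()
    return {word[:n] for n in range(min_n, min(max_n, len(word)) + 1)}


def autocomplete(words, fragment, gram_fn):
    """Return the words whose indexed grams contain the typed fragment."""
    fragment = fragment.lower()
    return [w for w in words if fragment in gram_fn(w)]


words = ["goldfish", "cuckold", "older", "Computing", "category"]
print(autocomplete(words, "old", ngrams))        # ['goldfish', 'cuckold', 'older']
print(autocomplete(words, "old", edge_ngrams))   # ['older']
print(autocomplete(words, "Comp", edge_ngrams))  # ['Computing']
```

A real engine also splits the query into grams rather than looking up the whole fragment, so behavior differs once the fragment is longer than the maximum gram size.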
So, if I understand correctly, what you're looking for is the equivalent of 'LIKE' in SQL.
The problem is that the search engines backing Haystack aren't like an RDBMS.
The low-level implementation of this filter would involve wildcard characters, but most of the Haystack backends don't support a leading wildcard, which an icontains/endswith filter requires. However, since most backends do support trailing wildcards, Haystack 2.x includes a startswith filter. The only case this doesn't handle is searching for the end of a word, which doesn't seem to be possible.
So, if you have indexed:
"Look at our great discounts in Computer section"
Then the following Haystack query DOES match:
SearchQuerySet().filter(title__startswith='comp')
# match!
Notice the difference between Django vs. Haystack startswith filters. Django startswith will match at the beginning of the complete sentence (i.e. a CharField), but the Haystack one will match at the beginning of a token (i.e. each word in a complete sentence).
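The difference between the two startswith flavors can be sketched in plain Python. `django_startswith` and `token_startswith` are hypothetical helpers, and the lowercasing stands in for the engine's analyzer, which is an assumption about a typical setup.

```python
import re


def django_startswith(value, prefix):
    # Django's startswith compares against the start of the whole field value.
    return value.startswith(prefix)


def token_startswith(value, prefix):
    # A trailing wildcard (e.g. comp*) in a search engine is applied per token.
    # Lowercasing here imitates a typical analyzer (assumption).
    tokens = re.findall(r"\w+", value.lower())
    return any(tok.startswith(prefix.lower()) for tok in tokens)


text = "Look at our great discounts in Computer section"
print(django_startswith(text, "comp"))  # False
print(token_startswith(text, "comp"))   # True
print(token_startswith(text, "puter"))  # False: would need a leading wildcard
```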
Hope it helps!

Using django-haystack, how do I perform a search with only partial terms?

I've got a Haystack/xapian search index for django.contrib.auth.models.User. The template is simply
{{object.get_full_name}}
as I intend for a user to type in a name and be able to search for it.
My issue is this: if I search, say, Sri (my full first name), I get a result for the user object pertaining to my name. However, if I search Sri Ragh (that is, my full first name and part of my last name), I get no results.
How can I set Haystack up so that I can get the appropriate results for partial queries?
(I essentially want it to search *Sri Ragh*, but I don't know if wildcards would actually do the trick, or how to implement them).
This is my search query:
results = SearchQuerySet().filter(content='Sri Ragh')
I used to have a similar problem. As a workaround, or maybe a fix, you can change the query lookup:
results = SearchQuerySet().filter(content__startswith='Sri Ragh')
The issue is that django-haystack doesn't implement every search engine's full query syntax. You can, of course, do this:
results = SearchQuerySet().raw_search('READ THE SEARCH ENGINE QUERY SYNTAX FOR GET WILDCARD LOOKUPS')
As the django-haystack docs say, this is not portable.
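If you do go the raw_search route, a small helper can turn each word the user typed into a prefix term. This sketch assumes Lucene/Solr-style query syntax (trailing `*`, `AND`, backslash escaping); the helper name `prefix_query` is hypothetical, and Xapian's query parser has its own wildcard rules, so check your backend's documentation before relying on it.

```python
import re

# Characters with special meaning in Lucene-style query syntax (assumed target).
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def prefix_query(user_input, field=None):
    """Turn 'Sri Ragh' into 'Sri* AND Ragh*' (each word treated as a prefix)."""
    terms = [_LUCENE_SPECIAL.sub(r"\\\1", word) + "*" for word in user_input.split()]
    query = " AND ".join(terms)
    return f"{field}:({query})" if field else query


print(prefix_query("Sri Ragh"))                   # Sri* AND Ragh*
print(prefix_query("Sri Ragh", field="content"))  # content:(Sri* AND Ragh*)
# Then, e.g.: SearchQuerySet().raw_search(prefix_query("Sri Ragh"))
```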
You can use icontains or startswith.
Be careful with icontains: if the query is, for example, 'r', it will return every Model entity that has an 'r' in its content.
Model.objects.filter(content__icontains=query)
Model.objects.filter(content__startswith=query)
Look at the documentation for the full list of field lookups.

Match all characters in group except for first and last occurrence

Say I request
parent/child/child/page-name
in my browser. I want to extract the parent, children as well as page name. Here are the regular expressions I am currently using. There should be no limit as to how many children there are in the url request. For the time being, the page name will always be at the end and never be omitted.
^([\w-]{1,}){1} -> Match parent (returns 'parent')
(/(?:(?!/).)*[a-z]){1,}/ -> Match children (returns /child/child/)
[\w-]{1,}(?!.*[\w-]{1,}) -> Match page name (returns 'page-name')
The more I play with this, the more I feel how clunky this solution is. This is for a small CMS I am developing in ASP Classic (:(). It is sort of like MVC routing paths, but instead of calling controllers and functions based on the URL request, I would be traveling down the hierarchy and finding the appropriate page in the database. The database uses the nested set model and is linked by a unique page name for each child.
I have tried using the Split function with a / delimiter; however, I found myself nesting so many Split calls that it became very unreadable.
All said, I need an efficient way to parse out the parent, children as well as page name from a string. Could someone please provide an alternative solution?
To be honest, I'm not even sure if a regular expression is the best solution to my problem.
Thank you.
You could try using:
^([\w-]+)(/.*/)([\w-]+)$
And then access the three matching groups created using Match.SubMatches. See here for more details.
EDIT
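Actually, assuming that you know that [\w-] is all that is used in the names of the parts, you can use ^([\w-]+)(.*/)([\w-]+)$ instead, and it will handle the no-child case by itself as well (the middle group is then just "/"). Note that the middle group has to end with a slash: with a plain (.*), the greedy match leaves only the last character of the page name for the third group.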
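For comparison, here is a Python sketch of both approaches (the question targets VBScript's RegExp, whose greedy matching behaves the same way for this pattern). `parse_with_regex` uses a slash-terminated middle group; `parse_with_split` shows that a single split, rather than nested ones, is usually enough. Both helper names are hypothetical.

```python
import re

# Middle group must end with "/"; otherwise the greedy .* would leave only
# the last character of the page name for the third group.
ROUTE = re.compile(r"^([\w-]+)(.*/)([\w-]+)$")


def parse_with_regex(path):
    m = ROUTE.match(path)
    if m is None:
        return None
    parent, middle, page = m.groups()
    # middle is "/child/child/" (or just "/"), so drop the empty pieces.
    children = [part for part in middle.split("/") if part]
    return parent, children, page


def parse_with_split(path):
    # Alternative: one split, then slice out parent, children and page name.
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        return None  # the question assumes parent and page name always exist
    return parts[0], parts[1:-1], parts[-1]


print(parse_with_regex("parent/child/child/page-name"))
# ('parent', ['child', 'child'], 'page-name')
print(parse_with_split("parent/page-name"))
# ('parent', [], 'page-name')
```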

Solr Query Syntax

I just got started looking at using Solr as my search web service. I don't know whether Solr supports these query types:
Startswith
Exact Match
Contain
Doesn't Contain
In the range
Could anyone guide me how to implement those features in Solr?
Cheers,
Samnang
Solr is capable of all those things, but to adequately explain how to do each of them, an answer would become a mini-manual for Solr.
I'd suggest you read the actual manual and tutorials linked from the Solr homepage.
In short though:
Startswith can be implemented using Lucene wildcards.
Exact matches will only be found if a field is not tokenized, i.e. the entire field is viewed as a single token.
Contain is the default search format, i.e. a search for "John" will find any documents whose search field contains the value "John". Prefixing a term with - negates it (e.g. "-John" will only find documents that do not contain John).
Ranges (be they date or integer) are possible and quite powerful; for example, date:[* TO NOW] would find any document whose date is not in the future.
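As a rough cheat sheet, the five query types from the question can be written as Solr query strings like this. The field names (`name`, `name_exact`, `date`) are hypothetical, `name_exact` is assumed to be an untokenized string field, and the helpers do no escaping of special characters.

```python
def starts_with(field, prefix):
    return f"{field}:{prefix}*"            # trailing wildcard


def exact(field, value):
    return f'{field}:"{value}"'            # meaningful on an untokenized field


def contains(field, term):
    return f"{field}:{term}"               # default term match on a tokenized field


def not_contains(field, term):
    return f"-{field}:{term}"              # "-" prefix excludes matching documents


def in_range(field, low="*", high="*"):
    return f"{field}:[{low} TO {high}]"    # square brackets = inclusive range


print(starts_with("name", "comp"))           # name:comp*
print(exact("name_exact", "Computing"))      # name_exact:"Computing"
print(contains("name", "John"))              # name:John
print(not_contains("name", "John"))          # -name:John
print(in_range("date", high="NOW"))          # date:[* TO NOW]
```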