OSM Nominatim search strange behaviour - geocoding

I have run into some trouble using the OpenStreetMap Nominatim search API. I am trying to search for and geocode addresses, but for some queries the results are quite strange.
For example, when I use this query:
http://nominatim.openstreetmap.org/search?format=json&countrycodes=cz&limit=10&accept-language=cz&addressdetails=1&q=Jihlava
I get the expected result: the city of Jihlava.
But when I use queries containing only part of the name, such as:
http://nominatim.openstreetmap.org/search?format=json&countrycodes=cz&limit=10&accept-language=cz&addressdetails=1&q=Jihl
or
http://nominatim.openstreetmap.org/search?format=json&countrycodes=cz&limit=10&accept-language=cz&addressdetails=1&q=Jihla
or
http://nominatim.openstreetmap.org/search?format=json&countrycodes=cz&limit=10&accept-language=cz&addressdetails=1&q=Jihlav
I get an empty result list.
Is there anything wrong with my query?
Thanks.

That's expected behavior, for now. Nominatim has no auto-correction feature yet, so queries that match only part of a name are not always handled correctly.
If you need auto-correction then please see if one of the other search engines for OSM fits your needs.

Related

Cloudsearch Fuzzy terms and phrases

I am trying to get my head around how fuzzy search works on AWS CloudSearch.
I want to find "Star Wars" but in my search, I spell it
ster wers
My app adds the fuzzy operators to the query, but it never returns Star Wars.
I have tried:
ster~1 wers~1
"ster wers"~2
"ster"~1 "wers"~1
What am I missing here?
Your query doesn't work because of how CloudSearch stems terms. If your field is indexed with the Analysis Scheme set to English, then wars will be stored in its stemmed form as war.
Here's a little demo of how stemming is affecting your query.
Searching with the un-stemmed query ('ster wers'):
Searching with the un-stemmed query requires you to match wers to war, which is off by 2 chars and requires this query: q=ster~1+wers~2.
Searching with the stemmed query ('ster wer'):
Searching with the stemmed version means you're matching wer to war, so you're only off by 1 char. Thus ster~1 wer~1 will get the desired result (i.e. it matches star wars).
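To make the "off by 1" versus "off by 2" counts concrete, here is a minimal sketch that computes plain Levenshtein distance between the typed terms and the indexed terms. It is only a stand-in for whatever edit-distance measure CloudSearch applies internally, and the function name levenshtein is made up for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# "wars" is indexed as the stem "war", so that is what the query term must reach.
for typed, indexed in [("ster", "star"), ("wers", "war"), ("wer", "war")]:
    print(f"{typed} -> {indexed}: {levenshtein(typed, indexed)}")
```

The distances come out as 1, 2 and 1, which lines up with the ~1 and ~2 operators used in the queries above.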
How to fix:
The use case you described will work if you configure the Analysis Scheme for the field in question to not use any stemming.
To do this, log into the AWS Web Console and go to Analysis Schemes --> Add Analysis Scheme:
Then go to Indexing Options and configure your field to use your new no-stemming analysis scheme:
Submit your changes and re-index.
That will address your issue but of course you'll lose the benefits of stemming. You can't have your cake and eat it too.

django haystack how do I find substrings in words?

In my field the content is "example".
I want to find not only the exact word "example" but also "examp". How can I do that? Are there any options? I can't find anything.
If you just want to search for objects starting with some string, then just look at Haystack SearchQuerySet API documentation. It resembles the Django QuerySet API, so it is possible to write:
SearchQuerySet().filter(content__startswith='examp')
SearchQuerySet().filter(content__contains='examp')
or whatever you want.
But there is also something deeper in this question. I don't think you really need to do this, because of the way search engines work: when someone searches for e.g. 'monitoring', the term gets stemmed (stemming reduces a word to something close to its root, so 'monitoring' becomes 'monitor') and the stem is what is actually searched for. Everything in the search index is stemmed as well, so searching for 'monitor' will return results containing e.g. 'monitors', 'monitoring', 'monitorize', etc.
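The stemming idea can be sketched with a deliberately naive suffix stripper. This is a toy, not the algorithm any real search backend uses (those typically rely on something like Porter or Snowball stemmers); the names SUFFIXES, toy_stem and matches are invented for the example.

```python
# Illustrative suffix list only; real stemmers use far richer rules.
SUFFIXES = ("ing", "ize", "s")


def toy_stem(word: str) -> str:
    """Strip one known suffix, keeping at least three characters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query: str, text: str) -> bool:
    """True if any word in text shares its toy stem with the query."""
    stem = toy_stem(query)
    return any(toy_stem(token) == stem for token in text.split())


print(matches("monitoring", "two monitors on the desk"))
print(matches("examp", "an example"))
```

In this sketch 'monitoring' and 'monitors' meet at the stem 'monitor', but a truncated prefix such as 'examp' is not a stem of 'example', which is the case the startswith lookup above is meant for.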

Inexact full-text search in PostgreSQL and Django

I'm new to PostgreSQL, and I'm not sure how to go about doing an inexact full-text search. Not that it matters too much, but I'm using Django. In other words, I'm looking for something like the following:
q = 'hello world'
queryset = Entry.objects.extra(
    where=['body_tsv @@ plainto_tsquery(%s)'],
    params=[q])
for entry in queryset:
    print(entry.title)
where the list of entries should contain either exactly 'hello world' or something similar. The listings should then be ordered according to how far their value is from the specified string. For instance, I would like the query to include entries containing "Hello World", "hEllo world", "helloworld", "hell world", etc., with some sort of ranking indicating how far each item is from the perfect, unchanged query string.
How would you go about doing this?
Your best bet is to use Django raw querysets; I use them with MySQL to perform full text matching. If the data is all in the database and Postgres provides the matching capability, then it makes sense to use it. Plus, Postgres offers some really useful things in terms of stemming etc. with full text queries.
Basically it lets you write the actual query you want yet returns models (as long as you are querying a model table obviously).
The advantage this gives you is that you can test the exact query you will be using first in Postgres, the documentation covers full text queries pretty well.
The main gotcha with raw querysets at the moment is that they don't support count. So if you will be returning lots of data and have memory constraints on your application, you might need to do something clever.
"Inexact" matching, however, isn't really part of the full text searching capabilities. Instead you want the Postgres fuzzystrmatch contrib module. Its use is described here with indexes.
The best would be to use a search engine for this purpose. Django-haystack supports the integration of three different search engines.
In 2022, Django supports full text search with postgres. Full documentation here: https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/search/
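The ranking the question asks for (closest strings first) can be illustrated outside the database with Python's standard difflib module. This is a swapped-in technique for illustration, not what fuzzystrmatch or Django's Postgres search does, and the function name rank_by_similarity and the sample titles are assumptions.

```python
from difflib import SequenceMatcher


def rank_by_similarity(query, candidates):
    """Return (candidate, score) pairs, most similar first; scores lie in [0, 1]."""
    q = query.lower()
    scored = [
        (candidate, SequenceMatcher(None, q, candidate.lower()).ratio())
        for candidate in candidates
    ]
    # Higher ratio means closer to the query, so sort in descending order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


titles = ["goodbye moon", "hell world", "Hello World", "helloworld"]
ranked = rank_by_similarity("hello world", titles)
for title, score in ranked:
    print(f"{score:.2f}  {title}")
```

Comparison is case-insensitive here, so "Hello World" scores as a perfect match. For large tables this kind of scoring would normally be pushed into the database rather than done in Python.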

Using django-haystack, how do I perform a search with only partial terms?

I've got a Haystack/xapian search index for django.contrib.auth.models.User. The template is simply
{{object.get_full_name}}
as I intend for a user to type in a name and be able to search for it.
My issue is this: if I search for, say, Sri (my full first name), I get a result for the user object pertaining to my name. However, if I search for Sri Ragh (that is, my full first name and part of my last name), I get no results.
How can I set Haystack up so that I can get the appropriate results for partial queries?
(I essentially want it to search *Sri Ragh*, but I don't know if wildcards would actually do the trick, or how to implement them).
This is my search query:
results = SearchQuerySet().filter(content='Sri Ragh')
I used to have a similar problem. As a workaround, or maybe a fix, you can change the query lookup:
results = SearchQuerySet().filter(content__startswith='Sri Ragh')
The issue is that django-haystack doesn't implement every search engine's query dialect. Of course, you can still pass a raw query like this:
results = SearchQuerySet().raw_search('READ THE SEARCH ENGINE QUERY SYNTAX FOR GET WILDCARD LOOKUPS')
As Django-haystack says, this is not portable.
You can use icontains or startswith.
Be careful with icontains: if the query is, for example, 'r', it will return every 'Model' entity that has an 'r' in its content.
Model.objects.filter(content__icontains=query)
Model.objects.filter(content__startswith=query)
Look at the documentation
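The difference between whole-term matching and the prefix-style matching the question is after can be sketched in plain Python. This is not how Haystack or Xapian work internally; the function names and the sample name "Sri Raghavan" are hypothetical.

```python
def matches_whole_terms(query: str, document: str) -> bool:
    """True only if every query term equals a whole word in the document."""
    words = set(document.lower().split())
    return all(term in words for term in query.lower().split())


def matches_partial(query: str, document: str) -> bool:
    """True if every query term is a prefix of some word in the document."""
    words = document.lower().split()
    return all(
        any(word.startswith(term) for word in words)
        for term in query.lower().split()
    )


full_name = "Sri Raghavan"  # hypothetical indexed value
print(matches_whole_terms("Sri", full_name))
print(matches_whole_terms("Sri Ragh", full_name))
print(matches_partial("Sri Ragh", full_name))
```

Whole-term matching finds "Sri" but fails on "Sri Ragh", while the prefix version accepts both, which mirrors the behavior described in the question.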

Solr Query Syntax

I just got started looking at using Solr as my search web service. I don't know whether Solr supports these query types:
Startswith
Exact Match
Contain
Doesn't Contain
In the range
Could anyone guide me how to implement those features in Solr?
Cheers,
Samnang
Solr is capable of all those things, but to adequately explain how to do each of them, an answer would have to become a mini-manual for Solr.
I'd suggest you read the actual manual and tutorials linked from the Solr homepage.
In short though:
Startswith can be implemented using Lucene wildcards.
Exact matches will only be found if a field is not tokenized, i.e. the entire field is viewed as a single token.
Contain is the default search format, i.e. a search for "John" will find any document whose search field contains the value "John". Prefixing the term with - (e.g. "-John") will only find documents that do not contain John.
Ranges (be they date or integer) are possible and quite powerful; for example, date:[* TO NOW] would find any document whose date is not in the future.
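As a rough sketch, the five query types could be written as Lucene-style query strings and sent in the q parameter of a select request. The field names name and price, the value John, and the core URL are assumptions made for this illustration, not part of any real schema.

```python
from urllib.parse import urlencode

# Hypothetical fields: "name" (text) and "price" (numeric).
QUERIES = {
    "startswith": "name:Joh*",         # trailing wildcard
    "exact": 'name:"John"',            # whole-field match needs an untokenized field
    "contains": "name:John",           # default term match
    "not_contains": "-name:John",      # leading minus excludes the term
    "range": "price:[10 TO 100]",      # inclusive range
}


def select_url(base: str, query: str) -> str:
    """Build a select URL with the query string URL-encoded into q."""
    return f"{base}?{urlencode({'q': query})}"


for kind, query in QUERIES.items():
    print(kind, select_url("http://localhost:8983/solr/demo/select", query))
```

Whether the exact and startswith forms behave as intended depends on how the field is analyzed in the schema, as noted above.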