Wagtail search backend with Postgres - using unaccent with English search config

I'm not sure if this is a fault with Wagtail's search engine, or if I'm missing a setting/configuration step along the line.
I'm working with a couple of European sites that use English as a common language, but the content contains plenty of people and place names with extended Latin characters.
On the db server, I added the unaccent extension and created a new search config based on the built-in english config with unaccent added (using the example from the Postgres docs):
CREATE TEXT SEARCH CONFIGURATION english_extended ( COPY = english );
ALTER TEXT SEARCH CONFIGURATION english_extended
ALTER MAPPING FOR hword, hword_part, word
WITH unaccent, english_stem;
And backends:
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail.search.backends.database',
        'SEARCH_CONFIG': 'english_extended',
    },
    'es': {
        'BACKEND': 'wagtail.search.backends.database',
        'SEARCH_CONFIG': 'spanish',
    },
}
I have a page title with the word Bodø. To test, I tried a default search from the psql command line:
# select title from wagtailcore_page where to_tsvector(title) @@ to_tsquery('Bodo');
title
-------
(0 rows)
No results, as expected. Then again using the new search config:
# select title from wagtailcore_page where to_tsvector('english_extended', title) @@ to_tsquery('Bodo');
title
----------------------------------------------------
The old dock at Kjerringøy, Bodø, Nordland, Norway
(1 row)
Page now found, so the unaccent is being processed in the new config.
So far, so good. Unfortunately, when I search in Wagtail, the unaccenting doesn't happen, and any word with an accented character seems to get ignored:
In [1]: from wagtail.search.backends import get_search_backend
In [2]: s=get_search_backend()
In [3]: s.config
Out[3]: 'english_extended'
In [4]: s.search('bodø', Page.objects.all())
Out[4]: <SearchResults []>
In [5]: s.search('bodo', Page.objects.all())
Out[5]: <SearchResults []>
In [6]: s.search('Bodø', Page.objects.all())
Out[6]: <SearchResults []>
In [7]: s.search('dock', Page.objects.all())
Out[7]: <SearchResults [<Page: The old dock at Kjerringøy, Bodø, Nordland, Norway>]>
If I use the Spanish backend, for example, I can search for unaccented terms without any extra config:
In [1]: from wagtail.search.backends import get_search_backend
In [2]: s=get_search_backend('en')
In [3]: s.search('jeremy' , Page.objects.all())
Out[3]: <SearchResults []>
In [4]: s.search('Jérémy' , Page.objects.all())
Out[4]: <SearchResults []>
In [5]: s=get_search_backend('es')
In [6]: s.config
Out[6]: 'spanish'
In [7]: s.search('jeremy' , Page.objects.all())
Out[7]: <SearchResults [<Page: Jérémy in the title>]>
Is there a step I'm missing or an additional property in the backend definition missing? Or is this bug/'undocumented feature' of Wagtail's search engine?

So the answer to this, if not the solution, is that Wagtail search doesn't support multiple WAGTAILSEARCH_BACKENDS entries that use the same backend with differing search configs.
All that happens is that the index gets rebuilt for each entry it encounters, so it ends up built with the config of the last one in the list. In the case above, it ended up indexed with 'spanish', which is why none of the unaccenting was happening.
The good news is that, with english_extended last in line, the unaccenting works perfectly: the page containing Bodø can be found with bodø or bodo.
Jérémy álphabét Bodø Strandå Hôtel cañon is indexed as 'alphabet':2B 'bodo':3B 'canon':6B 'hotel':5B 'jeremi':1B 'stranda':4B
We are going walking in the mountains today is indexed as 'go':3B 'mountain':7B 'today':8B 'walk':4B
So unaccenting and stop/stem words are treated correctly.
The bad news is there seems to be no way to have a backend for (say) English and one for Spanish.
Issue raised on the Wagtail GitHub.
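As a rough sketch of the workaround described above, assuming the site can live with one search configuration for everything, the settings could declare a single database backend so the index is only ever built with english_extended. This is an illustration of the finding, not an official Wagtail recommendation:

```python
# settings.py (sketch): one database backend only, so the index is never
# rebuilt with a different search configuration.
WAGTAILSEARCH_BACKENDS = {
    "default": {
        "BACKEND": "wagtail.search.backends.database",
        # Custom config created earlier with CREATE TEXT SEARCH CONFIGURATION.
        "SEARCH_CONFIG": "english_extended",
    },
}
```

After changing the setting, the index needs rebuilding (python manage.py update_index) before searches pick up the new config.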

Related

How to do an equivalent of "overlap" keyword of Django 2.2 in Django 1.11?

Suppose I have a model "Books" which has a field named "locations_available". This field stores a list of locations in which a book is available. Now, I have a query_list = ['US', 'Germany', 'Italy'].
To find all the books which are available in any of these locations, I would do in Django 2.2 like this:
Books.objects.filter(locations_available__overlap=query_list)
Since Django 1.1 had no overlap feature, how would I do the same functionality there?
>>> Books.objects.create(name='X', locations=['India', 'Japan'])
>>> Books.objects.create(name='Y', locations=['US', 'Korea'])
>>> Books.objects.create(name='Z', locations=['Italy', 'Germany'])
>>> Books.objects.create(name='A', locations=['US', 'Germany', 'Italy'])
Given the above data, the following records should be returned:
name='Y', locations=['US', 'Korea']
name='Z', locations=['Italy', 'Germany']
name='A', locations=['US', 'Germany', 'Italy']
Note that the book with name='X' is not returned as it has no overlapping with any of the locations in the query_list.
Check out this for more details: https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#overlap

Since Django 1.1 (Sic) had no overlap feature, how would I do the same functionality there?
django-1.10 has an __overlap lookup [Django-1.10-doc]. In fact, this feature has been available since django-1.8; see for example the documentation [Django-1.8-doc] and the source code [GitHub].
The documentation clearly demonstrates how this works:
>>> Post.objects.create(name='First post', tags=['thoughts', 'django'])
>>> Post.objects.create(name='Second post', tags=['thoughts'])
>>> Post.objects.create(name='Third post', tags=['tutorial', 'django'])
>>> Post.objects.filter(tags__overlap=['thoughts'])
[<Post: First post>, <Post: Second post>]
>>> Post.objects.filter(tags__overlap=['thoughts', 'tutorial'])
[<Post: First post>, <Post: Second post>, <Post: Third post>]
This suggests that either your locations_available is not an ArrayField(models.CharField(...), ...), or your query_list is not a list of strings.
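To make the semantics concrete, here is a plain-Python sketch of what __overlap checks: a row matches when its list shares at least one element with the query list. It uses the sample data from the question and does not touch Django or the database; the helper name overlaps is made up for illustration.

```python
def overlaps(available, wanted):
    """Return True if the two lists share at least one element."""
    return not set(available).isdisjoint(wanted)


# Sample data from the question, keyed by book name.
books = {
    "X": ["India", "Japan"],
    "Y": ["US", "Korea"],
    "Z": ["Italy", "Germany"],
    "A": ["US", "Germany", "Italy"],
}
query_list = ["US", "Germany", "Italy"]

matching = [name for name, locations in books.items()
            if overlaps(locations, query_list)]
print(matching)  # ['Y', 'Z', 'A']
```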

Using Python to retrieve website table data after filtering a specific date

I am trying to build a python script to retrieve historic wind power data from this site
I have done sort of a similar thing before. In that case the date and relevant parameters were entered explicitly in the url address.
As you can see in the previous link, e.g. the date is selected from a calendar and it is not displayed as part of the web address.
How can I use Python to select a specific date and type in an Id for the fields Settlement Date and NGC BM Unit Id respectively?
For example:
Settlement Date = 2017-08-01
NGC BM Unit Id = ANSUW-1
I don't have a MWE because I've no clue how to proceed. I was trying to reuse code from another script I'd used to get weather data:
from lxml import html
from lxml import etree
import urllib
def gettabledata():
    # Python 2 style call; fetches the page without any filter applied
    web = urllib.urlopen("https://www.bmreports.com/bmrs/?q=actgenration/actualgeneration")
    s = web.read()
    html = etree.HTML(s)
but in this case it's not that simple, since the filter parameters are not passed through the url.
Thanks.

I think the below script will fetch you the desired response:
import requests
payload = {"flowid":"b1610","start_date":"2017-08-01","period":"*","bmu_id":"ANSUW-1"}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
page = requests.get("https://www.bmreports.com/bmrs/?", params=payload, headers=headers).text
print(page)

Yes, the advice I offered in the comment was genuinely awful. Shahin is right. What I would add is that you can get the result in json which is relatively easy to process. It has taken me this long to get to this point.
>>> import requests
>>> parameter={"flowid":"b1610","start_date":"2017-08-02","period":"*","bmu_id":"ANSUW-1"}
>>> arg = 'https://www.bmreports.com/bmrs/?q=tablegen&parameter=%s' % str(parameter).replace("'",'"').replace(' ','')
>>> r = requests.get(arg)
>>> r
<Response [200]>
The result in r is json which admittedly looks horrible. However, on inspection it proves to be a series of nested dictionaries. Eventually, if you burrow in you find that 'item' is a list of 48 dictionaries from which you can easily extract whatever you might want.
>>> r.json()['responseBody']['responseList']['item'][0]
{'quantity': '1.414', 'marketGenerationBMUId': 'T_ANSUW-1', 'timeSeriesID': 'ELX-EMFIP-AGOG-TS-14842', 'powerSystemResourceType': 'Generation', 'resolution': 'PT30M', 'documentRevNum': '1', 'bMUnitID': 'T_ANSUW-1', 'registeredResourceEICCode': '48W00000ANSUW-1E', 'businessType': 'Production', 'settlementPeriod': '48', 'curveType': 'Sequential fixed size block', 'marketGenerationUnitEICCode': '48W00000ANSUW-1E', 'activeFlag': 'Y', 'nGCBMUnitID': 'ANSUW-1', 'processType': 'Realised', 'documentID': 'ELX-EMFIP-AGOG-17134615', 'marketGenerationNGCBMUId': 'ANSUW-1', 'settlementDate': '2017-08-02', 'documentType': 'Actual generation'}
>>> r.json()['responseBody']['responseList']['item'][47]
{'quantity': '1.088', 'marketGenerationBMUId': 'T_ANSUW-1', 'timeSeriesID': 'ELX-EMFIP-AGOG-TS-172', 'powerSystemResourceType': 'Generation', 'resolution': 'PT30M', 'documentRevNum': '1', 'bMUnitID': 'T_ANSUW-1', 'registeredResourceEICCode': '48W00000ANSUW-1E', 'businessType': 'Production', 'settlementPeriod': '1', 'curveType': 'Sequential fixed size block', 'marketGenerationUnitEICCode': '48W00000ANSUW-1E', 'activeFlag': 'Y', 'nGCBMUnitID': 'ANSUW-1', 'processType': 'Realised', 'documentID': 'ELX-EMFIP-AGOG-17134615', 'marketGenerationNGCBMUId': 'ANSUW-1', 'settlementDate': '2017-08-02', 'documentType': 'Actual generation'}
You can assign the 'item' list to items and then go from there.
>>> items = r.json()['responseBody']['responseList']['item']
>>> items[0]['settlementPeriod']
'48'
>>> items[47]['quantity']
'1.088'
Addendum: In case you don't know how I was able to get that url this is it. I used the Chrome browser. I right-clicked on any element and then on 'Inspect'. Then I clicked on the 'Network' tab in the right-hand pane, then on 'XHR'. Now I clicked on the 'View' button. As you see in the small screen view below I could just about see '?q=tablegen' in the table. I right-clicked and copied that into an editor for study.
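As a small follow-up sketch, the URL can be built with json.dumps instead of string replacement, and the nested response can be flattened into (settlement period, quantity) pairs. The helper names build_url and half_hourly_quantities are made up for illustration, the sample dictionary is trimmed from the two items shown above rather than taken from a live call, and the endpoint may have changed since this was written.

```python
import json

BASE_URL = "https://www.bmreports.com/bmrs/?q=tablegen&parameter="


def build_url(parameter):
    """Append the parameter dict as compact JSON (no spaces), as in the answer."""
    return BASE_URL + json.dumps(parameter, separators=(",", ":"))


def half_hourly_quantities(payload):
    """Return (settlement period, quantity) pairs sorted by period."""
    items = payload["responseBody"]["responseList"]["item"]
    return sorted((int(i["settlementPeriod"]), float(i["quantity"])) for i in items)


# Trimmed sample based on the two items shown above, not a live response.
sample = {"responseBody": {"responseList": {"item": [
    {"settlementPeriod": "48", "quantity": "1.414"},
    {"settlementPeriod": "1", "quantity": "1.088"},
]}}}

parameter = {"flowid": "b1610", "start_date": "2017-08-02",
             "period": "*", "bmu_id": "ANSUW-1"}
print(build_url(parameter))
print(half_hourly_quantities(sample))
```

With requests installed, the same helper should work on real data via half_hourly_quantities(requests.get(build_url(parameter)).json()), assuming the response keeps the shape shown above.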

Does django-constance admin support the database backend?

I'm trying to setup the admin to show settings meant to be stored in database backend (Postgres 9.5.0). I manually created values in shell_plus as follows:
In [1]: from constance.backends.database.models import Constance
In [2]: first_record = Constance.objects.get(id=1)
In [3]: first_record
Out[3]:
pg-admin properly shows the entry although django admin doesn't show it at all. I ran migrate command for both databases (I have default and product databases) but the record still is not showing up. Certainly I can make it work with forcing to register with admin as follows:
admin.site.register(Constance)
but my question is if it's necessary?

Yes, it does.
You need to manage dependencies, but you can just use the following command to install:
pip install "django-constance[database]"
You also need to add some additional settings to your settings.py:
CONSTANCE_BACKEND = 'constance.backends.database.DatabaseBackend'
INSTALLED_APPS = (
    # other apps
    'constance.backends.database',
)
# optional - in case you want to specify a table prefix
CONSTANCE_DATABASE_PREFIX = 'constance:myproject:'
Then apply the migrations by running the command python manage.py migrate database
To display setting inputs in the admin, define them in your settings.py. There are various field types, and you can even add your own using the CONSTANCE_ADDITIONAL_FIELDS parameter.
CONSTANCE_CONFIG = {
    'THE_ANSWER': (42, 'Answer to the Ultimate Question of Life, '
                       'The Universe, and Everything'),
}
You can read more at documentation page.

django cache key belong to which views?

I have the Django 1.4 cache enabled with Redis as the backend. I would like to know which view my cache key belongs to:
:1:views.decorators.cache.cache_page.mysite.GET.077b0d695a2095e154185234de17ad3350.d669abb4a2a0575f43321342f66b.fr
I know it is a template:
In [2]: r = redis.StrictRedis(host='localhost', port=6379, db=1)
In [3]: dd = r.get(':1:views.decorators.cache.cache_page.mysite.GET.077b0d695a2095e154185234de17ad3350.d669abb4a2a0575f43321342f66b.fr')
In [6]: obj = cPickle.loads(dd)
In [7]: obj
Out[7]: <django.template.response.TemplateResponse object at 0x2a47050>
Is there a way to render this template to see what's inside also?
I tried
obj.render()
print(obj.content)
but I got some strange characters.

You are probably using the Gzip middleware.
Either remove it or use the gzip module to unpack the content.
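A minimal sketch of the second option, assuming Python 3 (the question itself uses Python 2, where gzip.decompress is not available) and assuming the cached content really is gzip-compressed. The helper name readable_content is made up for illustration, and the sample bytes stand in for obj.content.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def readable_content(content):
    """Return decompressed bytes if content looks gzipped, else return it unchanged."""
    if content[:2] == GZIP_MAGIC:
        return gzip.decompress(content)
    return content


# Stand-in for obj.content taken from the cache.
compressed = gzip.compress(b"<html>cached page</html>")
print(readable_content(compressed))
```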

timezone support in django

I'm trying to get the various US timezones supported in Django. I'm able to get America/New_York, etc., but what about places like Hawaii? I can't find the proper timezone setting for it. Is there a list of all the available settings that we can use in Django? I want to build an application and give users the ability to choose the timezone they are in. Thanks for the help!

See "What you need to know about date/time" (nice video from PyCon 2012).
>>> import pytz
>>> pytz.all_timezones
['Africa/Abidjan',
'Africa/Accra',
'Africa/Addis_Ababa',
...
'US/Pacific-New',
'US/Samoa',
'UTC',
'Universal',
'W-SU',
'WET',
'Zulu']
Data available:
all_timezones = ['Africa/Abidjan', 'Africa/Accra', ...]
all_timezones_set = set(['Africa/Abidjan', 'Africa/Accra', ...])
common_timezones = ['Africa/Abidjan', 'Africa/Accra', ...]
common_timezones_set = set(['Africa/Abidjan', 'Africa/Accra', ...])
country_names = {u'BD': u'Bangladesh', u'BE': u'Belgium', ...}
country_timezones = {u'BD': [u'Asia/Dhaka'], u'BE': [u'Europe/Brussels'] ...}
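For Hawaii specifically, the tz database name is 'Pacific/Honolulu' (pytz.all_timezones also lists the alias 'US/Hawaii'). As a rough sketch of the "let the user choose" part, a list of names can be turned into (value, label) pairs for a form field. The helper name timezone_choices is made up for illustration, and the sample list is hand-picked so the snippet runs without pytz.

```python
def timezone_choices(names):
    """Build (value, label) pairs suitable for a form field's choices."""
    return [(name, name.replace("_", " ")) for name in sorted(names)]


# Hand-picked sample; in practice pass pytz.common_timezones instead.
sample_names = ["Pacific/Honolulu", "America/New_York", "America/Los_Angeles"]
choices = timezone_choices(sample_names)
for value, label in choices:
    print(value, "->", label)
```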
