Django psql full text search not matching un-stemmed word - django

I'm running Django 1.10.1 against Postgres 9.4. My staging server and dev environments have psql servers at version 9.4.9 and production is an RDS instance at 9.4.7.
In production, my SearchVectorField does not seem to store the search configuration I give it, although it does in staging and dev. The cause seems to be either the version difference (unlikely, given how small it is and that this also worked on 9.3 in staging/dev) or the fact that production is on RDS instead of running locally on the server.
I'm using a custom configuration for full-text search called unaccent, which looks like this:
      Token      | Dictionaries
-----------------+-----------------------
 asciihword      | english_stem
 asciiword       | english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | unaccent,english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | unaccent,english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | unaccent,english_stem
Unaccent is installed in both environments, and works in both environments.
I'm storing the search data in a django.contrib.postgres.search.SearchVectorField on my Writer model:
class Writer(models.Model):
    #...
    search = SearchVectorField(blank=True)
That column is updated with the following search vector:
writer_search_vector = (SearchVector('first_name', 'last_name', 'display_name',
                                     config='unaccent', weight='A') +
                        SearchVector('raw_search_data', config='unaccent', weight='B'))
by the following statement, which runs periodically:
Writer.objects.update(search=search_utils.writer_search_vector)
For some reason, the configuration is stored successfully on my staging server and in dev, but not in production. For example, this code, which builds the vector on the fly, returns the same results in all environments:
In [3]: Writer.objects.annotate(searchy=SearchVector('last_name')).filter(searchy='kostenberger')
Out[3]: <QuerySet []>
In [4]: Writer.objects.annotate(searchy=SearchVector('last_name', config='unaccent')).filter(searchy='kostenberger')
Out[4]: <QuerySet [<Writer: Andreas J. Köstenberger>, <Writer: Margaret Elizabeth Köstenberger>]>
But in staging, I get the following correct result if I use the stored vector:
In [5]: Writer.objects.filter(search='kostenberger')
Out[5]: <QuerySet [<Writer: Andreas J. Köstenberger>, <Writer: Margaret Elizabeth Köstenberger>]>
while in production, against the RDS instance, I get the following, incorrect result:
In [5]: Writer.objects.filter(search='kostenberger')
Out[5]: <QuerySet []>
And yet, still in production, unaccent works but english_stem does not: the stored vector matches the stemmed version of the text (below), but not the original version (above):
In [6]: Writer.objects.filter(search='kostenberg')
Out[6]: <QuerySet [<Writer: Margaret Elizabeth Köstenberger>, <Writer: Andreas J. Köstenberger>]>
Note that the database tables for Writer in the two environments are identical for this test.
Any ideas why the stored vector isn't working in production with the correct config, while if I create the vector on the fly it will work?

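On RDS Postgres, you aren't allowed to change the default_text_search_config parameter. A plain string lookup such as filter(search='kostenberger') is parsed with that default configuration, so if the default does not apply english_stem, the query term stays 'kostenberger' while the stored vector holds the stemmed, unaccented lexeme 'kostenberg', and nothing matches. So, you have to pass the configuration with each query: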
from django.contrib.postgres.search import SearchRank, SearchQuery
…
search_query = SearchQuery(value='kostenberger', config='unaccent')
Writer.objects.filter(search=search_query)
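To make the mismatch concrete, here is a minimal pure-Python sketch of what happens to the query term. It does not call Postgres or Django: unaccent() approximates the unaccent dictionary with Unicode decomposition, and toy_stem() is a hypothetical stand-in for english_stem that only strips a trailing "er", which is an assumption chosen to reproduce the 'kostenberg' lexeme seen in the question.

```python
import unicodedata


def unaccent(text):
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_stem(word):
    # Hypothetical stand-in for english_stem: strips a trailing "er" only.
    # The real Snowball stemmer has many more rules.
    return word[:-2] if word.endswith("er") and len(word) > 4 else word


def to_lexemes(text, stem=True):
    # Rough analogue of building a tsvector/tsquery with or without stemming.
    words = unaccent(text).lower().split()
    return {toy_stem(w) if stem else w for w in words}


# What the periodic update stores (unaccent + stemming applied).
stored = to_lexemes("Köstenberger")

# Query parsed with the custom configuration: stemmed, so it matches.
matches_with_config = bool(stored & to_lexemes("kostenberger", stem=True))

# Query parsed with a non-stemming default: left as typed, so no match.
matches_without_config = bool(stored & to_lexemes("kostenberger", stem=False))

# Typing the stem directly matches even without stemming, as in In [6].
matches_stem_typed = bool(stored & to_lexemes("kostenberg", stem=False))
```

Under these assumptions, the three flags mirror the three production results in the question: the configured query matches, the plain query for 'kostenberger' does not, and the plain query for 'kostenberg' does.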

Related

Djangocms users don't have permission to add or edit plugins

I have a website, currently running with Django==1.8.6 and Django-CMS 3.0.x (running through upgrades at the moment).
My users cannot edit any of the frontend plugins. At the moment I am sure that this is true not only for my custom-made plugins, but also for the ones that come with Django-CMS. As a test I created a new user with all rights and staff status (not a superuser), but this user can't edit or add plugins either.
For my search I have found this: https://github.com/divio/djangocms-text-ckeditor/issues/78
I also tested the solution given there, as I am using ckeditor, but I don't have a content type entry for text. The query is expected to return:
sqlite3> select * from django_content_type where app_label = 'text';
 id | name | app_label | model
----+------+-----------+-------
 23 | text | text      | text
but for me it returns nothing:
sqlite3> select * from django_content_type where app_label = 'text';
sqlite3>
I tried to figure out how to debug permission errors. I have also looked through auth_permission, but everything seems to be all right. Is there any way to debug the permission process in order to find what's preventing my users from using the frontend editing?
Update
My current installed packages:
Django==1.8.6
Django-Select2==4.3.1
Pillow==3.0.0
South==1.0.2
Unidecode==0.04.18
YURL==0.13
aldryn-apphooks-config==0.2.6
aldryn-boilerplates==0.7.3
aldryn-categories==1.0.1
aldryn-common==1.0.0
aldryn-newsblog==1.0.9
aldryn-people==1.1.2
aldryn-reversion==1.0.1
aldryn-translation-tools==0.2.1
argparse==1.4.0
backport-collections==0.1
cmsplugin-filer==1.0.0
dj-database-url==0.3.0
django-admin-sortable==1.8.4
django-appconf==1.0.1
django-appdata==0.1.4
django-autoslug==1.8.0
django-ckeditor-filebrowser-filer==0.1.1
django-classy-tags==0.6.2
django-cms==3.1.3
django-durationfield==0.5.2
django-easy-select2==1.3
django-filer==1.0.2
django-mptt==0.7.4
django-parler==1.5.1
django-phonenumber-field==0.7.2
django-polymorphic==0.7.2
django-reversion==1.8.7
django-sekizai==0.8.2
django-sortedm2m==1.3.2
django-taggit==0.17.3
django-treebeard==3.0
djangocms-admin-style==1.0.5
djangocms-column==1.5
djangocms-file==0.1
djangocms-flash==0.2.0
djangocms-googlemap==0.3
djangocms-inherit==0.1
djangocms-installer==0.7.9
djangocms-link==1.6.2
djangocms-picture==0.1
djangocms-style==1.5
djangocms-teaser==0.1
djangocms-text-ckeditor==2.7.0
djangocms-video==0.1
easy-thumbnails==2.2.1
gunicorn==19.4.3
html5lib==0.9999999
lxml==3.5.0
phonenumbers==7.1.1
python-dateutil==2.4.2
python-slugify==1.1.4
pytz==2015.7
simplejson==3.8.0
six==1.10.0
tzlocal==1.2
vobject==0.6.6
wheel==0.24.0
wsgiref==0.1.2
After some debugging in the permissions.py of the cms, the answer turned out to be that my site permissions were not set properly in the database. Resetting those in the backend solved the problem.

Django-extension runscript No (valid) module for script

I'm trying to create a script that will populate my model families with information extracted from a text file.
This is my first post on StackOverflow, so please be gentle; sorry if the question is not well expressed or not correctly formatted.
Django V 1.9 and running on Python 3.5
Django-extensions installed
This is my model: it's in an app called browse
from django.db import models
from django_extensions.db.models import TimeStampedModel

class families(TimeStampedModel):
    rfam_acc = models.CharField(max_length=7)
    rfam_id = models.CharField(max_length=40)
    description = models.CharField(max_length=75)
    author = models.CharField(max_length=50)
    comment = models.CharField(max_length=500)
    rfam_URL = models.URLField()
Here is my script familiespopulate.py, located in the PROJECT_ROOT/scripts directory.
import csv
from browse.models import families

file_path = "/Users/work/Desktop/StructuRNA/website/scripts/RFAMfamily12.1.txt"

def run(file_path):
    listoflists = list(csv.reader(open(file_path, 'rb'), delimiter='\t'))
    for row in listoflists:
        families.objects.create(
            rfam_acc=row[0],
            rfam_id=row[1],
            description=row[3],
            author=row[4],
            comment=row[9],
        )
When I run this from the Terminal:
python manage.py runscript familiespopulate
it returns:
No (valid) module for script 'familiespopulate' found
Try running with a higher verbosity level like: -v2 or -v3
The problem must be in importing the model families. I'm new to Django, and I cannot find any solution here on StackOverflow or anywhere else online.
This is why I ask for your help!
Do you know how the model should be imported?
Or am I doing something else wrong?
An important piece of information is that the script runs if I modify it to print out the parameters instead of creating an object in families.
For your information and curiosity I will also post here an extract of the textfile that I'm using.
RF00001 5S_rRNA 1302 5S ribosomal RNA Griffiths-Jones SR, Mifsud W, Gardner PP Szymanski et al, 5S ribosomal database, PMID:11752286 38.00 38.00 37.90 5S ribosomal RNA (5S rRNA) is a component of the large ribosomal subunit in both prokaryotes and eukaryotes. In eukaryotes, it is synthesised by RNA polymerase III (the other eukaryotic rRNAs are cleaved from a 45S precursor synthesised by RNA polymerase I). In Xenopus oocytes, it has been shown that fingers 4-7 of the nine-zinc finger transcription factor TFIIIA can bind to the central region of 5S RNA. Thus, in addition to positively regulating 5S rRNA transcription, TFIIIA also stabilises 5S rRNA until it is required for transcription. NULL cmbuild -F CM SEED cmcalibrate --mpi CM cmsearch --cpu 4 --verbose --nohmmonly -T 24.99 -Z 549862.597050 CM SEQDB 712 183439 0 0 Gene; rRNA; Published; PMID:11283358 7946 0 0.59496 -5.32219 1600000 213632 305 119 1 -3.78120 0.71822 2013-10-03 20:41:44 2016-04-21 23:07:03
This is the first line. The result of extracting the fields from listoflists looks like this:
RF00002
5_8S_rRNA
5.8S ribosomal RNA
Griffiths-Jones SR, Mifsud W
5.8S ribosomal RNA (5.8S rRNA) is a component of the large subunit of the eukaryotic ribosome. It is transcribed by RNA polymerase I as part of the 45S precursor that also contains 18S and 28S rRNA. Functionally, it is thought that 5.8S rRNA may be involved in ribosome translocation [2]. It is also known to form covalent linkage to the p53 tumour suppressor protein [3]. 5.8S rRNA is also found in archaea.
Try adding an empty file __init__.py (double underscores) to your /scripts folder and run with:
python manage.py runscript scripts.familiespopulate
Apart from adding __init__.py, you are not supposed to pass any parameters to the run function.
def run():
    <your code goes here>
Thanks for the useful comments.
I modified my code in this way:
import csv
from browse.models import families

def run():
    file_path = "/Users/work/Desktop/StructuRNA/website/scripts/RFAMfamily12.1.txt"
    listoflists = list(csv.reader(open(file_path, 'r'), delimiter='\t'))
    print(listoflists)
    for row in listoflists:
        families.objects.create(
            rfam_acc=row[0],
            rfam_id=row[1],
            description=row[3],
            author=row[4],
            comment=row[9],
        )
This is all. Now it worked smoothly.
I want to confirm to everyone that my file familiespopulate.py was already in the scripts folder together with the file __init__.py.
The problem seemed to be resolved when I put
file_path = "/Users/work/Desktop/StructuRNA/website/scripts/RFAMfamily12.1.txt"
inside the run function and removed the parameter file_path from run(file_path).
Another change to my code was the mode argument in open(file_path, 'r'); before, it was open(file_path, 'rb'), which corresponds to reading in binary mode. In Python 3, csv.reader expects text, so the file has to be opened in text mode.
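The difference between the two open modes can be reproduced without a file or a database. The sketch below is only an illustration: parse_family_rows is a hypothetical helper name, the sample row is shortened, and the filler fields "a" to "d" are placeholders for the columns the script skips.

```python
import csv
import io


def parse_family_rows(text_stream):
    """Yield the fields the script stores, from a tab-separated text stream."""
    for row in csv.reader(text_stream, delimiter="\t"):
        yield {
            "rfam_acc": row[0],
            "rfam_id": row[1],
            "description": row[3],
            "author": row[4],
            "comment": row[9],
        }


# Shortened sample row; "a".."d" are placeholder values for unused columns.
sample = (
    "RF00001\t5S_rRNA\t1302\t5S ribosomal RNA\tGriffiths-Jones SR"
    "\ta\tb\tc\td\tcomment text\n"
)

# Text stream, equivalent to open(file_path, 'r'): parsing works.
rows = list(parse_family_rows(io.StringIO(sample)))

# Binary stream, equivalent to open(file_path, 'rb'): Python 3's csv module
# refuses bytes and raises csv.Error.
try:
    next(csv.reader(io.BytesIO(sample.encode("utf-8")), delimiter="\t"))
    binary_mode_error = None
except csv.Error as exc:
    binary_mode_error = str(exc)
```

When reading a real file, the csv documentation also recommends passing newline='' to open() so that embedded newlines are handled by the csv module itself.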
I was getting exactly the same error. I tried all of the solutions above, but unfortunately none of them worked for me. Then I found my mistake.
Inside the script file (which is inside the scripts/ folder) I had used a different name for the function, which must be named 'run'. So make sure you check that as well if you get this error.
Here you can read more about "runscript"
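Both pitfalls mentioned in these answers (a missing run function, and a run function that needs an argument nobody supplies) can be checked before running the script. The following is a simplified sketch of such a check, not django-extensions' actual implementation; check_script and make_module are hypothetical helper names.

```python
import inspect
import types


def check_script(module, *script_args):
    """Report whether `module` exposes a run() callable with these arguments."""
    run = getattr(module, "run", None)
    if not callable(run):
        return "missing run()"
    try:
        # bind() validates the arguments against the signature without
        # actually calling run().
        inspect.signature(run).bind(*script_args)
    except TypeError:
        return "run() does not accept the given arguments"
    return "ok"


def make_module(source):
    # Build an in-memory module so the example needs no files on disk.
    module = types.ModuleType("demo_script")
    exec(source, module.__dict__)
    return module


good = make_module("def run():\n    pass\n")
wrong_name = make_module("def main():\n    pass\n")
needs_arg = make_module("def run(file_path):\n    pass\n")

results = {
    "good": check_script(good),
    "wrong_name": check_script(wrong_name),
    "needs_arg": check_script(needs_arg),
    "needs_arg_with_value": check_script(needs_arg, "/tmp/data.txt"),
}
```

The last entry shows that run(file_path) is only a problem when no argument is supplied for it, which matches the original script being invoked without any script arguments.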

How do I store a DataFrame into a BigTable in Google DataLab?

I have a DataFrame df. I create a BigQuery table.
# Create the schema, using the convenience of basing it on example DataFrame
schema = bq.Schema.from_dataframe(df)
# Create the dataset
bq.DataSet('ids').create()
# Create the table
suri_table = bq.Table('ids.suri').create(schema = schema, overwrite = True)
project = gcp.Context.default().project_id
There is a Pandas function [to_gbq()][1] which I want to use to store the DataFrame.
df.to_gbq(df, 'ids.suri', project)
This raises a "Not found" exception although the table exists; I just created it in the code above. Could someone help me figure out what the problem really is?
NotFoundException: Invalid Table Name. Should be of the form
'datasetId.tableId'
If I do:
from pandas.io import gbq
df.to_gbq('ids.suri', project_id=projectid)
I get:
/usr/lib/python2.7/dist-packages/pkg_resources.pyc in resolve(self, requirements, env, installer, replace_conflicting)
637 # unfortunately, zc.buildout uses a str(err)
638 # to get the name of the distribution here..
--> 639 raise DistributionNotFound(req)
640 to_activate.append(dist)
641 if dist not in req:
DistributionNotFound: google-api-python-client
[1]: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.gbq.to_gbq.html
You are conflating the Cloud Datalab way with the gbq way. You should use one or the other. To do this from Cloud Datalab, once you have created the data, you can just use:
suri_table.insert_data(df)
There are a couple of options if you want to include the index, etc; see http://googlecloudplatform.github.io/datalab/gcp.bigquery.html#gcp.bigquery.Table.insert_data
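As a side note on the first error in the question: df.to_gbq(df, 'ids.suri', project) passes the DataFrame itself as the first argument after self, so it lands in the destination_table slot, which would explain the "Invalid Table Name" message. The sketch below only illustrates that argument shift. FrameStandIn is a hypothetical stand-in, its validation is an assumption modeled on the error text, and "my-project" is a placeholder; it is not the pandas implementation.

```python
class FrameStandIn:
    """Hypothetical stand-in for a DataFrame; it only mimics the argument order."""

    def to_gbq(self, destination_table, project_id, chunksize=10000):
        # Assumed validation, modeled on the error message in the question.
        if not isinstance(destination_table, str) or "." not in destination_table:
            raise ValueError(
                "Invalid Table Name. Should be of the form 'datasetId.tableId'"
            )
        dataset_id, table_id = destination_table.split(".", 1)
        return dataset_id, table_id, project_id


df = FrameStandIn()

# Call shaped like the question's first attempt: the frame is passed again,
# so 'ids.suri' ends up as project_id and the project as chunksize.
try:
    df.to_gbq(df, "ids.suri", "my-project")
    first_call_error = None
except ValueError as exc:
    first_call_error = str(exc)

# Call shaped like the second attempt: the table name is in the right slot.
second_call = df.to_gbq("ids.suri", project_id="my-project")
```

The second form gets past the table-name check, which is consistent with the question, where that call fails later for an unrelated reason (the missing google-api-python-client distribution).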

haystack whoosh no results - rebuild_index shows Indexing [number] <django.utils.functional.__proxy__ object at [memory location] >

When I run ./manage.py rebuild_index, I get a readout such as:
Indexing 4574 <django.utils.functional.__proxy__ object at 0x1aab690> .
Having seen other users' readouts, I expected this to show the name of the search index/model instead. I wonder whether this is part of the reason why I get no search results on the website, and why no objects appear to be indexed when I run:
>>> from haystack.query import SearchQuerySet
>>> sqs = SearchQuerySet().all()
>>> sqs.count()
I did not initially have a
def __unicode__(self):
    return self.name
on the models I am indexing, but then I added it and nothing seemed to change, even after running rebuild_index.
This was GitHub pull request #746 for Django Haystack, which has now been merged.
I was seeing this same issue on my local (dev) setup. Updating solved the "functional proxy" placeholder issue for me.
I ran the following command:
pip install -e git+git://github.com/toastdriven/django-haystack.git@master#egg=django-haystack
You may need to tweak the command to suit your own needs and/or environment.

Saving data in django model don't actually saves

How should I understand this? And how can this happen?
Go into the Django shell:
>>> from customauth.models import Profile
>>> p = Profile.objects.get(user_id=1)
>>> p.status
u'34566'
>>> p.status = 'qwerty'
>>> p.status
'qwerty'
>>> p.save()
>>> p.status
'qwerty'
>>> p = Profile.objects.get(user_id=1)
>>> p.status
u'qwerty'
>>>
Exit and go into the Django shell again:
>>> from customauth.models import Profile
>>> p = Profile.objects.get(user_id=1)
>>> p.status
u'qwerty'
>>>
Everything seems OK. But go into dbshell now:
mysql> select user_id, status from customauth_profile where user_id=1;
+---------+--------+
| user_id | status |
+---------+--------+
|       1 | 34566  |
+---------+--------+
I was having the same problem. I'm using Django and Mongo, and the data wasn't persisted after object.save(). Then I used this to solve it:
object.save_base(force_update=True)
Now it is working for me; I hope it helps.
If the data persists after closing the shell, it is very unlikely that the data was not saved into the database. Can you check your settings.py and make sure you're saving to the correct MySQL database?
The shell session is a single database transaction. Connections outside that won't see the changes, because of transaction isolation. You'll need to commit the transaction - the easiest way would be to quit the shell and restart.
In a normal request, Django commits automatically at the end of the request, so this behaviour isn't an issue.
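The isolation effect described in this answer can be reproduced with two connections to the same database. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name profile is a simplification of the question's table, so it illustrates the general mechanism only, not the exact MySQL behaviour.

```python
import os
import sqlite3
import tempfile


def status_before_and_after_commit():
    """Return what a second connection sees before and after the writer commits."""
    db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

    writer = sqlite3.connect(db_path)
    writer.execute(
        "CREATE TABLE profile (user_id INTEGER PRIMARY KEY, status TEXT)"
    )
    writer.execute("INSERT INTO profile VALUES (1, '34566')")
    writer.commit()

    reader = sqlite3.connect(db_path)
    query = "SELECT status FROM profile WHERE user_id = 1"

    # The UPDATE opens a transaction on the writer that is not yet committed.
    writer.execute("UPDATE profile SET status = 'qwerty' WHERE user_id = 1")
    before = reader.execute(query).fetchall()[0][0]

    # After the commit, the other connection sees the new value.
    writer.commit()
    after = reader.execute(query).fetchall()[0][0]

    writer.close()
    reader.close()
    return before, after


before, after = status_before_and_after_commit()
```

Here the writer plays the role of the Django shell session and the reader plays the role of the dbshell: until the commit, the reader still sees the old status.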
The problem was in the caching module. Strangely, no caching middleware was plugged in, and the cache isn't used in the saving methods.
I was using the django.core.cache.backends.locmem.LocMemCache backend; maybe it is broken or has some strange features, I don't know. I will try the memcached backend, which should (I hope) fix the problem without switching off the cache on the site.