Django haystack with elasticsearch SearchQuerySet returns None - django

I have the default Django user model, which I want to index using Elasticsearch.
I'm using django-haystack.
In settings.py:
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 12
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
In search_indexes.py:
import datetime
from haystack import indexes
from django.contrib.auth.models import User
class UserIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True)
    first_name = indexes.CharField(model_attr='first_name', null=True)
    last_name = indexes.CharField(model_attr='last_name', null=True)
    def get_model(self):
        return User
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()
I then build the index using python manage.py rebuild_index.
Now, in the shell:
from haystack.query import SearchQuerySet
SearchQuerySet().all()
It returns all the indexed objects (I can confirm the count is the same as the number of entries in the database).
When I run
SearchQuerySet().filter(first_name='Wendy')
it returns two result objects, which is again as expected.
But when I try SearchQuerySet().filter(content='Wendy'), it returns None.
Basically, I want to create an API to which we can pass a query parameter and get back all the user objects that contain the query string in any field.
http://localhost/search/?q=Wendy
This is the first time I'm using Elasticsearch (or any search engine with Haystack), so I'm not able to understand what is going on.
After a bit of searching I found a few threads on Stack Overflow that suggest using Ngram or EdgeNgram fields, but those didn't work either (I rebuilt the whole index). I even tried content_auto in the filter, with no success.
Any help or lead will be appreciated.
I was following the official docs:
http://django-haystack.readthedocs.org/en/latest/searchqueryset_api.html#quick-start
PS: I have shown only two fields here (first_name, last_name), but there are a couple more fields in my actual code; I trimmed it for this post.
PPS: I'm using Django 1.9. Could that be an issue?
This is what my view looks like:
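import json
from django.http import HttpResponse
from haystack.query import SearchQuerySet
def search_api(request):
    query = request.GET.get('q')
    sqs = SearchQuerySet().filter(content=query)
    # Build a list (not a lazy map object) so json.dumps can serialize it on Python 3
    data = [result.get_stored_fields() for result in sqs]
    return HttpResponse(json.dumps(data))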
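One possible explanation, offered as an assumption rather than a confirmed diagnosis: the text field in the index above is declared with document=True but has neither use_template=True nor a model_attr, so nothing may be written into the document that content= searches. Haystack lets an index fill a field through a prepare_<fieldname>(self, obj) method. The sketch below shows only the plain-Python part: build_document_text is a hypothetical helper that joins the chosen attributes, and the email attribute is an illustrative choice. The Haystack hook appears only in a comment because it needs a configured project to run.

```python
from types import SimpleNamespace


def build_document_text(obj, field_names):
    """Join the non-empty values of the named attributes into one string."""
    parts = []
    for name in field_names:
        value = getattr(obj, name, None)
        if value:  # skip None and empty strings
            parts.append(str(value))
    return " ".join(parts)


# Hypothetical hook inside UserIndex (not run here, it needs Haystack):
#     def prepare_text(self, obj):
#         return build_document_text(obj, ["first_name", "last_name", "email"])

# Stand-in for a User instance, used only for this illustration.
user = SimpleNamespace(first_name="Wendy", last_name="", email="wendy@example.com")
document = build_document_text(user, ["first_name", "last_name", "email"])
print(document)
```

If the index is changed along these lines, it would need to be rebuilt with rebuild_index before content= queries could see the new text.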

Related

Why doesn't use_bulk work in django-import-export?

I use django-import-export 2.8.0 with Oracle 12c.
Line-by-line import via import_data() works without problems, but when I turn on the use_bulk=True option, it stops importing and does not raise any errors.
Why doesn't it work?
resources.py
class ClientsResources(resources.ModelResource):
    class Meta:
        model = Clients
        fields = ('id', 'name', 'surname', 'age', 'is_active')
        batch_size = 1000
        use_bulk = True
        raise_errors = True
views.py
def import_data(request):
    if request.method == 'POST':
        file_format = request.POST['file-format']
        new_employees = request.FILES['importData']
        clients_resource = ClientsResources()
        dataset = Dataset()
        imported_data = dataset.load(new_employees.read().decode('utf-8'), format=file_format)
        result = clients_resource.import_data(imported_data, dry_run=True, raise_errors=True)
        if not result.has_errors():
            clients_resource.import_data(imported_data, dry_run=False)
    return HttpResponseRedirect(request.META.get('HTTP_REFERER'))
data.csv
id,name,surname,age,is_active
18,XSXQAMA,BEHKZFI,89,Y
19,DYKNLVE,ZVYDVCX,20,Y
20,GPYXUQE,BCSRUSA,73,Y
21,EFHOGJJ,MXTWVST,93,Y
22,OGRCEEQ,KJZVQEG,52,Y
--UPD--
I used django-debug-toolbar and saw very strange behavior in the import queries.
It doesn't work through the admin panel. I can see all the rows to be imported, and it then reports "Import finished, with 5 new and 0 updated clients.", but I see these strange queries.
Then I imported through my own form and saw a similar situation:
use_bulk by django-import-export (more)
And, for comparison, my own handling with create_bulk()
--UPD2--
I tried to trace the import logic, and here is what I found:
import_export/resources.py
def bulk_create(self, using_transactions, dry_run, raise_errors, batch_size=None):
    """
    Creates objects by calling ``bulk_create``.
    """
    print(self.create_instances)
    try:
        if len(self.create_instances) > 0:
            if not using_transactions and dry_run:
                pass
            else:
                self._meta.model.objects.bulk_create(self.create_instances, batch_size=batch_size)
    except Exception as e:
        logger.exception(e)
        if raise_errors:
            raise e
    finally:
        self.create_instances.clear()
This print() showed an empty list.
This issue appears to be due to a bug in the 2.x version of django-import-export. It is fixed in v3.
The bug is present when running in bulk mode (use_bulk=True)
The logic in save_instance() finds that 'new' instances have pk values set, and then incorrectly treats them as updates, not creates.
I cannot determine how this would happen. It's possible this is related to using Oracle (though I cannot see how).
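To make the described behavior concrete, here is a toy model of the routing decision, written from the description above and not copied from django-import-export. The class Row, the function route_for_bulk, and the two lists are hypothetical names. It assumes that bulk mode chooses between create and update purely on whether pk is set, which would send every row that already carries an id to the update list and leave the create list empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    """Hypothetical stand-in for a model instance built from one imported row."""
    pk: Optional[int]
    name: str


def route_for_bulk(rows):
    """Split rows into (creates, updates) using only the question 'is pk set?'."""
    creates, updates = [], []
    for row in rows:
        if row.pk:
            updates.append(row)   # treated as an existing record
        else:
            creates.append(row)   # treated as a new record
    return creates, updates


# Rows that already carry an id, as in the data.csv sample from the question.
rows_with_ids = [Row(18, "XSXQAMA"), Row(19, "DYKNLVE")]
creates, updates = route_for_bulk(rows_with_ids)
print(len(creates), len(updates))
```

If this model matches what the library does, an empty create_instances list like the one printed in UPD2 would be the expected symptom.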

Django filter geometry given a coordinate

I want to get a row from a postgis table given a coordinate/point. With raw sql I do it with:
SELECT * FROM parcelas
WHERE fk_area=152
AND ST_contains(geometry,ST_SetSRID(ST_Point(342884.86705619487, 6539464.45201204),32721));
This query returns one row.
When I try to do the same in Django, it doesn't return any rows:
from django.contrib.gis.geos import GEOSGeometry
class TestView(APIView):
    def get(self, request, format=None):
        pnt = GEOSGeometry('POINT(342884.86705619487 6539464.45201204)', srid=32721)
        parcelas = Parcelas.objects.filter(fk_area=152,geometry__contains=pnt)
        #Also tried this
        #parcelas = Parcelas.objects.filter(fk_area=pk,geometry__contains='SRID=32721;POINT(342884.86705619487 6539464.45201204)')
        serializer = ParcelasSerializer(parcelas, many=True)
        return Response(serializer.data)
Even a Django raw query fails, although in this case it gives me an internal server error (argument 3: class 'TypeError': wrong type):
class TestView(APIView):
    def get(self, request, format=None):
        parcelas = Parcelas.objects.raw('SELECT * FROM parcelas WHERE fk_area=152 AND ST_contains(geometry,ST_SetSRID(ST_Point(342884.86705619487, 6539464.45201204),32721))')
        for p in parcelas:
            #Internal server error
            print(p.id)
        return Response('Test')
My Parcelas model looks like this:
from django.contrib.gis.db import models
class Parcelas(models.Model):
    id = models.BigAutoField(primary_key=True)
    fk_area = models.ForeignKey(Areas, models.DO_NOTHING, db_column='fk_area')
    geometry = models.GeometryField()
    class Meta:
        managed = False
        db_table = 'parcelas'
I don't know what I'm doing wrong. Does anyone have any idea?
EDIT:
If I print the raw query that Django generated:
SELECT "parcelas"."id", "parcelas"."fk_area", "parcelas"."geometry"::bytea FROM "parcelas" WHERE ("parcelas"."fk_area" = 152 AND ST_Contains("parcelas"."geometry", ST_Transform(ST_GeomFromEWKB('\001\001\000\000 \321\177\000\000C\224\335w\223\355\024A\350\303\355\0342\362XA'::bytea), 4326)))
It seems that Django is not using the correct SRID (32721) and is transforming the point to 4326 instead, but I don't know why.
EDIT 2:
If I specify the SRID in my model, it works correctly:
class Parcelas(models.Model):
    geometry = models.GeometryField(srid=32721)
The problem is that the SRID can vary: depending on the query, the rows have one SRID or another, so I don't want to fix it to a single value.
The test database is created separately and does not contain the same data as the main application database. Try using pdb and listing all entries in the parcelas table (unless TestView is just a mock view for the time being).
Using pdb:
import pdb; pdb.set_trace()
Parcelas.objects.all()
If a record's geometry needs to be compared to a GeoJSON-like object, one approach is to convert the object to a GEOSGeometry and then find the record using .get(), .filter(), etc.
For example, suppose an API JSON request payload contains the following field somewhere:
"geometry": {
    "type": "Polygon",
    "coordinates": [
        [
            [21.870314, 39.390873],
            [21.871913, 39.39319],
            [21.874029, 39.392443],
            [21.873401, 39.391328],
            [21.873369, 39.391272],
            [21.873314, 39.391171],
            [21.872715, 39.390024],
            [21.870314, 39.390873]
        ]
    ]
}
One can use the following code:
import json
from django.contrib.gis.geos import GEOSGeometry
# Assuming the python dictionary containing the geometry field is geometry_dict
payload_geometry = GEOSGeometry(json.dumps(geometry_dict))
parcel = Parcel.objects.get(geometry=payload_geometry)

get() in Google Datastore doesn't work as intended

I'm building a basic blog from the Web Development course by Steve Huffman on Udacity. This is my code:
import os
import webapp2
import jinja2
from google.appengine.ext import db
template_dir = os.path.join(os.path.dirname(__file__), 'templates')
jinja_env = jinja2.Environment(loader = jinja2.FileSystemLoader(template_dir), autoescape = True)
def datetimeformat(value, format='%H:%M / %d-%m-%Y'):
    return value.strftime(format)
jinja_env.filters['datetimeformat'] = datetimeformat
def render_str(template, **params):
    t = jinja_env.get_template(template)
    return t.render(params)
class Entries(db.Model):
    title = db.StringProperty(required = True)
    body = db.TextProperty(required = True)
    created = db.DateTimeProperty(auto_now_add = True)
class MainPage(webapp2.RequestHandler):
    def get(self):
        entries = db.GqlQuery('select * from Entries order by created desc limit 10')
        self.response.write(render_str('mainpage.html', entries=entries))
class NewPost(webapp2.RequestHandler):
    def get(self):
        self.response.write(render_str('newpost.html', error=""))
    def post(self):
        title = self.request.get('title')
        body = self.request.get('body')
        if title and body:
            e = Entries(title=title, body=body)
            length = db.GqlQuery('select * from Entries order by created desc').count()
            e.put()
            self.redirect('/newpost/' + str(length+1))
        else:
            self.response.write(render_str('newpost.html', error="Please type in a title and some content"))
class Permalink(webapp2.RequestHandler):
    def get(self, id):
        e = db.GqlQuery('select * from Entries order by created desc').get()
        self.response.write(render_str('permalink.html', id=id, entry = e))
app = webapp2.WSGIApplication([('/', MainPage),
                               ('/newpost', NewPost),
                               ('/newpost/(\d+)', Permalink)
                               ], debug=True)
In the Permalink class, I'm using the get() method on a query that returns all records in descending order of creation, so it should return the most recently added record. But when I add a new record, permalink.html (just a page that shows the title, body, and creation date of the new entry) shows the SECOND most recently added one. For example, I already had three records, and when I added a fourth, permalink.html showed me the details of the third record instead of the fourth. Am I doing something wrong?
I don't think my question is a duplicate of this - Read delay in App Engine Datastore after put(). That question is about read delay of put(), while I'm using get(). The accepted answer also states that get() doesn't cause any delay.
This is because of eventual consistency used by default for GQL queries.
You need to read:
https://cloud.google.com/appengine/docs/python/datastore/data-consistency
https://cloud.google.com/appengine/docs/python/datastore/structuring_for_strong_consistency
https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/
Also search and read on SO and other sources about strong and eventual consistency in Google Cloud Datastore.
You can specify read_policy=STRONG_CONSISTENCY for your query but it has associated costs that you should be aware of and take into account.
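The effect can be illustrated with a toy model. This is not the Datastore API: ToyStore and all of its methods are hypothetical names, and the only assumption is that a write is visible to a lookup by id immediately while the index used by queries catches up later.

```python
class ToyStore:
    """Toy model: writes land at once, the query index catches up later."""

    def __init__(self):
        self._entities = {}  # id -> title, always up to date
        self._index = []     # ids visible to queries, may lag behind
        self._next_id = 1

    def put(self, title):
        """Store an entity and return its id, without touching the index."""
        entity_id = self._next_id
        self._next_id += 1
        self._entities[entity_id] = title
        return entity_id

    def sync_index(self):
        """Simulate the index eventually catching up with all writes."""
        self._index = sorted(self._entities)

    def query_latest(self):
        """Newest entity according to the (possibly stale) index."""
        if not self._index:
            return None
        return self._entities[self._index[-1]]

    def get_by_id(self, entity_id):
        """Direct lookup by id, which always sees the latest write."""
        return self._entities[entity_id]


store = ToyStore()
for title in ("first", "second", "third"):
    store.put(title)
store.sync_index()

new_id = store.put("fourth")      # the index has not caught up yet
stale = store.query_latest()      # query right after the write
fresh = store.get_by_id(new_id)   # lookup by the id returned from put()
print(stale, fresh)
```

In the handler from the question, the analogous move would be to keep the key returned by e.put(), redirect to its numeric id (e.put().id()), and load the entry in Permalink with Entries.get_by_id(int(id)) instead of re-running the query.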

Django-haystack without attribute '_fields'

I'm making my blog on Django and I want to add site search based on django-haystack. I made a basic configuration of haystack, using official manuals, but when I want to test my search, I'm getting an error: 'Options' object has no attribute '_fields'
Here are some of my configs:
search_indexes.py
class PostIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    pub_date = indexes.DateTimeField(model_attr='date')
    def get_model(self):
        return Post
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()
settings.py
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.simple_backend.SimpleEngine',
    },
}
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
So this is my problem. Has anyone worked with something similar? Thanks in advance!
You are hitting a bug in the simple backend which is fixed in git. There doesn't seem to be a release which contains this fix, though, so you can either upgrade to the development version:
pip install -e git+https://github.com/toastdriven/django-haystack.git@master#egg=django-haystack
Or use a different backend (elasticsearch, solr, ...)

Autocomplete with Django Haystack

I am having a difficult time getting autocomplete to work with Haystack and Solr in a search form. Following the instructions here (Auto-complete), I was able to create my index in the following way.
class PersonIndex(indexes.RealTimeSearchIndex, indexes.Indexable):
    text = CharField(document=True, use_template=True)
    first_name = CharField(model_attr='first_name')
    last_name = CharField(model_attr='last_name')
    first_name_auto = indexes.EdgeNgramField(model_attr='first_name')
    def index_queryset(self):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all().order_by('first_name')
    def get_model(self):
        return Person
With the way my URL route is set up, I don't have a view of my own that requests get directed to, yet the search works.
url(r'^search/person/', search_view_factory(
    view_class=SearchView,
    template='index.html',
    form_class=ModelSearchForm
), name='haystack_search'),
The instructions say that we can perform the query like this:
from haystack.query import SearchQuerySet
sqs = SearchQuerySet().filter(content_auto=request.GET.get('q', ''))
But where do we put this SearchQuerySet? I am not sure what to override or how to modify my URL to route correctly. My search currently works out of the box this way, but I want to try autocomplete with EdgeNgramField.
You'll need to define your own custom search form and tell it how to generate the SearchQuerySet it returns to the view, and then tell your search_view_factory to use that form instead of the ModelSearchForm.
Specify the way you want to generate the SearchQuerySet used by your view by overriding the ModelSearchForm search method:
from haystack.forms import ModelSearchForm
class AutocompleteModelSearchForm(ModelSearchForm):
    def search(self):
        if not self.is_valid():
            return self.no_query_found()
        if not self.cleaned_data.get('q'):
            return self.no_query_found()
        # Filter on the EdgeNgram field instead of the default document field
        sqs = self.searchqueryset.filter(first_name_auto=self.cleaned_data['q'])
        if self.load_all:
            sqs = sqs.load_all()
        return sqs
This will now perform a filter on the form's SearchQuerySet on the first_name_auto field rather than the auto_query that it would usually do on the text field (see haystack/forms.py to see what the original search function looks like).
You specify that you want to use this form in the argument list to your search_view_factory:
from path.to.your.forms import AutocompleteModelSearchForm
url(r'^search/person/', search_view_factory(
    view_class=SearchView,
    template='index.html',
    form_class=AutocompleteModelSearchForm
), name='haystack_search'),