I have been trying to get the Elasticsearch to work in a Django application. It has been a problem because of the mess of compatibility considerations this apparently involves. I follow the recommendations, but still get an error when I actually perform a search.
Here is what I have
Django==2.1.7
Django-Haystack==2.5.1
Elasticsearch(django)==1.7.0
Elasticsearch(Linux app)==5.0.1
There is also DjangoCMS==3.7 and aldryn-search=1.0.1, but I am not sure how relevant those are.
Here is the error I get when I submit a search query via the basic text form.
GET /videos/modelresult/_search?_source=true [status:400 request:0.001s]
Failed to query Elasticsearch using '(video)': TransportError(400, 'parsing_exception')
Traceback (most recent call last):
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/haystack/backends/elasticsearch_backend.py", line 524, in search
_source=True)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(*args, params=params, **kwargs)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 527, in search
doc_type, '_search'), params=params, body=body)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
self._raise_error(response.status, raw_data)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'parsing_exception')
Could someone tell me if this is an issue with compatibility or is there something else going on? How can I fix it?
The combination that appears to have worked for my setup is as follows. I believe the key was to drastically downgrade the Elasticsearch.
Elasticsearch=1.7.6 (with Java 8)
Django==2.1.7
Django-Haystack==2.8.1
elasticsearch==1.7.0
The two items below may or may not be relevant. I have not changed them.
DjangoCMS==3.7.0
aldryn-search==1.0.1
Related
I've been running my Glue Jobs on a schedule for a few months. Last night my Glue Job failed due to botocore.exceptions.NoCredentialsError: Unable to locate credentials after calling bucket.objects.filter(Prefix=productionDirectory):
I am under the impression this is a result of not having defined a credentials file, but AWS Glue has always pulled credentials without issue. I just re-ran my job and everything worked perfectly. For reference, I define my Glue Client via: glue = boto3.client('glue'). Has anyone ever experienced this before? Is this just an edge-case?
Full Logs:
Traceback (most recent call last):
File "/tmp/data-deployment", line 67, in <module>
for obj in bucket.objects.filter(Prefix=productionDirectory):
File "/home/spark/.local/lib/python3.7/site-packages/boto3/resources/collection.py", line 83, in __iter__
for page in self.pages():
File "/home/spark/.local/lib/python3.7/site-packages/boto3/resources/collection.py", line 166, in pages
for page in pages:
File "/home/spark/.local/lib/python3.7/site-packages/botocore/paginate.py", line 255, in __iter__
response = self._make_request(current_kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/paginate.py", line 332, in _make_request
return self._method(**current_kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 316, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 613, in _make_api_call
operation_model, request_dict, request_context)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 632, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
operation_name=operation_model.name)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Edit/Update: This is a known bug. I've posted the mitigation strategy provided from AWS as an answer below.
Update: I reached out to AWS via Support and they responded. Apparently this is a known bug and issue. While they do not have a solution or ETA for solution, they do have a way to mitigate the issue. Information below:
Thank you for reporting your issue to us and product team is aware of this intermittent issue.
They are working on resolution however, I do not have an ETA.
To mitigate this issue, increase the timeout / attempts to meta service request in your code:
####START######
import os
####Increase meta service timeout and attempt########
os.environ['AWS_METADATA_SERVICE_NUM_ATTEMPTS'] ="5"
os.environ['AWS_METADATA_SERVICE_TIMEOUT'] ="30"
#####################END#################
I faced a similar issue with Glue, but not exactly the same.
We used external tables with SparkSQL and S3, and sometimes an Exception was raised out of nowhere, i.e. Table not found. The issue was never reproduced on testing and had least frequency. Since our jobs ran perfectly fine on retries, we enabled the retry mechanism to solve it.
It has something to do with the internal workings of Glue and its serverless environment.
Im inserting 5000 records at once into elastic search
Total Size of these records is: 33936 (I got this using sys.getsizeof())
Elastic Search version: 1.5.0
Python 2.7
Ubuntu
Here is the following error
Traceback (most recent call last):
File "run_indexing.py", line 67, in <module>
index_policy_content(datatable, source, policyids)
File "run_indexing.py", line 60, in index_policy_content
bulk(elasticsearch_instance, actions)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 148, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 107, in streaming_bulk
resp = client.bulk(bulk_actions, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 568, in bulk
params=params, body=self._bulk_body(body))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 259, in perform_request
body = body.encode('utf-8')
MemoryError
Please help me resolve the issue.
Thanks & Regards,
Afroze
If I had to guess, I'd say this memory error is happening within python as it loads and serializes its data. Try cutting way back on the batch sizes until you get something that works, and then binary search upward until it fails again. That should help you figure out a safe batch size to use.
(Other useful information you might want to include: amount of memory in the server you're running your python process on, amount of memory for your elasticsearch server node(s), amount of heap allocated to Java.)
I was using Apache Solr for quite some time and only recently started running into some severe issues with it. I'm using it with haystack and a django project. When I do it from manage.py shell i'm getting the below:
>>> from haystack.query import SearchQuerySet
>>> emps = SearchQuerySet().filter(django_ct='web.employer').filter(name__icontains='Mi')[:10]
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/haystack/query.py", line 241, in __getitem__
self._fill_cache(start, bound)
File "/usr/local/lib/python2.7/dist-packages/haystack/query.py", line 140, in _fill_cache
results = self.query.get_results(**kwargs)
File "/usr/local/lib/python2.7/dist-packages/haystack/backends/__init__.py", line 469, in get_results
self.run(**kwargs)
File "/usr/local/lib/python2.7/dist-packages/haystack/backends/solr_backend.py", line 501, in run
results = self.backend.search(final_query, **search_kwargs)
File "/usr/local/lib/python2.7/dist-packages/haystack/backends/__init__.py", line 47, in wrapper
return func(obj, query_string, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/haystack/backends/solr_backend.py", line 202, in search
raw_results = self.conn.search(query_string, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pysolr.py", line 578, in search
response = self._select(params)
File "/usr/local/lib/python2.7/dist-packages/pysolr.py", line 308, in _select
return self._send_request('get', path)
File "/usr/local/lib/python2.7/dist-packages/pysolr.py", line 293, in _send_request
error_message = self._extract_error(resp)
File "/usr/local/lib/python2.7/dist-packages/pysolr.py", line 372, in _extract_error
reason, full_html = self._scrape_response(resp.headers, resp.content)
File "/usr/local/lib/python2.7/dist-packages/pysolr.py", line 404, in _scrape_response
p_nodes = body_node.cssselect('p')
AttributeError: 'NoneType' object has no attribute 'cssselect'
I tried reinstalling haystack, lxml, cssselect, pysolr and still i'm getting these errors. Is there anything else I can try for this? Thanks for any help!
I also tried reading few other SO questions including this:
XML error object has no attribute 'cssselect'
Seems like the issue is with pysolr. You might find some help here.
I had the same issue persist even after bringing up pysolr and lxml to latest version.
Turned out it was because I was not using haystack generated schema which has a few additional fields compared to the default solr one.
You can confirm if this is the case by looking at your solr logs.
It is an issue with pysolr. It hasn't been fixed till 3.3.0.
The only alternative would be to override the pysolr code and make adjustments for when Solr returns a reponse status!=200.
You can check if the response has body attribute or not and make adjustments according to that.
I have a function that will read data from a website, process it, and then load it into MongoDB. When I run this without threading it works fine but as soon as I set up celery tasks that just call this one function I frequently get the following error: "OperationFailure: database error: unauthorized db:dbname lock type:-1"
It's somewhat odd because if I run the non-celery version on multiple terminals, I do not get this error at all.
I suspect it has something to do with there not being an open connection to Mongo although in my code I'm opening one up right before every Mongo call.
The exact exception is below:
Task twitter[a974bfcc-d6ca-4baf-b36f-cae9143ce2d9] raised exception: OperationFailure(u'database error: unauthorized db:data lock type:-1 client:68.193.49.9',)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/djangoblog/network/tasks.py", line 40, in twitter
n_twitter.GetTweetsTwitter(user)
File "/djangoblog/network/twitter.py", line 255, in GetTweetsTwitter
id = SaveTweet(user, network, tweet)
File "/djangoblog/network/twitter.py", line 150, in SaveTweet
if mmo.Moment.objects(user=user.id,source_id=id,network=network.id).count() == 0:
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 933, in count
return self._cursor.count(with_limit_and_skip=True)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 563, in _cursor
self._cursor_obj = self._collection.find(self._query,
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 493, in _collection
if self._collection_obj.name not in db.collection_names():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/database.py", line 361, in collection_names
names = [r["name"] for r in results]
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 703, in next
if len(self.__data) or self._refresh():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 666, in _refresh
self.__uuid_subtype))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 628, in __send_message self.__tz_aware)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/helpers.py", line 101, in _unpack_response error_object["$err"])
OperationFailure: database error: unauthorized db:data lock type:-1 client:68.193.49.9
Sorry for the formatting but if you look at the line that starts with mmo.Moment there's a connection being opened right before that's called.
Doing a bit of research it looks as if it has something to do with the way threading is handled in PyMongo - http://api.mongodb.org/python/1.5.1/faq.html#how-does-connection-pooling-work-in-pymongo - I may need to start closing the connections but I'd expect MongoEngine to be doing this..
This is likely due to the fact that you are not calling db.authenticate() when you start the new connection and are using auth on MongoDB.
Regarding the closing of threads, I would recommend making sure you are using connection pooling and letting the driver manage the pools (calling close() or similar manually can lead to a lot of pain).
For more info see the note in the pymongo documentation about using authenticate() in a multi-threaded environment.
I'm trying to set up OpenID authentication in Django, using django-authopenid.
The instructions are pretty good, but having followed them and made all the requisite changes in settings.py and added the required templates, my whole site is now showing a 500 error, having previously worked fine. The Apache logs show:
Exception occurred processing WSGI script '/usr/local/www/wsgi-scripts/myapp.wsgi'.
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/django/core/handlers/wsgi.py", line 241, in __call__
response = self.get_response(request)
File "/usr/local/lib/python2.6/dist-packages/django/core/handlers/base.py", line 73, in get_response
response = middleware_method(request)
File "/usr/local/lib/python2.6/dist-packages/django_authopenid-1.0.1-py2.6.egg/django_authopenid/middleware.py", line 36, in process_request
request.associated_openids = [rel.openid_url for rel in rels]
File "/usr/local/lib/python2.6/dist-packages/django/db/models/query.py", line 93, in _result_iter
self._fill_cache()
File "/usr/local/lib/python2.6/dist-packages/django/db/models/query.py", line 660, in _fill_cache
self._result_cache.append(self._iter.next())
File "/usr/local/lib/python2.6/dist-packages/django/db/models/query.py", line 207, in iterator
for row in self.query.results_iter():
File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/query.py", line 287, in results_iter
for rows in self.execute_sql(MULTI):
File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/query.py", line 2345, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python2.6/dist-packages/django/db/backends/util.py", line 19, in execute
return self.cursor.execute(sql, params)
ProgrammingError: relation "django_authopenid_userassociation" does not exist
Looks like a SQL error (I'm not a django expert)?
It's possible I've put my templates in the wrong place, the instructions aren't very clear. I just added two new directories, registration and openauthid, in the main templates folder.
Bit baffled - can anyone help? Thanks!
It looks to me like you've not yet set up the required tables. Try running:
python manage.py syncdb
from the project directory.