I'm inserting 5000 records at once into Elasticsearch.
The total size of these records is 33936 (I got this using sys.getsizeof()).
Elastic Search version: 1.5.0
Python 2.7
Ubuntu
Here is the error:
Traceback (most recent call last):
File "run_indexing.py", line 67, in <module>
index_policy_content(datatable, source, policyids)
File "run_indexing.py", line 60, in index_policy_content
bulk(elasticsearch_instance, actions)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 148, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 107, in streaming_bulk
resp = client.bulk(bulk_actions, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 568, in bulk
params=params, body=self._bulk_body(body))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 259, in perform_request
body = body.encode('utf-8')
MemoryError
Please help me resolve the issue.
If I had to guess, I'd say this memory error is happening within Python as it loads and serializes its data. Try cutting way back on the batch sizes until you get something that works, then binary-search upward until it fails again. That should help you figure out a safe batch size to use.
(Other useful information you might want to include: amount of memory in the server you're running your python process on, amount of memory for your elasticsearch server node(s), amount of heap allocated to Java.)
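As a concrete starting point, here is a rough sketch (reusing the actions list from your bulk() call; the chunk_size value is only an initial guess for you to tune) that asks the helper to send smaller bulk requests:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()  # or reuse your existing elasticsearch_instance

# chunk_size caps how many actions go into each bulk request;
# start small and binary-search upward to find a safe size.
success, errors = bulk(es, actions, chunk_size=100)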
I have been trying to get Elasticsearch to work in a Django application. It has been a problem because of the mess of compatibility considerations this apparently involves. I followed the recommendations, but I still get an error when I actually perform a search.
Here is what I have:
Django==2.1.7
Django-Haystack==2.5.1
Elasticsearch(django)==1.7.0
Elasticsearch(Linux app)==5.0.1
There is also DjangoCMS==3.7 and aldryn-search==1.0.1, but I am not sure how relevant those are.
Here is the error I get when I submit a search query via the basic text form.
GET /videos/modelresult/_search?_source=true [status:400 request:0.001s]
Failed to query Elasticsearch using '(video)': TransportError(400, 'parsing_exception')
Traceback (most recent call last):
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/haystack/backends/elasticsearch_backend.py", line 524, in search
_source=True)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(*args, params=params, **kwargs)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 527, in search
doc_type, '_search'), params=params, body=body)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
self._raise_error(response.status, raw_data)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'parsing_exception')
Could someone tell me whether this is a compatibility issue or something else is going on? How can I fix it?
The combination that appears to have worked for my setup is as follows; I believe the key was to drastically downgrade Elasticsearch. (A sample Haystack configuration is sketched after the list.)
Elasticsearch==1.7.6 (with Java 8)
Django==2.1.7
Django-Haystack==2.8.1
elasticsearch==1.7.0
The two items below may or may not be relevant. I have not changed them.
DjangoCMS==3.7.0
aldryn-search==1.0.1
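For reference, a Haystack configuration matching this stack might look roughly like the sketch below (the URL and INDEX_NAME are assumptions for a local, single-node setup):
# settings.py (sketch; URL and INDEX_NAME are assumptions)
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}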
I'm training a U-Net on a Google Cloud TensorFlow VM instance. When I run fit_generator, I get a MemoryError.
When I run the same code locally with TensorFlow (CPU version), this does not happen. I have tried increasing the RAM on the VM instance to 13 GB (more than my local machine has).
from keras.callbacks import ModelCheckpoint

# unet() and myGene are defined elsewhere in the project
model = unet()
model_checkpoint = ModelCheckpoint('unet_membrane.hdf5', monitor='loss', verbose=1, save_best_only=True)
model.fit_generator(myGene, steps_per_epoch=300, epochs=1, callbacks=[model_checkpoint])
I expect the model to train, but instead I get a MemoryError with the following traceback:
Epoch 1/1
Found 30 images belonging to 1 classes.
Found 30 images belonging to 1 classes.
Traceback (most recent call last):
File "main.py", line 18, in <module>
model.fit_generator(myGene,steps_per_epoch=300,epochs=1,callbacks=[model_checkpoint])
File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training_generator.py", line 181, in fit_generator
generator_output = next(output_generator)
File "/usr/local/lib/python2.7/dist-packages/keras/utils/data_utils.py", line 709, in get
six.reraise(*sys.exc_info())
File "/usr/local/lib/python2.7/dist-packages/keras/utils/data_utils.py", line 685, in get
inputs = self.queue.get(block=True).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
MemoryError
It seems like your machine has run out of memory and can't train while holding all the arrays at the same time. Try optimizing your code to save arrays of data to disk and load them only when needed, so you don't have to keep everything in RAM.
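As an illustration only (not the asker's code), a generator along these lines reads each batch from disk on demand instead of holding every array in memory; the .npy file paths and batch size are placeholders:
import numpy as np

def lazy_batches(image_paths, mask_paths, batch_size=2):
    # Yield (images, masks) batches loaded from disk on demand,
    # looping forever as Keras expects from a fit_generator input.
    while True:
        for start in range(0, len(image_paths), batch_size):
            imgs = np.stack([np.load(p) for p in image_paths[start:start + batch_size]])
            masks = np.stack([np.load(p) for p in mask_paths[start:start + batch_size]])
            yield imgs, masks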
From a Python script I am sending data to an Elasticsearch server.
This is how I connect to ES:
es = Elasticsearch('localhost:9200',use_ssl=False,verify_certs=True)
and by using the code below I am able to send all data to my local ES server:
es.index(index='alertnagios', doc_type='nagios', body=jsonvalue)
But when I try to send data to the cloud ES server, the script executes fine and indexes a few documents; after indexing a few documents I get the following error:
Traceback (most recent call last):
File "scriptfile.py", line 78, in <module>
es.index(index='test', doc_type='test123', body=jsonvalue)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 298, in index
_make_path(index, doc_type, id), params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 342, in perform_request
data = self.deserializer.loads(data, headers.get('content-type'))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/serializer.py", line 76, in loads
return deserializer.loads(s)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/serializer.py", line 40, in loads
raise SerializationError(s, e)
elasticsearch.exceptions.SerializationError: (u'order=0></iframe>', JSONDecodeError('No JSON object could be decoded: line 1 column 0 (char 0)',))
The same script works fine when I send data to my localhost ES server; I don't know why it does not work when I send data to the cloud instance.
Please help me.
The problem was resolved by using the bulk indexing method. When indexing to a local server it doesn't matter if we index documents one after the other, but when indexing to a cloud instance we have to use bulk indexing to avoid memory and connection issues.
With bulk indexing, all the documents are indexed into Elasticsearch at once, so there is no need to open a connection again and again, and it doesn't take much time.
Here is my code:
from elasticsearch import Elasticsearch, helpers

# 'es' is the Elasticsearch client created earlier; jsonData is the document body
jsonobject = {
    '_index': 'index',
    '_type': 'index123',
    '_source': jsonData,
}
actions = [jsonobject]
helpers.bulk(es, actions, chunk_size=1000, request_timeout=200)
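If there are many documents to send, a generator of actions can be passed to helpers.bulk so the whole list never has to sit in memory. In this sketch, json_documents is a placeholder name for your own iterable of document bodies:
def generate_actions(json_documents):
    # json_documents is assumed to be an iterable of dicts (placeholder name)
    for doc in json_documents:
        yield {
            '_index': 'alertnagios',
            '_type': 'nagios',
            '_source': doc,
        }

helpers.bulk(es, generate_actions(json_documents), chunk_size=1000, request_timeout=200)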
I'm trying to get into the new N1QL Queries for Couchbase in Python.
I got my database set up in Couchbase 4.0.0.
My initial try was to retrieve all documents like this:
from couchbase.bucket import Bucket

bucket = Bucket('couchbase://localhost/default')
rv = bucket.n1ql_query('CREATE PRIMARY INDEX ON default').execute()
for row in bucket.n1ql_query('SELECT * FROM default'):
    print row
But this produces an OperationNotSupportedError:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 2357, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1777, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/my_user/python_tests/test_n1ql.py", line 9, in <module>
rv = bucket.n1ql_query('CREATE PRIMARY INDEX ON default').execute()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 215, in execute
for _ in self:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 235, in __iter__
self._start()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 180, in _start
self._mres = self._parent._n1ql_query(self._params.encoded)
couchbase.exceptions.NotSupportedError: <RC=0x13[Operation not supported], Couldn't schedule n1ql query, C Source=(src/n1ql.c,82)>
Here are the version numbers of everything I use:
Couchbase Server: 4.0.0
couchbase python library: 2.0.2
cbc: 2.5.1
python: 2.7.8
gcc: 4.2.1
Does anyone have an idea what might have gone wrong here? I could not find any solution to this problem so far.
There was another ticket for Node.js where the same issue happened, with a proposal to enable N1QL for the specific bucket first. Is this also needed in Python?
It would seem you didn't configure any cluster nodes with the Query or Index services. As such, the error returned is one that indicates no nodes are available.
I also got a similar error while trying to create a primary index.
Create a primary index...
Traceback (most recent call last):
File "post-upgrade-test.py", line 45, in <module>
mgr.n1ql_index_create_primary(ignore_exists=True)
File "/usr/local/lib/python2.7/dist-packages/couchbase/bucketmanager.py", line 428, in n1ql_index_create_primary
'', defer=defer, primary=True, ignore_exists=ignore_exists)
File "/usr/local/lib/python2.7/dist-packages/couchbase/bucketmanager.py", line 412, in n1ql_index_create
return IxmgmtRequest(self._cb, 'create', info, **options).execute()
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 160, in execute
return [x for x in self]
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 144, in __iter__
self._start()
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 132, in _start
self._cmd, index_to_rawjson(self._index), **self._options)
couchbase.exceptions.NotSupportedError: <RC=0x13[Operation not supported], Couldn't schedule ixmgmt operation, C Source=(src/ixmgmt.c,98)>
Adding query and index nodes to the cluster solved the issue.
I have a function that reads data from a website, processes it, and then loads it into MongoDB. When I run this without threading it works fine, but as soon as I set up Celery tasks that just call this one function I frequently get the following error: "OperationFailure: database error: unauthorized db:dbname lock type:-1"
It's somewhat odd because if I run the non-Celery version on multiple terminals, I do not get this error at all.
I suspect it has something to do with there not being an open connection to Mongo, although in my code I open one right before every Mongo call.
The exact exception is below:
Task twitter[a974bfcc-d6ca-4baf-b36f-cae9143ce2d9] raised exception: OperationFailure(u'database error: unauthorized db:data lock type:-1 client:68.193.49.9',)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/djangoblog/network/tasks.py", line 40, in twitter
n_twitter.GetTweetsTwitter(user)
File "/djangoblog/network/twitter.py", line 255, in GetTweetsTwitter
id = SaveTweet(user, network, tweet)
File "/djangoblog/network/twitter.py", line 150, in SaveTweet
if mmo.Moment.objects(user=user.id,source_id=id,network=network.id).count() == 0:
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 933, in count
return self._cursor.count(with_limit_and_skip=True)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 563, in _cursor
self._cursor_obj = self._collection.find(self._query,
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 493, in _collection
if self._collection_obj.name not in db.collection_names():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/database.py", line 361, in collection_names
names = [r["name"] for r in results]
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 703, in next
if len(self.__data) or self._refresh():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 666, in _refresh
self.__uuid_subtype))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 628, in __send_message self.__tz_aware)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/helpers.py", line 101, in _unpack_response error_object["$err"])
OperationFailure: database error: unauthorized db:data lock type:-1 client:68.193.49.9
Sorry for the formatting, but if you look at the line that starts with mmo.Moment, there's a connection being opened right before that is called.
Doing a bit of research, it looks as if it has something to do with the way threading is handled in PyMongo (http://api.mongodb.org/python/1.5.1/faq.html#how-does-connection-pooling-work-in-pymongo). I may need to start closing the connections, but I'd expect MongoEngine to be doing this.
This is likely due to the fact that you are not calling db.authenticate() when you start the new connection and are using auth on MongoDB.
Regarding the closing of threads, I would recommend making sure you are using connection pooling and letting the driver manage the pools (calling close() or similar manually can lead to a lot of pain).
For more info see the note in the pymongo documentation about using authenticate() in a multi-threaded environment.
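As a minimal sketch of the first point (assuming auth is enabled on the 'data' database from the traceback; the host, port, and credentials below are placeholders), each task should authenticate the database on the connection it actually uses:
from pymongo import Connection  # pymongo 1.x API, matching the traceback

def get_authed_db():
    # Let the driver pool the connection and authenticate the target db.
    # Host, port, db name, and credentials here are placeholders.
    conn = Connection('localhost', 27017)
    db = conn['data']
    db.authenticate('username', 'password')  # needed per connection when auth is enabled
    return db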