From a Python script I am sending data to an Elasticsearch server.
This is how I connect to ES:
es = Elasticsearch('localhost:9200', use_ssl=False, verify_certs=True)
Using the code below I am able to send all of my data to my local ES server:
es.index(index='alertnagios', doc_type='nagios', body=jsonvalue)
But when I try to send data to a cloud ES server, the script runs fine and indexes a few documents, and then I get the following error:
Traceback (most recent call last):
File "scriptfile.py", line 78, in <module>
es.index(index='test', doc_type='test123', body=jsonvalue)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 298, in index
_make_path(index, doc_type, id), params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 342, in perform_request
data = self.deserializer.loads(data, headers.get('content-type'))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/serializer.py", line 76, in loads
return deserializer.loads(s)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/serializer.py", line 40, in loads
raise SerializationError(s, e)
elasticsearch.exceptions.SerializationError: (u'order=0></iframe>', JSONDecodeError('No JSON object could be decoded: line 1 column 0 (char 0)',))
The same script works fine when I send data to my localhost ES server; I don't know why it is not working when I send data to the cloud instance.
Please help me.
The problem was resolved by using the bulk indexing method. When indexing to a local server it does not matter if we index documents one after the other, but when indexing to a cloud instance we have to use bulk indexing to avoid memory and connection issues.
With bulk indexing, all the documents are indexed into Elasticsearch in one request, so there is no need to open a connection again and again, and it does not take much time.
Here is my code:
from elasticsearch import Elasticsearch, helpers

jsonobject = {
    '_index': 'index',
    '_type': 'index123',
    '_source': jsonData
}
actions = [jsonobject]
helpers.bulk(es, actions, chunk_size=1000, request_timeout=200)
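For reference, here is a minimal sketch of feeding many documents to the same helper through a generator, assuming docs is an iterable of already-built JSON bodies and es is the client created above (the index and type names are only placeholders):

from elasticsearch import helpers

def generate_actions(docs):
    # Build one bulk action per document instead of calling es.index() in a loop.
    for doc in docs:
        yield {
            '_index': 'alertnagios',
            '_type': 'nagios',
            '_source': doc,
        }

helpers.bulk(es, generate_actions(docs), chunk_size=1000, request_timeout=200)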
This seems like an old, solved problem (here, here, and here), but I am still getting this error. I created my db on Docker, and it worked only once. Before this I re-created the db, set connect=False, added a wait, downgraded pymongo, tried the previous solutions, etc. I am stuck.
Python 3.8.0, Pymongo 3.9.0
from pymongo import MongoClient
from pprint import pprint

client = MongoClient('mongodb://192.168.1.100:27017/',
                     username='admin',
                     password='psw',
                     authSource='myappdb',
                     authMechanism='SCRAM-SHA-1',
                     connect=False)
db = client['myappdb']
serverStatusResult = db.command("serverStatus")
pprint(serverStatusResult)
and I am getting a ServerSelectionTimeoutError:
Traceback (most recent call last):
  File "C:\Users\ME\eclipse2019-workspace\exdjango\exdjango__init__.py", line 12, in
    serverStatusResult=db.command("serverStatus")
  File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\database.py", line 610, in command
    with self.client._socket_for_reads(
  File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\mongo_client.py", line 1099, in _socket_for_reads
    server = topology.select_server(read_preference)
  File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\topology.py", line 222, in select_server
    return random.choice(self.select_servers(selector,
  File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\topology.py", line 182, in select_servers
    server_descriptions = self._select_servers_loop(
  File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\topology.py", line 198, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: 192.168.1.100:27017: timed out
Your connection looks a little misconfigured. Firstly, you have half connection-string format and half keyword-parameter format. I'd suggest you stick with one or the other.
Your auth database is usually separate from your actual databases (and it's usually called admin). Check that this is correct.
There's no particular need to specify the authMechanism assuming you are using MongoDB 3.0 or later.
The connect=False is likely a red herring.
So I would try one of the following:
client = MongoClient('mongodb://admin:psw@192.168.1.100:27017/myappdb?authSource=admin')
or
client = MongoClient(host='192.168.1.100',
                     port=27017,
                     username='admin',
                     password='psw',
                     authSource='admin')
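Either way, a quick way to confirm the client can actually reach the server is to issue a ping with a short server selection timeout. A minimal sketch, reusing the host and credentials from above (the serverSelectionTimeoutMS value is only illustrative):

from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

client = MongoClient('mongodb://admin:psw@192.168.1.100:27017/myappdb?authSource=admin',
                     serverSelectionTimeoutMS=5000)  # fail fast instead of the 30s default
try:
    client.admin.command('ping')  # forces server selection, so a bad host/port fails here
    print('connected')
except ServerSelectionTimeoutError as exc:
    print('cannot reach MongoDB:', exc)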
I am a django newbie, and I inherited a django back-end with little documentation. I am making a request to said server, which is hosted on AWS. To store the files in the request we use S3.
I have found nothing in the django code that limits the size of the file uploads, and I suspect it may be AWS closing the connection because of the file size.
This is the code I use, and below the error I get whenever the total size of the files is over 1 MB:
import requests

json_dict = {'key_1': 'value_1', 'video': video, 'image': image}
requests.post('https://api.test.whatever.io/v1/register', json=json_dict)
video is a video file ('.mov', '.avi', '.mp4', etc.) encoded in base64, and image is an image file ('.jpg', '.png') encoded in base64.
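For reference, this is roughly how those two fields are built before the request, a minimal sketch assuming the files are read from disk (the file names are hypothetical):

import base64
import requests

with open('clip.mp4', 'rb') as f:
    video = base64.b64encode(f.read()).decode('ascii')   # base64-encoded video
with open('frame.jpg', 'rb') as f:
    image = base64.b64encode(f.read()).decode('ascii')   # base64-encoded image

json_dict = {'key_1': 'value_1', 'video': video, 'image': image}
requests.post('https://api.test.whatever.io/v1/register', json=json_dict)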
And this is the trace I get, ONLY when the total size is over 1 MB:
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:132: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecurePlatformWarning
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 110, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 473, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(32, 'Broken pipe'))
As mentioned previously, I have not found anywhere in the django code a limit on the file size; any hints on where I should be looking?
I also did not find anything on the AWS S3 policy.
Are you using Nginx to reverse proxy your HTTP requests? If yes, check this link.
Also check the value set in your Django settings for the upload handlers:
FILE_UPLOAD_MAX_MEMORY_SIZE
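For example, a minimal sketch of the related settings.py values (the numbers are illustrative, not the project's actual limits; DATA_UPLOAD_MAX_MEMORY_SIZE exists in newer Django versions and covers non-file request bodies, which matters here because the payload is base64 JSON rather than a multipart upload):

# settings.py
FILE_UPLOAD_MAX_MEMORY_SIZE = 5 * 1024 * 1024   # uploads above this are streamed to disk instead of held in memory
DATA_UPLOAD_MAX_MEMORY_SIZE = 5 * 1024 * 1024   # request bodies above this raise RequestDataTooBig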
In the end it was the nginx configuration. Changing the client_max_body_size directive in the nginx.conf file from 1M to 2M did the trick.
I'm trying to get into the new N1QL Queries for Couchbase in Python.
I got my database set up in Couchbase 4.0.0.
My initial try was to retrieve all documents like this:
from couchbase.bucket import Bucket

bucket = Bucket('couchbase://localhost/default')
rv = bucket.n1ql_query('CREATE PRIMARY INDEX ON default').execute()
for row in bucket.n1ql_query('SELECT * FROM default'):
    print row
But this produces an OperationNotSupportedError:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 2357, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1777, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/my_user/python_tests/test_n1ql.py", line 9, in <module>
rv = bucket.n1ql_query('CREATE PRIMARY INDEX ON default').execute()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 215, in execute
for _ in self:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 235, in __iter__
self._start()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 180, in _start
self._mres = self._parent._n1ql_query(self._params.encoded)
couchbase.exceptions.NotSupportedError: <RC=0x13[Operation not supported], Couldn't schedule n1ql query, C Source=(src/n1ql.c,82)>
Here are the version numbers of everything I use:
Couchbase Server: 4.0.0
couchbase python library: 2.0.2
cbc: 2.5.1
python: 2.7.8
gcc: 4.2.1
Does anyone have an idea what might have gone wrong here? I could not find any solution to this problem so far.
There was another ticket for node.js where the same issue happened. There was a proposal to enable N1QL for the specific bucket first. Is this also needed in Python?
It would seem you didn't configure any cluster nodes with the Query or Index services. As such, the error returned is one that indicates no nodes are available.
I also got a similar error while trying to create a primary index.
Create a primary index...
Traceback (most recent call last):
File "post-upgrade-test.py", line 45, in <module>
mgr.n1ql_index_create_primary(ignore_exists=True)
File "/usr/local/lib/python2.7/dist-packages/couchbase/bucketmanager.py", line 428, in n1ql_index_create_primary
'', defer=defer, primary=True, ignore_exists=ignore_exists)
File "/usr/local/lib/python2.7/dist-packages/couchbase/bucketmanager.py", line 412, in n1ql_index_create
return IxmgmtRequest(self._cb, 'create', info, **options).execute()
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 160, in execute
return [x for x in self]
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 144, in __iter__
self._start()
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 132, in _start
self._cmd, index_to_rawjson(self._index), **self._options)
couchbase.exceptions.NotSupportedError: <RC=0x13[Operation not supported], Couldn't schedule ixmgmt operation, C Source=(src/ixmgmt.c,98)>
Adding a query and index node to the cluster solved the issue.
I'm inserting 5000 records at once into Elasticsearch.
The total size of these records is 33936 (I got this using sys.getsizeof()).
Elasticsearch version: 1.5.0
Python 2.7
Ubuntu
Here is the error:
Traceback (most recent call last):
File "run_indexing.py", line 67, in <module>
index_policy_content(datatable, source, policyids)
File "run_indexing.py", line 60, in index_policy_content
bulk(elasticsearch_instance, actions)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 148, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 107, in streaming_bulk
resp = client.bulk(bulk_actions, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 568, in bulk
params=params, body=self._bulk_body(body))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 259, in perform_request
body = body.encode('utf-8')
MemoryError
Please help me resolve the issue.
If I had to guess, I'd say this memory error is happening within Python as it loads and serializes its data. Try cutting way back on the batch sizes until you get something that works, and then binary-search upward until it fails again. That should help you figure out a safe batch size to use.
(Other useful information you might want to include: amount of memory in the server you're running your python process on, amount of memory for your elasticsearch server node(s), amount of heap allocated to Java.)
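As a concrete illustration of the smaller-batch approach, here is a minimal sketch using the streaming_bulk helper with a reduced chunk size (the chunk_size of 500 is just a starting point to tune, and actions is assumed to be the same list of bulk actions you already build):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch('localhost:9200')  # placeholder host

# Stream the actions in small chunks so no single bulk body has to be
# serialized into one huge in-memory string.
for ok, item in helpers.streaming_bulk(es, actions, chunk_size=500):
    if not ok:
        print('failed to index: %s' % (item,))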
I have a function that will read data from a website, process it, and then load it into MongoDB. When I run this without threading it works fine, but as soon as I set up celery tasks that just call this one function, I frequently get the following error: "OperationFailure: database error: unauthorized db:dbname lock type:-1"
It's somewhat odd because if I run the non-celery version on multiple terminals, I do not get this error at all.
I suspect it has something to do with there not being an open connection to Mongo although in my code I'm opening one up right before every Mongo call.
The exact exception is below:
Task twitter[a974bfcc-d6ca-4baf-b36f-cae9143ce2d9] raised exception: OperationFailure(u'database error: unauthorized db:data lock type:-1 client:68.193.49.9',)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/djangoblog/network/tasks.py", line 40, in twitter
n_twitter.GetTweetsTwitter(user)
File "/djangoblog/network/twitter.py", line 255, in GetTweetsTwitter
id = SaveTweet(user, network, tweet)
File "/djangoblog/network/twitter.py", line 150, in SaveTweet
if mmo.Moment.objects(user=user.id,source_id=id,network=network.id).count() == 0:
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 933, in count
return self._cursor.count(with_limit_and_skip=True)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 563, in _cursor
self._cursor_obj = self._collection.find(self._query,
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 493, in _collection
if self._collection_obj.name not in db.collection_names():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/database.py", line 361, in collection_names
names = [r["name"] for r in results]
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 703, in next
if len(self.__data) or self._refresh():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 666, in _refresh
self.__uuid_subtype))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 628, in __send_message self.__tz_aware)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/helpers.py", line 101, in _unpack_response error_object["$err"])
OperationFailure: database error: unauthorized db:data lock type:-1 client:68.193.49.9
Sorry for the formatting, but if you look at the line that starts with mmo.Moment, there's a connection being opened right before that's called.
Doing a bit of research, it looks as if it has something to do with the way threading is handled in PyMongo (http://api.mongodb.org/python/1.5.1/faq.html#how-does-connection-pooling-work-in-pymongo). I may need to start closing the connections, but I'd expect MongoEngine to be doing this.
This is likely due to the fact that you are not calling db.authenticate() when you start the new connection and are using auth on MongoDB.
Regarding the closing of threads, I would recommend making sure you are using connection pooling and letting the driver manage the pools (calling close() or similar manually can lead to a lot of pain).
For more info see the note in the pymongo documentation about using authenticate() in a multi-threaded environment.
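As a rough sketch of what that looks like with the pymongo API of that era (the host, database name, and credentials here are placeholders, not taken from the question):

from pymongo import Connection

# One pooled connection, created once and reused; the driver manages the pool.
connection = Connection('localhost', 27017)
db = connection['data']
db.authenticate('username', 'password')  # authenticate on the db before issuing queries in this worker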