OperationFailure: database error when threading in MongoEngine/PyMongo - django

I have a function that reads data from a website, processes it, and then loads it into MongoDB. When I run this without threading it works fine, but as soon as I set up Celery tasks that just call this one function, I frequently get the following error: "OperationFailure: database error: unauthorized db:dbname lock type:-1"
It's somewhat odd because if I run the non-celery version on multiple terminals, I do not get this error at all.
I suspect it has something to do with there not being an open connection to Mongo although in my code I'm opening one up right before every Mongo call.
The exact exception is below:
Task twitter[a974bfcc-d6ca-4baf-b36f-cae9143ce2d9] raised exception: OperationFailure(u'database error: unauthorized db:data lock type:-1 client:68.193.49.9',)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/djangoblog/network/tasks.py", line 40, in twitter
n_twitter.GetTweetsTwitter(user)
File "/djangoblog/network/twitter.py", line 255, in GetTweetsTwitter
id = SaveTweet(user, network, tweet)
File "/djangoblog/network/twitter.py", line 150, in SaveTweet
if mmo.Moment.objects(user=user.id,source_id=id,network=network.id).count() == 0:
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 933, in count
return self._cursor.count(with_limit_and_skip=True)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 563, in _cursor
self._cursor_obj = self._collection.find(self._query,
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 493, in _collection
if self._collection_obj.name not in db.collection_names():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/database.py", line 361, in collection_names
names = [r["name"] for r in results]
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 703, in next
if len(self.__data) or self._refresh():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 666, in _refresh
self.__uuid_subtype))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 628, in __send_message self.__tz_aware)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/helpers.py", line 101, in _unpack_response error_object["$err"])
OperationFailure: database error: unauthorized db:data lock type:-1 client:68.193.49.9
Sorry for the formatting, but if you look at the line that starts with mmo.Moment, there's a connection being opened right before it is called.
Doing a bit of research, it looks as if it has something to do with the way connection pooling is handled in PyMongo (http://api.mongodb.org/python/1.5.1/faq.html#how-does-connection-pooling-work-in-pymongo). I may need to start closing the connections, but I'd expect MongoEngine to be doing this...

This is likely because you are not calling db.authenticate() when you open the new connection, while auth is enabled on MongoDB.
Regarding the closing of connections, I would recommend making sure you are using connection pooling and letting the driver manage the pools (calling close() or similar manually can lead to a lot of pain).
For more info, see the note in the PyMongo documentation about using authenticate() in a multi-threaded environment.
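A minimal sketch of what explicit authentication inside the task could look like, assuming the PyMongo 1.x Connection API from this era and placeholder host/credentials (db_user / db_pass):

from pymongo import Connection  # PyMongo 1.x API, matching this question's era

# Placeholder connection details; adjust to your deployment.
connection = Connection('localhost', 27017)
db = connection['data']

# In old PyMongo, authenticate() applies per socket, so a worker that
# checks out a fresh pooled socket must authenticate before querying.
if not db.authenticate('db_user', 'db_pass'):
    raise RuntimeError('MongoDB authentication failed')

print(db.collection_names())  # queries on this handle are now authorized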

Related

PyMongo 3 and ServerSelectionTimeoutError while getting data from Mongodb

This seems like an old, solved problem (here, here, and here), but I am still getting this error. I created my db on Docker, and it worked only one time. Before this, I re-created the db, set connect=False, added waits, downgraded PyMongo, tried the previous solutions, etc. I'm stuck.
Python 3.8.0, Pymongo 3.9.0
from pymongo import MongoClient
from pprint import pprint

client = MongoClient('mongodb://192.168.1.100:27017/',
                     username='admin',
                     password='psw',
                     authSource='myappdb',
                     authMechanism='SCRAM-SHA-1',
                     connect=False)
db = client['myappdb']
serverStatusResult = db.command("serverStatus")
pprint(serverStatusResult)
and I am getting a ServerSelectionTimeoutError:
Traceback (most recent call last):
File "C:\Users\ME\eclipse2019-workspace\exdjango\exdjango\__init__.py", line 12, in <module>
serverStatusResult=db.command("serverStatus")
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\database.py", line 610, in command
with self.client._socket_for_reads(
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\contextlib.py", line 113, in __enter__
return next(self.gen)
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\mongo_client.py", line 1099, in _socket_for_reads
server = topology.select_server(read_preference)
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\topology.py", line 222, in select_server
return random.choice(self.select_servers(selector,
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\topology.py", line 182, in select_servers
server_descriptions = self._select_servers_loop(
File "C:\Users\ME\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymongo\topology.py", line 198, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: 192.168.1.100:27017: timed out
Your connection looks a little misconfigured. Firstly, you have half connection-string format and half parameter format; I'd suggest you stick with one or the other.
Your auth database is usually separate from your actual databases (and it's usually called admin). Check that this is correct.
There's no particular need to specify the authMechanism assuming you are using MongoDB 3.0 or later.
The connect=False is likely a red herring.
So I would try either:
client = MongoClient('mongodb://admin:psw@192.168.1.100:27017/myappdb?authSource=admin')
or
client = MongoClient(host='192.168.1.100',
                     port=27017,
                     username='admin',
                     password='psw',
                     authSource='admin')
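Whichever form you use, a quick connectivity check helps separate authentication problems from network problems. A minimal sketch, reusing the host and credentials from the question plus a deliberately short timeout:

from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

client = MongoClient('mongodb://admin:psw@192.168.1.100:27017/?authSource=admin',
                     serverSelectionTimeoutMS=5000)  # fail fast instead of waiting ~30s
try:
    client.admin.command('ping')  # cheap round trip to the server
    print('connected')
except ServerSelectionTimeoutError as exc:
    print('cannot reach server:', exc)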

SSL SYSCALL error: Bad file descriptor on Heroku with postgres and Celery

I've been using Celery successfully with a Django site on Heroku but it's just started generating the error below, which stops it running. It looks like it's having trouble with postgres, but I'm stumped as to how to fix it, given it's Celery rather than my code that's having the problem (I assume...).
I'm using CloudAMQP as a broker, and my Django settings include:
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
Here's the traceback from the Heroku logs:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.5/site-packages/kombu/utils/__init__.py", line 323, in __get__
return obj.__dict__[self.__name__]
KeyError: 'scheduler'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
psycopg2.OperationalError: SSL SYSCALL error: Bad file descriptor
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.5/site-packages/billiard/process.py", line 292, in _bootstrap
self.run()
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 553, in run
self.service.start(embedded_process=True)
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 470, in start
humanize_seconds(self.scheduler.max_interval))
File "/app/.heroku/python/lib/python3.5/site-packages/kombu/utils/__init__.py", line 325, in __get__
value = obj.__dict__[self.__name__] = self.__get(obj)
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 512, in scheduler
return self.get_scheduler()
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 507, in get_scheduler
lazy=lazy)
File "/app/.heroku/python/lib/python3.5/site-packages/celery/utils/imports.py", line 53, in instantiate
return symbol_by_name(name)(*args, **kwargs)
File "/app/.heroku/python/lib/python3.5/site-packages/djcelery/schedulers.py", line 151, in __init__
Scheduler.__init__(self, *args, **kwargs)
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 185, in __init__
self.setup_schedule()
File "/app/.heroku/python/lib/python3.5/site-packages/djcelery/schedulers.py", line 158, in setup_schedule
self.install_default_entries(self.schedule)
File "/app/.heroku/python/lib/python3.5/site-packages/djcelery/schedulers.py", line 251, in schedule
self._schedule = self.all_as_schedule()
File "/app/.heroku/python/lib/python3.5/site-packages/djcelery/schedulers.py", line 164, in all_as_schedule
for model in self.Model.objects.enabled():
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/query.py", line 258, in __iter__
self._fetch_all()
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/query.py", line 1074, in _fetch_all
self._result_cache = list(self.iterator())
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/query.py", line 52, in __iter__
results = compiler.execute_sql()
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/sql/compiler.py", line 848, in execute_sql
cursor.execute(sql, params)
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/utils.py", line 95, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/app/.heroku/python/lib/python3.5/site-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
django.db.utils.OperationalError: SSL SYSCALL error: Bad file descriptor
I've resolved the issue now... there was a line of my Django code which had caused an Internal Server Error in the past -- I think, early on in Django starting up, it was trying to access a model before the migrations that created the model had run.
I'd resolved that but noticed these "SSL SYSCALL error"s started about the same time. So I removed that line of code, and Celery has started up again.
It could be coincidence. And I don't understand why this fixed things.
Ideally I'd still like to understand what the error above actually means so I'd have a better chance of fixing such a thing in the future.
I had the same error and I solved it.
In my case the problem was a custom AppConfig in one of my Django apps.
My folder structure:
my_django_app/
__init__.py
apps.py
...
I have a file my_django_app/__init__.py like this:
default_app_config="my_django_app.apps.MyAppConfig"
and the file my_django_app/apps.py like this:
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = 'my_django_app'
    verbose_name = "djangoAppConfig"

    def ready(self):
        ...  # custom logic
After removing the line default_app_config="my_django_app.apps.MyAppConfig", Celery works again.
If you have something like this, remove your custom AppConfig and use a Celery periodic task instead. django_celery_beat does something strange with AppConfig, so a custom AppConfig triggers the problem; see the sketch below.
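For the logic that used to live in AppConfig.ready(), a minimal sketch of the periodic-task alternative, with a placeholder task name (my_custom_logic) and the schedule declared in settings; with django_celery_beat you can equally create the schedule through the admin:

# my_django_app/tasks.py
from celery import shared_task

@shared_task
def my_custom_logic():
    ...  # whatever previously ran in AppConfig.ready()

# settings.py
CELERY_BEAT_SCHEDULE = {
    'run-my-custom-logic': {
        'task': 'my_django_app.tasks.my_custom_logic',
        'schedule': 300.0,  # seconds between runs
    },
}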
My Python requirements:
Django==1.11.29
celery==4.4.7
django_celery_beat==1.6.0
I know this was asked years ago, but this was about the only question out there which related to my problem.
In my case, it was down to CONN_MAX_AGE being set to 600. I reduced it to 0, so the connection is closed at the end of each request rather than being reused past its lifetime.
From the docs:
Persistent connections avoid the overhead of re-establishing a connection to the database in each request. They’re controlled by the CONN_MAX_AGE parameter which defines the maximum lifetime of a connection. It can be set independently for each database.
The default value is 0, preserving the historical behavior of closing the database connection at the end of each request. To enable persistent connections, set CONN_MAX_AGE to a positive integer of seconds. For unlimited persistent connections, set it to None.
Django opens a connection to the database when it first makes a database query. It keeps this connection open and reuses it in subsequent requests. Django closes the connection once it exceeds the maximum age defined by CONN_MAX_AGE or when it isn’t usable any longer.
In detail, Django automatically opens a connection to the database whenever it needs one and doesn’t have one already — either because this is the first connection, or because the previous connection was closed.
At the beginning of each request, Django closes the connection if it has reached its maximum age. If your database terminates idle connections after some time, you should set CONN_MAX_AGE to a lower value, so that Django doesn’t attempt to use a connection that has been terminated by the database server. (This problem may only affect very low traffic sites.)
At the end of each request, Django closes the connection if it has reached its maximum age or if it is in an unrecoverable error state. If any database errors have occurred while processing the requests, Django checks whether the connection still works, and closes it if it doesn’t. Thus, database errors affect at most one request; if the connection becomes unusable, the next request gets a fresh connection.
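A minimal sketch of where this is set, with placeholder connection details:

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',        # placeholder
        'USER': 'myuser',      # placeholder
        'PASSWORD': 'secret',  # placeholder
        'HOST': 'localhost',
        'PORT': '5432',
        'CONN_MAX_AGE': 0,     # close the connection at the end of each request
    }
}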

IOError: [Errno 9] Bad file descriptor using robot framework with Python

I am trying to run the following Python 2.7.10 code on Win7:
from robot import run
run("Login.robot", variable=['USERNAME:testuser#fakemail.com'])
Login.robot is a very simple robot framework test that opens a browser window, loads our log in page, enters in a username and password, then confirms the user has logged in. I have tested it by running it from the command line with pybot and the test suite runs correctly. But trying to run the test suite from this script generates
Traceback (most recent call last):
File "C:\Users\dmdunn\Desktop\test_user_data.py", line 4, in <module>
run("Login.robot", variable=['USERNAME:testuser#fakemail.com'])
File "C:\Python27\lib\site-packages\robot\run.py", line 471, in run
return RobotFramework().execute(*datasources, **options)
File "C:\Python27\lib\site-packages\robot\utils\application.py", line 83, in execute
return self._execute(list(arguments), options)
File "C:\Python27\lib\site-packages\robot\utils\application.py", line 96, in _execute
details, rc=FRAMEWORK_ERROR)
File "C:\Python27\lib\site-packages\robot\utils\application.py", line 110, in _report_error
self._logger.error(message)
File "C:\Python27\lib\site-packages\robot\output\loggerhelper.py", line 59, in error
self.write(msg, 'ERROR')
File "C:\Python27\lib\site-packages\robot\output\loggerhelper.py", line 62, in write
self.message(Message(message, level, html))
File "C:\Python27\lib\site-packages\robot\output\logger.py", line 109, in message
logger.message(msg)
File "C:\Python27\lib\site-packages\robot\output\monitor.py", line 66, in message
self._writer.error(msg.message, msg.level, clear=self._running_test)
File "C:\Python27\lib\site-packages\robot\output\monitor.py", line 142, in error
self._highlight('[ ', level, ' ] ' + message, error=True)
File "C:\Python27\lib\site-packages\robot\output\monitor.py", line 158, in _highlight
self._write(before, newline=False, error=error)
File "C:\Python27\lib\site-packages\robot\output\monitor.py", line 154, in _write
stream.flush()
IOError: [Errno 9] Bad file descriptor
Any suggestions? I've tried using an absolute path to the robot file, single quotes, double quotes, referencing via an open file object, referencing via a closed file object. Thanks!
I discovered it was due to using IDLE, which prevented Robot Framework from writing to the console. My script runs when started from the Windows command line.
Open a file handle and call the robot.run method while the handle is open, passing it as the stdout stream.
I hope this will solve your issue.
import os
from robot import run

mydir = os.path.join('c:/', '....')
os.makedirs(mydir)
# 'stdout.txt' is an arbitrary file name inside the new directory.
with open(os.path.join(mydir, 'stdout.txt'), 'w') as stdout_file:
    k = run("Login.robot",
            variable=['USERNAME:testuser@fakemail.com'],
            stdout=stdout_file)

Elastic search memoryerror on bulk insert

I'm inserting 5000 records at once into Elasticsearch.
The total size of these records is 33936 bytes (I got this using sys.getsizeof()).
Elasticsearch version: 1.5.0
Python 2.7
Ubuntu
Here is the error:
Traceback (most recent call last):
File "run_indexing.py", line 67, in <module>
index_policy_content(datatable, source, policyids)
File "run_indexing.py", line 60, in index_policy_content
bulk(elasticsearch_instance, actions)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 148, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 107, in streaming_bulk
resp = client.bulk(bulk_actions, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 568, in bulk
params=params, body=self._bulk_body(body))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 259, in perform_request
body = body.encode('utf-8')
MemoryError
Please help me resolve the issue.
Thanks & Regards,
Afroze
If I had to guess, I'd say this memory error is happening within python as it loads and serializes its data. Try cutting way back on the batch sizes until you get something that works, and then binary search upward until it fails again. That should help you figure out a safe batch size to use.
(Other useful information you might want to include: amount of memory in the server you're running your python process on, amount of memory for your elasticsearch server node(s), amount of heap allocated to Java.)
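A minimal sketch of capping the batch size with the same helper the traceback shows; chunk_size is an existing parameter of elasticsearch.helpers.bulk (it defaults to 500), and the value here is just a starting point to tune:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()  # defaults to localhost:9200; point at your node(s)

def index_in_batches(actions, chunk_size=500):
    # bulk() slices the action iterable into chunks of chunk_size, so only
    # one chunk is serialized and held in memory at a time.
    return bulk(es, actions, chunk_size=chunk_size)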

Django-nonrel + MongoDB error while running tests

I have a setup inside a virtual environment:
Django-nonrel-1.6
mongodb-engine
djangotoolbox
Everything works fine; the only problem is when running tests. Every time Django tries to flush the database after running a test function, it throws an error:
Traceback (most recent call last):
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/test/testcases.py", line 187, in __call__
self._post_teardown()
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/test/testcases.py", line 796, in _post_teardown
self._fixture_teardown()
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/test/testcases.py", line 889, in _fixture_teardown
return super(TestCase, self)._fixture_teardown()
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/test/testcases.py", line 817, in _fixture_teardown
inhibit_post_syncdb=self.available_apps is not None)
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 159, in call_command
return klass.execute(*args, **defaults)
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
output = self.handle(*args, **options)
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/core/management/base.py", line 415, in handle
return self.handle_noargs(**options)
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/core/management/commands/flush.py", line 79, in handle_noargs
six.reraise(CommandError, CommandError(new_msg), sys.exc_info()[2])
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/core/management/commands/flush.py", line 67, in handle_noargs
savepoint=connection.features.can_rollback_ddl):
File "/home/lehins/.virtualenvs/studentpal/local/lib/python2.7/site-packages/django/db/transaction.py", line 251, in __enter__
"The outermost 'atomic' block cannot use "
CommandError: Database test_dev_db couldn't be flushed. Possible reasons:
* The database isn't running or isn't configured correctly.
* At least one of the expected database tables doesn't exist.
* The SQL was invalid.
Hint: Look at the output of 'django-admin.py sqlflush'. That's the SQL this command wasn't able to run.
The full error: The outermost 'atomic' block cannot use savepoint = False when autocommit is off.
So for each test case, I get a pass or a fail, like it's supposed to, but I also get this annoying error.
I did run django-admin.py sqlflush --settings=dev_settings --pythonpath=., which flushed my development database just fine, with no errors.
In a couple of test functions I checked a few models pulled from the database, and it seems to be flushing and recreating objects just fine, so it is not affecting the actual test cases.
I went through the whole traceback and I kind of understand why it happens, but I cannot figure out how to deal with it. Any help is appreciated.
Edit
Just tried running tests with Django-nonrel-1.5; there were no problems. It seems like a bug in the 1.6 version.
Use SimpleTestCase or a custom TestCase:
from django.core.management import call_command
from django.test import TestCase

class CustomTestCase(TestCase):
    def _fixture_teardown(self):
        for db_name in self._databases_names(include_mirrors=False):
            call_command('custom_flush', verbosity=0, interactive=False,
                         database=db_name, skip_validation=True,
                         reset_sequences=False,
                         allow_cascade=self.available_apps is not None,
                         inhibit_post_syncdb=self.available_apps is not None)
Since the problem is transaction.atomic in the flush command, you may have to write your own custom_flush command; a sketch follows.
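A minimal sketch of what such a custom_flush command might look like, assuming it is enough to delete all objects model by model instead of running SQL inside transaction.atomic; the command name and the delete loop are assumptions for a MongoDB backend, not django-nonrel API:

# myapp/management/commands/custom_flush.py
from django.core.management.base import BaseCommand
from django.db.models.loading import get_models  # Django 1.6-era import

class Command(BaseCommand):
    help = "Flush the database without wrapping the work in transaction.atomic."

    def handle(self, *args, **options):
        database = options.get('database', 'default')
        # Delete every object of every installed model on the given database.
        for model in get_models(include_auto_created=True):
            model.objects.using(database).all().delete()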