Restarting a Monitor instance - python-2.7

I have a cherrypy app that's got a Monitor instance like so:
mail_checker = Monitor(cherrypy.engine, self.mail_processor.poll_history_feed, frequency=10)
To put it simply, it checks a Gmail inbox for new emails and processes them. Sometimes poll_history_feed() throws an exception (my current guess is that it's because of our unstable internet connection), and the task stops running until I restart the whole app. A sample of the traceback is below:
[01/Mar/2016:17:08:29] ENGINE Error in background task thread function <bound method MailProcessor.poll_history_feed of <mailservices.mailprocessor.MailProcessor object at 0x10a2f0250>>.
Traceback (most recent call last):
File "/Users/hashtaginteractive/Projects/.venvs/emaild/lib/python2.7/site-packages/cherrypy/process/plugins.py", line 500, in run
self.function(*self.args, **self.kwargs)
File "/Users/hashtaginteractive/Projects/emaild/emaild-source/mailservices/mailprocessor.py", line 12, in poll_history_feed
labelIds=["INBOX", "UNREAD"]
File "/Users/hashtaginteractive/Projects/.venvs/emaild/lib/python2.7/site-packages/oauth2client/util.py", line 142, in positional_wrapper
return wrapped(*args, **kwargs)
File "/Users/hashtaginteractive/Projects/.venvs/emaild/lib/python2.7/site-packages/googleapiclient/http.py", line 730, in execute
return self.postproc(resp, content)
File "/Users/hashtaginteractive/Projects/.venvs/emaild/lib/python2.7/site-packages/googleapiclient/model.py", line 207, in response
return self.deserialize(content)
File "/Users/hashtaginteractive/Projects/.venvs/emaild/lib/python2.7/site-packages/googleapiclient/model.py", line 262, in deserialize
content = content.decode('utf-8')
File "/Users/hashtaginteractive/Projects/.venvs/emaild/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 23: invalid start byte
Is there any way to set this up so that it automatically restarts either the server or this particular Monitor instance whenever an exception happens?

You have to wrap the call to self.mail_processor.poll_history_feed in a try/except block and log the error for convenience.
def safe_poll_history_feed(self):
    try:
        self.mail_processor.poll_history_feed()
    except Exception:
        cherrypy.engine.log("Exception in mailprocessor monitor", traceback=True)
And then use the safe_poll_history_feed method as the Monitor callback, as in the sketch below.
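A minimal sketch of the wiring, assuming safe_poll_history_feed is defined on the same object that owns mail_processor:
# Pass the wrapping method to the Monitor instead of the raw callback, so an
# exception in a single poll is logged instead of killing the background thread.
mail_checker = Monitor(cherrypy.engine, self.safe_poll_history_feed, frequency=10)
mail_checker.subscribe()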

Related

Google Cloud Function pulling from Pub/Sub subscription throws exception - Deadline Exceeded

I have a Google Cloud Function in Python 3.7 reading from a Pub/Sub subscription in synchronous pull mode.
After running fine once per hour for 24 hours, it threw the exception below:
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/env/local/lib/python3.7/site-packages/grpc/_channel.py", line 824, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/env/local/lib/python3.7/site-packages/grpc/_channel.py", line 726, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1580454091.145703535","description":"Error received from peer ipv4:74.125.202.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 346, in run_http_function
result = _function_handler.invoke_user_function(flask.request)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 217, in invoke_user_function
return call_user_function(request_or_event)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 210, in call_user_function
return self._user_function(request_or_event)
File "/user_code/main.py", line 39, in iteration
response = sub.pull(sub_path, MAX_MESSAGES)
File "/env/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/_gapic.py", line 40, in <lambda>
fx = lambda self, *a, **kw: wrapped_fx(self.api, *a, **kw)  # noqa
File "/env/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/gapic/subscriber_client.py", line 1005, in pull
request, retry=retry, timeout=timeout, metadata=metadata
File "/env/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
return wrapped_func(*args, **kwargs)
File "/env/local/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/env/local/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/env/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
return func(*args, **kwargs)
File "/env/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded
What is this about? Is it to be expected or a result of some configuration problem? If to be expected, how should it be handled?
The documentation on pull (https://googleapis.dev/python/pubsub/latest/subscriber/api/client.html) says nothing about this being a possible exception.
I ack the messages immediately after the pull completes, I only permit one function execution at a time, and I have a 600-second acknowledgement deadline. A single pull seems to return fewer than 100 messages. If this is about failing to ack a message, the error could be reported much more clearly.
This exception is raised by the client when there are no messages to read in the subscription. It is a known issue in PubSub client library versions >= 1.0.0; if necessary, you can downgrade to version 0.45.0, where the issue was not present.
However, as a workaround you can catch the DeadlineExceeded exception and retry the operation. Also, based on Hemang's comment, here is a small monkeypatch you can add to your running code, which should give roughly the same behavior as version 0.45.0:
from google.cloud.pubsub_v1.gapic import subscriber_client_config as sub_config
sub_config.config['interfaces']['google.pubsub.v1.Subscriber']['retry_params']['messaging']['initial_rpc_timeout_millis'] = 25000
Finally, keep in mind that with synchronous pull, having many outstanding pull requests helps lower delivery latency, but it can also result in higher-latency pull requests (and DeadlineExceeded errors). If latency is crucial for the application, consider using StreamingPull instead.
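If you stay on the newer library, here is a hedged sketch of the catch-and-retry approach, assuming the 1.x-style synchronous SubscriberClient and reusing the sub_path and MAX_MESSAGES names from the question (pull_once is a hypothetical helper, not part of the API):
from google.api_core.exceptions import DeadlineExceeded

def pull_once(subscriber, sub_path, MAX_MESSAGES):
    try:
        # Matches the 1.x-era positional call style used in the question.
        response = subscriber.pull(sub_path, MAX_MESSAGES)
    except DeadlineExceeded:
        # No messages arrived before the RPC deadline; treat it as an empty
        # pull and let the next scheduled invocation try again.
        return []
    ack_ids = [msg.ack_id for msg in response.received_messages]
    if ack_ids:
        subscriber.acknowledge(sub_path, ack_ids)
    return [msg.message for msg in response.received_messages]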

Celery raise MemoryError

I'm using Celery 4.3.0 and use it to generate an image that may be 10 MB+. Then I got an error like the one below:
Pool callback raised exception: MemoryError('Process got: ')
Traceback (most recent call last):
File "*/lib/python3.7/site-packages/billiard/pool.py", line 1750, in safe_apply_callback
fun(*args, **kwargs)
File "*/lib/python3.7/site-packages/celery/worker/request.py", line 564, in on_success
return self.on_failure(retval, return_ok=True)
File "*/lib/python3.7/site-packages/celery/worker/request.py", line 351, in on_failure
raise MemoryError('Process got: %s' % (exc_info.exception,))
My server has 20 GB+ of memory free when running this task, and I have tested some smaller images, which work fine. Do I need to set some configuration to prevent this?

TypeError when using botocore to read from AWS SQS queue

I'm using a Tornado server with tornado-botocore to connect to Amazon SQS services.
When running stress tests we sometimes get the following exception:
Traceback (most recent call last):
File "/home/app/handlers/WebSocketsHandler.py", line 95, in listen_outgoing_queue
message = yield tornado.gen.Task(self.outgoing_queue.read)
File "/home/local/lib/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/home/local/lib/python2.7/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/home/local/lib/python2.7/site-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/home/local/lib/python2.7/site-packages/tornado_botocore/base.py", line 70, in prepare_response
response_dict, operation_model.output_shape)
File "/home/local/lib/python2.7/site-packages/botocore/parsers.py", line 155, in parse
return self._do_error_parse(response, shape)
File "/home/.env/local/lib/python2.7/site-packages/botocore/parsers.py", line 314, in _do_error_parse
root = self._parse_xml_string_to_dom(xml_contents)
File "/home/local/lib/python2.7/site-packages/botocore/parsers.py", line 274, in _parse_xml_string_to_dom
parser.feed(xml_string)
TypeError: must be string or read-only buffer, not None
Could it be caused by the concurrency?
Has anyone encountered such behavior?
We are using Tornado 4.2.1, botocore 0.65.0 and tornado-botocore 0.1.6.
Problem solved once I removed the @tornado.gen.engine decorator from the method.

Celery task errors when getting facebook picture

I have the following celery task:
@task
def get_users_facebook_as_profile_icon(user_id, facebook_id):
    logger.info('Grabbing users facebook picture')
    url = "http://graph.facebook.com/%s/picture?type=large" % facebook_id
    import requests
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception("Could not get facebook profile picture")
    ...
I have more after this, but I keep getting the following error:
"AssertionError('PID check failed. RNG must be re-initialized after fork(). Hint: Try Random.atfork()',)"
Task was called with args: (3246, 17500596) kwargs: {}.
The contents of the full traceback was:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 437, in __protected_call__
return self.run(*args, **kwargs)
File "/home/ubuntu/mounzawebsite/mounza/celery_tasks/login_registration.py", line 42, in get_users_facebook_as_profile_icon
hashname = user.generate_picture_name()
File "/home/ubuntu/mounzawebsite/mounza/web/models.py", line 515, in generate_picture_name
return generate_random_name(None)
File "/home/ubuntu/mounzawebsite/mounza/web/models.py", line 40, in generate_random_name
str(random.randint(1, 99982098098908237)) +
File "/usr/lib/python2.7/dist-packages/Crypto/Random/__init__.py", line 41, in get_random_bytes
return _UserFriendlyRNG.get_random_bytes(n)
File "/usr/lib/python2.7/dist-packages/Crypto/Random/_UserFriendlyRNG.py", line 213, in get_random_bytes
return _get_singleton().read(n)
File "/usr/lib/python2.7/dist-packages/Crypto/Random/_UserFriendlyRNG.py", line 163, in read
return _UserFriendlyRNG.read(self, bytes)
File "/usr/lib/python2.7/dist-packages/Crypto/Random/_UserFriendlyRNG.py", line 122, in read
self._check_pid()
File "/usr/lib/python2.7/dist-packages/Crypto/Random/_UserFriendlyRNG.py", line 138, in _check_pid
raise AssertionError("PID check failed. RNG must be re-initialized after fork(). Hint: Try Random.atfork()")
AssertionError: PID check failed. RNG must be re-initialized after fork(). Hint: Try Random.atfork()
I tried digging into this online but was not able to find the root cause. This is the only task where the error occurs. The only difference is that I'm downloading an image from Facebook, but I never see this issue anywhere else, including in other tasks that download images.
The URL works perfectly if I open it in a web browser; it only fails via this task. Is there anything else that could contribute to this?
I have exhausted all attempts at fixing this :(
Here is why:
http://comments.gmane.org/gmane.comp.python.amqp.celery.user/3664
Always run the below when a new worker process is initialized:
Crypto.Random.atfork()
Done and done.
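A minimal sketch of one way to do that, assuming Celery's worker_process_init signal is available (the handler name here is illustrative, not from the original post):
from celery.signals import worker_process_init
import Crypto.Random

@worker_process_init.connect
def reseed_crypto_rng(**kwargs):
    # Re-initialize PyCrypto's RNG in every freshly forked worker process,
    # so the PID check no longer fails inside tasks.
    Crypto.Random.atfork()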

OperationFailure: database error when threading in MongoEngine/PyMongo

I have a function that will read data from a website, process it, and then load it into MongoDB. When I run this without threading it works fine, but as soon as I set up Celery tasks that just call this one function, I frequently get the following error: "OperationFailure: database error: unauthorized db:dbname lock type:-1"
It's somewhat odd because if I run the non-celery version on multiple terminals, I do not get this error at all.
I suspect it has something to do with there not being an open connection to Mongo although in my code I'm opening one up right before every Mongo call.
The exact exception is below:
Task twitter[a974bfcc-d6ca-4baf-b36f-cae9143ce2d9] raised exception: OperationFailure(u'database error: unauthorized db:data lock type:-1 client:68.193.49.9',)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/djangoblog/network/tasks.py", line 40, in twitter
n_twitter.GetTweetsTwitter(user)
File "/djangoblog/network/twitter.py", line 255, in GetTweetsTwitter
id = SaveTweet(user, network, tweet)
File "/djangoblog/network/twitter.py", line 150, in SaveTweet
if mmo.Moment.objects(user=user.id,source_id=id,network=network.id).count() == 0:
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 933, in count
return self._cursor.count(with_limit_and_skip=True)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 563, in _cursor
self._cursor_obj = self._collection.find(self._query,
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 493, in _collection
if self._collection_obj.name not in db.collection_names():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/database.py", line 361, in collection_names
names = [r["name"] for r in results]
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 703, in next
if len(self.__data) or self._refresh():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 666, in _refresh
self.__uuid_subtype))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 628, in __send_message self.__tz_aware)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/helpers.py", line 101, in _unpack_response error_object["$err"])
OperationFailure: database error: unauthorized db:data lock type:-1 client:68.193.49.9
Sorry for the formatting, but if you look at the line that starts with mmo.Moment, there's a connection being opened right before that call.
Doing a bit of research, it looks as if it has something to do with the way threading is handled in PyMongo (http://api.mongodb.org/python/1.5.1/faq.html#how-does-connection-pooling-work-in-pymongo). I may need to start closing the connections, but I'd expect MongoEngine to be doing this.
This is likely due to the fact that you are not calling db.authenticate() when you start the new connection and are using auth on MongoDB.
Regarding the closing of threads, I would recommend making sure you are using connection pooling and letting the driver manage the pools (calling close() or similar manually can lead to a lot of pain).
For more info see the note in the pymongo documentation about using authenticate() in a multi-threaded environment.
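A minimal sketch of the authenticate step, assuming the PyMongo 1.x-era Connection class; the database name is taken from the error message ("db:data"), and the credentials are placeholders:
from pymongo import Connection

connection = Connection("localhost", 27017)  # hypothetical host/port
db = connection["data"]                      # "db:data" from the error
if not db.authenticate("db_user", "db_password"):  # placeholder credentials
    raise RuntimeError("MongoDB authentication failed for db 'data'")
# Queries issued through this connection (or through MongoEngine once it is
# pointed at an authenticated connection) should no longer be rejected as
# unauthorized.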