Google Cloud Function pulling from Pub/Sub subscription throws exception - Deadline Exceeded - google-cloud-platform

I have a Google Cloud Function in Python 3.7 reading from a Pub/Sub subscription in synchronous pull mode.
After running fine once an hour for 24 hours, it threw this exception:
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/env/local/lib/python3.7/site-packages/grpc/_channel.py", line 824, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/env/local/lib/python3.7/site-packages/grpc/_channel.py", line 726, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1580454091.145703535","description":"Error received from peer ipv4:74.125.202.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 346, in run_http_function
result = _function_handler.invoke_user_function(flask.request)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 217, in invoke_user_function
return call_user_function(request_or_event)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 210, in call_user_function
return self._user_function(request_or_event)
File "/user_code/main.py", line 39, in iteration
response = sub.pull(sub_path, MAX_MESSAGES)
File "/env/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/_gapic.py", line 40, in <lambda>
fx = lambda self, *a, **kw: wrapped_fx(self.api, *a, **kw) # noqa
File "/env/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/gapic/subscriber_client.py", line 1005, in pull
request, retry=retry, timeout=timeout, metadata=metadata
File "/env/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
return wrapped_func(*args, **kwargs)
File "/env/local/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/env/local/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/env/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
return func(*args, **kwargs)
File "/env/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded
What is this about? Is it to be expected, or is it the result of a configuration problem? If it is expected, how should it be handled?
The documentation ( view-source:https://googleapis.dev/python/pubsub/latest/subscriber/api/client.html ) on pull does not mention this as a possible exception.
I ack the messages immediately after the pull completes. I only permit one function execution at a time. I have a 600-second acknowledgement deadline. A block of messages pulled at one time seems to contain fewer than 100 messages. If this is about failing to ack a message, the error message could be much clearer.

The client raises this exception when there are no messages to read in the subscription. It is a known issue in versions >= 1.0.0 of the Pub/Sub client library. If necessary, you can downgrade to version 0.45.0, where this issue was not present.
As a workaround, you can catch the DeadlineExceeded exception and retry the operation. Also, based on Hemang's comment, here is a small monkeypatch that you can add to your running code, which might restore the behavior of version 0.45.0:
from google.cloud.pubsub_v1.gapic import subscriber_client_config as sub_config
sub_config.config['interfaces']['google.pubsub.v1.Subscriber']['retry_params']['messaging']['initial_rpc_timeout_millis'] = 25000
Finally, keep in mind that when using synchronous pull, having many outstanding pull requests helps lower delivery latency, but it may also mean that individual pull requests take longer to return (and raise DeadlineExceeded errors). If latency is crucial for the application, consider using StreamingPull instead.
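The catch-and-retry workaround might look like the following minimal sketch. The helper name `pull_with_retry` and its parameters are hypothetical and not part of the Pub/Sub library; the exception type is passed in as an argument so the sketch does not depend on a particular library version.

```python
import time


def pull_with_retry(pull, empty_errors, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call pull() and treat `empty_errors` as "no messages yet".

    Returns whatever pull() returns, or None if every attempt raised one
    of `empty_errors`. Any other exception propagates unchanged.
    """
    for attempt in range(attempts):
        try:
            return pull()
        except empty_errors:
            # Nothing arrived before the RPC deadline; wait briefly and
            # try again unless this was the last attempt.
            if attempt < attempts - 1:
                sleep(delay_s)
    return None
```

In the function from the question, this could be called as `pull_with_retry(lambda: sub.pull(sub_path, MAX_MESSAGES), DeadlineExceeded)`, with `DeadlineExceeded` imported from `google.api_core.exceptions`. A `None` result would then simply mean there was nothing to pull on this run.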

Related

Error "No worker is available to serve request: model" when calling SageMaker endpoint during increased load

I have a custom container that takes a request, does some feature extraction and then passes on the enhanced request to a classifier endpoint. During feature extraction another endpoint is being called for generating text embeddings. I am using the HuggingFace estimator for my embedding model.
It has been working fine, but there was an increase in requests and it looks like the embedding endpoint timed out somehow.
I am looking at adding automatic scaling to the endpoint, but I want to make sure I understand what is happening and that scaling properly addresses the issue. Unfortunately, searching for this error message does not turn up much. The instance metrics do not show the endpoint being overloaded: CPU utilization peaked at about 30%. Would auto scaling address the "no worker" issue, or is this something different? I was receiving a few hundred requests per minute at the time.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/opt/program/predictor.py", line 56, in transformation
result = preprocessor.transform(data)
File "/opt/program/preprocessor.py", line 189, in transform
response = embed_predictor.predict(data=json.dumps(payload))
File "/usr/local/lib/python3.7/site-packages/sagemaker/predictor.py", line 136, in predict
response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 705, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (503) from primary with message "{
"code": 503,
"type": "ServiceUnavailableException",
"message": "No worker is available to serve request: model"
}
I would suggest confirming that MemoryUtilization is not being overwhelmed and that there is no specific error in CloudWatch Logs.
If MemoryUtilization is overwhelmed, you can test configuring Auto Scaling to distribute the request load across multiple instances. That being said, while I am not sure of the details of your custom container, I also recommend confirming that the container itself can handle multiple concurrent requests (i.e., that it has multiple workers available to serve requests).
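If you do test Auto Scaling, a target-tracking policy on invocations per instance is one option. The sketch below only builds the request arguments; the function name, the `AllTraffic` variant name, the capacity limits and the target value are all assumptions to adjust for your endpoint.

```python
def autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                         min_instances=1, max_instances=4,
                         invocations_per_instance=100.0):
    """Build keyword arguments for the two Application Auto Scaling calls.

    All default values here are illustrative assumptions, not recommendations.
    """
    resource_id = "endpoint/{}/variant/{}".format(endpoint_name, variant_name)
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # Arguments for register_scalable_target: the allowed instance range.
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    # Arguments for put_scaling_policy: track invocations per instance.
    policy = dict(
        common,
        PolicyName="{}-invocations-target".format(endpoint_name),
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    )
    return target, policy
```

With boto3 installed, the two dictionaries could be passed to `boto3.client("application-autoscaling")` as `register_scalable_target(**target)` and `put_scaling_policy(**policy)`. Note that scaling out adds instances, so it only helps if each instance's container can actually serve its share of concurrent requests.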

Request Google cloud speech-to-text API gives 503 error?

I ran the following python sample code on this page (you need a private key to run this): https://cloud.google.com/speech-to-text/docs/quickstart-client-libraries
# Imports the Google Cloud client library
from google.cloud import speech
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'key.json'
# Instantiates a client
client = speech.SpeechClient()
# The name of the audio file to transcribe
gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"
audio = speech.RecognitionAudio(uri=gcs_uri)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
# Detects speech in the audio file
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
This results in the following error:
Traceback (most recent call last):
File "C:\Users\98274\anaconda3\envs\carla\lib\site-packages\google\api_core\grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "C:\Users\98274\anaconda3\envs\carla\lib\site-packages\grpc\_channel.py", line 826, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "C:\Users\98274\anaconda3\envs\carla\lib\site-packages\grpc\_channel.py", line 729, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1614347442.192000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4090,"referenced_errors":[{"created":"@1614347442.192000000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":394,"grpc_status":14}]}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\98274\anaconda3\envs\carla\lib\site-packages\google\api_core\retry.py", line 184, in retry_target
return target()
File "C:\Users\98274\anaconda3\envs\carla\lib\site-packages\google\api_core\grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.ServiceUnavailable: 503 failed to connect to all addresses
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/Users/98274/Desktop/carla_ppo/test_connection.py", line 22, in <module>
response = client.recognize(config=config, audio=audio)
File "C:\Users\98274\anaconda3\envs\carla\lib\site-packages\google\cloud\speech_v1\services\speech\client.py", line 334, in recognize
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "C:\Users\98274\anaconda3\envs\carla\lib\site-packages\google\api_core\gapic_v1\method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "C:\Users\98274\anaconda3\envs\carla\lib\site-packages\google\api_core\retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "C:\Users\98274\anaconda3\envs\carla\lib\site-packages\google\api_core\retry.py", line 206, in retry_target
last_exc,
File "<string>", line 3, in raise_from
google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x0000026B56606A68>, config {
encoding: LINEAR16
sample_rate_hertz: 16000
language_code: "en-US"
}
audio {
uri: "gs://cloud-samples-data/speech/brooklyn_bridge.raw"
}
, metadata=[('x-goog-api-client', 'gl-python/3.7.9 grpc/1.31.0 gax/1.23.0 gapic/2.0.1')]), last exception: 503 failed to connect to all addresses
What's the problem? Is it related to a network issue, since I am in China? How can I solve it?
I am posting the answer as CommunityWiki to further contribute to the community.
As discussed in the comment section, the code provided in the documentation runs without problems. However, you have to make sure that you set your credentials using GOOGLE_APPLICATION_CREDENTIALS so that your request is authenticated. The client libraries also have to be properly installed.
In @antaressgzz's case, the request to the API was being blocked by a firewall rule. It is therefore good practice to check these rules and change them if applicable.
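A quick way to check whether a firewall is blocking outbound traffic is to try opening a plain TCP connection to the API host. This is a minimal sketch using only the standard library; the helper name `can_connect` is hypothetical, and `speech.googleapis.com` on port 443 is assumed to be the endpoint the client library contacts.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, `can_connect("speech.googleapis.com")` returning False would be consistent with the "failed to connect to all addresses" error. A True result only shows that TCP is not blocked; it does not prove that the TLS or gRPC layers work.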

app_identity_service.GetAccessToken() required more quota than is available

I am using App Engine and BigQuery as the backend for my website. Whenever the user clicks something, I log the click to BigQuery for analytics later in the day. I get close to 75k clicks a day. It was working fine until last week. This is the code I use:
body = {"rows":[bodyFields]}
credentials = appengine.AppAssertionCredentials(scope=BIGQUERY_SCOPE)
http = credentials.authorize(httplib2.Http())
bigquery = discovery.build('bigquery', 'v2', http=http)
response = bigquery.tabledata().insertAll(
    projectId=PROJECT_ID,
    datasetId=BIGQUERY_DATASETID,
    tableId=BIGQUERY_TABLEID,
    body=body).execute()
Now, all of a sudden, I am getting an over-quota exception. My application is a paid App Engine instance. Below is the stack trace of the exception:
Attempting refresh to obtain initial access_token
The API call app_identity_service.GetAccessToken() required more quota than is available.
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/filename.py", line 1611, in post
bigquery = discovery.build('bigquery', 'v2', http=http)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/util.py", line 129, in positional_wrapper
return wrapped(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/apiclient/discovery.py", line 198, in build
resp, content = http.request(requested_url)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/util.py", line 129, in positional_wrapper
return wrapped(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/client.py", line 516, in new_request
self._refresh(request_orig)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/appengine.py", line 194, in _refresh
scopes, service_account_id=self.service_account_id)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 589, in get_access_token
scopes, service_account_id=service_account_id)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 547, in get_access_token_uncached
return rpc.get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 519, in get_access_token_result
rpc.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
self.__rpc.CheckSuccess()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 134, in CheckSuccess
raise self.exception
OverQuotaError: The API call app_identity_service.GetAccessToken() required more quota than is available.
My traffic hasn't gone up by much, and the number of times the handler is hit is almost the same as in the past two months' data. So why am I getting this error?
In order to determine why you're hitting the quota error, you'll need to share more detail about your usage. The quota should reset every 24 hours. Do you know how long it takes for the error to appear and how much traffic you've successfully served to that point in time?
You mentioned that you "do analytics later in the day", which suggests that you might be using the TaskQueue API or Deferred Tasks. It's possible that those tasks are failing for other reasons and retrying, which could quickly eat up your quota. If you are using TaskQueues, you might try tuning the queue configuration and retry options.
Another way to conserve your quota would be to cache the BigQuery discovery service object that you build, using something like the Memcache API, so that it can be reused for multiple requests to the BigQuery service.
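The reuse idea can be sketched with a simple in-process cache. Note that this swaps in a module-level dictionary rather than the Memcache API, so the object is reused only within one running instance; the names `get_cached` and `_cache` are hypothetical.

```python
import threading

_cache = {}
_cache_lock = threading.Lock()


def get_cached(key, factory):
    """Return the cached object for key, building it once with factory()."""
    with _cache_lock:
        if key not in _cache:
            # First request on this instance: build and remember the object.
            _cache[key] = factory()
        return _cache[key]
```

In the handler from the question, the credentials, `authorize` and `discovery.build` lines could be moved into a small function (say, a hypothetical `build_bigquery`) and called as `bigquery = get_cached("bigquery", build_bigquery)`, so the service is no longer rebuilt on every click.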

TypeError when using botocore to read from AWS SQS queue

I'm using a Tornado server with tornado-botocore to connect to Amazon SQS services.
When running stress tests we sometimes get the following exception:
Traceback (most recent call last):
File "/home/app/handlers/WebSocketsHandler.py", line 95, in listen_outgoing_queue
message = yield tornado.gen.Task(self.outgoing_queue.read)
File "/home/local/lib/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/home/local/lib/python2.7/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/home/local/lib/python2.7/site-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/home/local/lib/python2.7/site-packages/tornado_botocore/base.py", line 70, in prepare_response
response_dict, operation_model.output_shape)
File "/home/local/lib/python2.7/site-packages/botocore/parsers.py", line 155, in parse
return self._do_error_parse(response, shape)
File "/home/.env/local/lib/python2.7/site-packages/botocore/parsers.py", line 314, in _do_error_parse
root = self._parse_xml_string_to_dom(xml_contents)
File "/home/local/lib/python2.7/site-packages/botocore/parsers.py", line 274, in _parse_xml_string_to_dom
parser.feed(xml_string)
TypeError: must be string or read-only buffer, not None
Could it be caused by concurrency?
Has anyone encountered such behavior?
We are using tornado 4.2.1, botocore 0.65.0 and tornado-botocore 0.1.6.
The problem was solved once I removed the @tornado.gen.engine decorator from the method.

OperationFailure: database error when threading in MongoEngine/PyMongo

I have a function that will read data from a website, process it, and then load it into MongoDB. When I run this without threading it works fine but as soon as I set up celery tasks that just call this one function I frequently get the following error: "OperationFailure: database error: unauthorized db:dbname lock type:-1"
It's somewhat odd because if I run the non-celery version on multiple terminals, I do not get this error at all.
I suspect it has something to do with there not being an open connection to Mongo although in my code I'm opening one up right before every Mongo call.
The exact exception is below:
Task twitter[a974bfcc-d6ca-4baf-b36f-cae9143ce2d9] raised exception: OperationFailure(u'database error: unauthorized db:data lock type:-1 client:68.193.49.9',)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/djangoblog/network/tasks.py", line 40, in twitter
n_twitter.GetTweetsTwitter(user)
File "/djangoblog/network/twitter.py", line 255, in GetTweetsTwitter
id = SaveTweet(user, network, tweet)
File "/djangoblog/network/twitter.py", line 150, in SaveTweet
if mmo.Moment.objects(user=user.id,source_id=id,network=network.id).count() == 0:
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 933, in count
return self._cursor.count(with_limit_and_skip=True)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 563, in _cursor
self._cursor_obj = self._collection.find(self._query,
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 493, in _collection
if self._collection_obj.name not in db.collection_names():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/database.py", line 361, in collection_names
names = [r["name"] for r in results]
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 703, in next
if len(self.__data) or self._refresh():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 666, in _refresh
self.__uuid_subtype))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 628, in __send_message
self.__tz_aware)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/helpers.py", line 101, in _unpack_response
error_object["$err"])
OperationFailure: database error: unauthorized db:data lock type:-1 client:68.193.49.9
Sorry for the formatting but if you look at the line that starts with mmo.Moment there's a connection being opened right before that's called.
Doing a bit of research it looks as if it has something to do with the way threading is handled in PyMongo - http://api.mongodb.org/python/1.5.1/faq.html#how-does-connection-pooling-work-in-pymongo - I may need to start closing the connections but I'd expect MongoEngine to be doing this..
This is likely due to the fact that you are not calling db.authenticate() when you start the new connection and are using auth on MongoDB.
Regarding the closing of threads, I would recommend making sure you are using connection pooling and letting the driver manage the pools (calling close() or similar manually can lead to a lot of pain).
For more info see the note in the pymongo documentation about using authenticate() in a multi-threaded environment.
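An alternative to calling authenticate() on each new connection, not mentioned in the answer above, is to put the credentials in the MongoDB connection URI, which PyMongo accepts, so that the driver can authenticate the connections it opens. The sketch below only builds such a URI; the function name and the example values are hypothetical, and on Python 2 `quote_plus` lives in `urllib` rather than `urllib.parse`.

```python
from urllib.parse import quote_plus


def mongo_uri(user, password, host, database, port=27017):
    """Build a mongodb:// URI with percent-encoded credentials."""
    return "mongodb://{}:{}@{}:{}/{}".format(
        quote_plus(user),      # escape reserved characters such as '@' or ':'
        quote_plus(password),
        host,
        port,
        database,
    )
```

The resulting string would then be passed as the host argument when the connection is created, instead of a bare hostname.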