I've been running my Glue Jobs on a schedule for a few months. Last night my Glue Job failed due to botocore.exceptions.NoCredentialsError: Unable to locate credentials after calling bucket.objects.filter(Prefix=productionDirectory):
I am under the impression this is a result of not having defined a credentials file, but AWS Glue has always pulled credentials without issue. I just re-ran my job and everything worked perfectly. For reference, I define my Glue Client via: glue = boto3.client('glue'). Has anyone ever experienced this before? Is this just an edge-case?
Full Logs:
Traceback (most recent call last):
File "/tmp/data-deployment", line 67, in <module>
for obj in bucket.objects.filter(Prefix=productionDirectory):
File "/home/spark/.local/lib/python3.7/site-packages/boto3/resources/collection.py", line 83, in __iter__
for page in self.pages():
File "/home/spark/.local/lib/python3.7/site-packages/boto3/resources/collection.py", line 166, in pages
for page in pages:
File "/home/spark/.local/lib/python3.7/site-packages/botocore/paginate.py", line 255, in __iter__
response = self._make_request(current_kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/paginate.py", line 332, in _make_request
return self._method(**current_kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 316, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 613, in _make_api_call
operation_model, request_dict, request_context)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 632, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
operation_name=operation_model.name)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Edit/Update: This is a known bug. I've posted the mitigation strategy provided from AWS as an answer below.
Update: I reached out to AWS via Support and they responded. Apparently this is a known bug and issue. While they do not have a solution or ETA for solution, they do have a way to mitigate the issue. Information below:
Thank you for reporting your issue to us and product team is aware of this intermittent issue.
They are working on resolution however, I do not have an ETA.
To mitigate this issue, increase the timeout / attempts to meta service request in your code:
####START######
import os
####Increase meta service timeout and attempt########
os.environ['AWS_METADATA_SERVICE_NUM_ATTEMPTS'] ="5"
os.environ['AWS_METADATA_SERVICE_TIMEOUT'] ="30"
#####################END#################
I faced a similar issue with Glue, but not exactly the same.
We used external tables with SparkSQL and S3, and sometimes an Exception was raised out of nowhere, i.e. Table not found. The issue was never reproduced on testing and had least frequency. Since our jobs ran perfectly fine on retries, we enabled the retry mechanism to solve it.
It has something to do with the internal workings of Glue and its serverless environment.
Related
I'm trying to put to work AWS's Textract export table suggestion in this link
I'm a complete newbie in AWS's solutions and in command prompt so I'm trying to do exactly as they suggest. I'm running that in python so I'm using this piece of code:
import os
k=os.system("python textract_python_table_parser.py my_pdf_file_path.pdf")
print(k)
The code runs, I get an Image loaded my_pdf_file_path.pdf however at some point it bugs on credential matters:
Traceback (most recent call last):
File "/Users/santanna_santanna/PycharmProjects/KlooksExplore/PDFWork/textract_python_table_parser.py", line 108, in <module>
main(file_name)
File "/Users/santanna_santanna/PycharmProjects/KlooksExplore/PDFWork/textract_python_table_parser.py", line 94, in main
table_csv = get_table_csv_results(file_name)
File "/Users/santanna_santanna/PycharmProjects/KlooksExplore/PDFWork/textract_python_table_parser.py", line 53, in get_table_csv_results
response = client.analyze_document(Document={'Bytes': bytes_test}, FeatureTypes=['TABLES'])
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/client.py", line 316, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/client.py", line 622, in _make_api_call
operation_model, request_dict, request_context)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/client.py", line 641, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
operation_name=operation_model.name)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
I'm aware I didn't pass any credentials and that's natural to happen but where should I pass it and what would be the right syntax for that using python os? Amazon's example doesn't say anything about that.
It depends where you run your code, for example:
local computer - can use aws configure CLI to set your credetnails
EC2 instance - use instance role
lambda function - use lambda execution role
We have a Glue Python Shell Pipeline Job trying to invoke AWS AppFlow through boto3 APIs. The challenge we have is that we need to design our solution without the need for internet access within Glue
Previously we had challenges in upgrading the boto3 APIs in AWS Glue Shell to the latest version without internet. But then we have found the solution and documented in the following link. In our architecture, we are using AWS Python Shell as our lightweight Datapipeline Engine leveraging boto3 APIs
Git Glue Boto3 Bug & Solution
The following Appflow API python code is working perfectly fine in our local Jupyter Notebooks, as AWS App flow API is invoked over the internet.
##Extra code as per above link to update boto3 version as well, which is omitted
import boto3
print(boto3.__version__)
client=boto3.client('appflow')
response=client.describe_flow(
flowName='sf_non_pii'
)
But while we use the same code in AWS Glue, it is failing as it is not able to access the internet.
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://appflow.us-east-2.amazonaws.com/describe-flow"
Please find the logs below
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/botocore/httpsession.py", line 263, in send
chunked=self._chunked(request.headers),
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 386, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 735, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 309, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 167, in _new_conn
% (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f5f8802c0f0>, 'Connection to appflow.us-east-2.amazonaws.com timed out. (connect timeout=60)')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/runscript.py", line 123, in <module>
runpy.run_path(temp_file_path, run_name='__main__')
File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmp/glue-python-scripts-a3cgmro3/ref_appflow.py", line 44, in <module>
File "/tmp/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/tmp/botocore/client.py", line 663, in _make_api_call
operation_model, request_dict, request_context)
File "/tmp/botocore/client.py", line 682, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/tmp/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/tmp/botocore/endpoint.py", line 137, in _send_request
success_response, exception):
File "/tmp/botocore/endpoint.py", line 256, in _needs_retry
caught_exception=caught_exception, request_dict=request_dict)
File "/tmp/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/tmp/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/tmp/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/tmp/botocore/retryhandler.py", line 183, in __call__
if self._checker(attempts, response, caught_exception):
File "/tmp/botocore/retryhandler.py", line 251, in __call__
caught_exception)
File "/tmp/botocore/retryhandler.py", line 277, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/tmp/botocore/retryhandler.py", line 317, in __call__
caught_exception)
File "/tmp/botocore/retryhandler.py", line 223, in __call__
attempt_number, caught_exception)
File "/tmp/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File "/tmp/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/tmp/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File "/tmp/botocore/httpsession.py", line 287, in send
raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://appflow.us-east-2.amazonaws.com/describe-flow"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/runscript.py", line 142, in <module>
raise e_type(e_value).with_traceback(new_stack)
TypeError: __init__() takes 1 positional argument but 2 were given
We had faced similar issues for Redshift, AWS Athena, SSM, etc. But for these services, we have VPC endpoints available. Creating the endpoints enabled our glue job to connect privately within the VPC. Please refer to the link below for supported products
AWS services that you can use with AWS PrivateLink
But since Appflow is a newer product, we do not have endpoint, even though some unclear documentation regarding a throwaway create and delete VPC endpoint as below
Private Amazon AppFlow flows
Please let us know how we can invoke boto3 APIs for Appflow without going through the internet.
I am trying to setup dynamic thumbnail service thumbor and to support s3 as storage, I need to setup this community powered pip library for aws.
Its working well on my local environment but when I am trying to host it on one of our servers, I am getting NoCredentialsError. I am assuming this is because of difference versions of botocore (latest one and one installed by pip library). Here is error log:
File "/usr/local/lib/python2.7/dist-packages/botocore/session.py", line 774, in get_component
# client config from the session
File "/usr/local/lib/python2.7/dist-packages/botocore/session.py", line 174, in <lambda>
self._components.lazy_register_component(
File "/usr/local/lib/python2.7/dist-packages/botocore/session.py", line 453, in get_data
- agent_version is the value of the `user_agent_version`
File "/usr/local/lib/python2.7/dist-packages/botocore/loaders.py", line 119, in _wrapper
data = func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/botocore/loaders.py", line 364, in load_data
DataNotFoundError: Unable to load data for: _endpoints
2016-04-24 12:14:34 tornado.application:ERROR Future exception was never retrieved: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 230, in wrapper
yielded = next(result)
File "/usr/local/lib/python2.7/dist-packages/thumbor/handlers/imaging.py", line 31, in check_image
exists = yield gen.maybe_future(self.context.modules.storage.exists(kw['image'][:self.context.config.MAX_ID_LENGTH]))
File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 455, in wrapper
future.result()
File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 443, in wrapper
result = f(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tc_aws/aws/storage.py", line 107, in exists
self.storage.get(file_abspath, callback=return_data)
File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 455, in wrapper
future.result()
File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 443, in wrapper
result = f(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tc_aws/aws/bucket.py", line 44, in get
Key=self._clean_key(path),
File "/usr/local/lib/python2.7/dist-packages/tornado_botocore/base.py", line 97, in call
return self._make_api_call(operation_name=self.operation, api_params=kwargs, callback=callback)
File "/usr/local/lib/python2.7/dist-packages/tornado_botocore/base.py", line 60, in _make_api_call
operation_model=operation_model, request_dict=request_dict, callback=callback)
File "/usr/local/lib/python2.7/dist-packages/tornado_botocore/base.py", line 54, in _make_request
request_dict=request_dict, operation_model=operation_model, callback=callback)
File "/usr/local/lib/python2.7/dist-packages/tornado_botocore/base.py", line 32, in _send_request
request = self.endpoint.create_request(request_dict, operation_model)
File "/usr/local/lib/python2.7/dist-packages/botocore/endpoint.py", line 126, in create_request
operation_name=operation_model.name)
File "/usr/local/lib/python2.7/dist-packages/botocore/hooks.py", line 226, in emit
return self._emit(event_name, kwargs)
File "/usr/local/lib/python2.7/dist-packages/botocore/hooks.py", line 209, in _emit
response = handler(**kwargs)
File "/usr/local/lib/python2.7/dist-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/usr/local/lib/python2.7/dist-packages/botocore/signers.py", line 124, in sign
signer.add_auth(request=request)
File "/usr/local/lib/python2.7/dist-packages/botocore/auth.py", line 626, in add_auth
raise NoCredentialsError
NoCredentialsError: Unable to locate credentials
Could it be fixed with proper ordering in which I install libraries? Because the pip library removes existing newer version of botocore and installs an older version.
EDIT:
I am running processes with supervisor and it seems process cant access aws credentials
EDIT 2:
The issue got resolved with proper configuration of supervisor. The user for process started by supervisor did not have access to config file
The issue got resolved with proper configuration of supervisor. The user for subprocess started by supervisor did not have access to aws config file. So it was working with local environment or creating process separately but not with supervisor.
I am using app engine and big query as the backend for my website. Whenever the user does some click, i log them into bigquery to do analytics later in the day. I get close to 75k clicks a day. It was working fine till last week. This is the code i use.
body = {"rows":[bodyFields]}
credentials = appengine.AppAssertionCredentials(scope=BIGQUERY_SCOPE)
http = credentials.authorize(httplib2.Http())
bigquery = discovery.build('bigquery', 'v2', http=http)
response = bigquery.tabledata().insertAll(
projectId=PROJECT_ID,
datasetId=BIGQUERY_DATASETID,
tableId=BIGQUERY_TABLEID,
body=body).execute()
Now all of a sudden i am getting over quota exception. My application is a paid app engine instance. Below is the stack-trace of my exception
Attempting refresh to obtain initial access_token
The API call app_identity_service.GetAccessToken() required more quota than is available.
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/filename.py", line 1611, in post
bigquery = discovery.build('bigquery', 'v2', http=http)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/util.py", line 129, in positional_wrapper
return wrapped(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/apiclient/discovery.py", line 198, in build
resp, content = http.request(requested_url)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/util.py", line 129, in positional_wrapper
return wrapped(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/client.py", line 516, in new_request
self._refresh(request_orig)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/appengine.py", line 194, in _refresh
scopes, service_account_id=self.service_account_id)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 589, in get_access_token
scopes, service_account_id=service_account_id)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 547, in get_access_token_uncached
return rpc.get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 519, in get_access_token_result
rpc.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
self.__rpc.CheckSuccess()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 134, in CheckSuccess
raise self.exception
OverQuotaError: The API call app_identity_service.GetAccessToken() required more quota than is available.
My traffic hasn't gone up by much also the number of time the handler is hit is almost the same as past 2 months data. So why am i getting this error.
In order to determine why you're hitting the quota error, you'll need to share more detail about your usage. The quota should reset every 24 hours. Do you know how long it takes for the error to appear and how much traffic you've successfully served to that point in time?
You mentioned that you "do analytics later in the day", which suggests that you might be using the TaskQueue API or Deferred Tasks. It's possible that those tasks are failing for other reasons and retrying, which could quickly eat up your quota. If you are using TaskQueues, you might try tuning the queue configuration and retry options.
Another way you might be able to conserve your quota would be to save the bigquery discovery service that you're building to something like the Memcache API, so that it can be reused for multiple requests to the BigQuery service.
I'm using a Tornado server with tornado-botocore to connect to Amazon SQS services.
When running stress tests we sometimes get the following exception:
Traceback (most recent call last):
File "/home/app/handlers/WebSocketsHandler.py", line 95, in listen_outgoing_queue
message = yield tornado.gen.Task(self.outgoing_queue.read)
File "/home/local/lib/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/home/local/lib/python2.7/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/home/local/lib/python2.7/site-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/home/local/lib/python2.7/site-packages/tornado_botocore/base.py", line 70, in prepare_response
response_dict, operation_model.output_shape)
File "/home/local/lib/python2.7/site-packages/botocore/parsers.py", line 155, in parse
return self._do_error_parse(response, shape)
File "/home/.env/local/lib/python2.7/site-packages/botocore/parsers.py", line 314, in _do_error_parse
root = self._parse_xml_string_to_dom(xml_contents)
File "/home/local/lib/python2.7/site-packages/botocore/parsers.py", line 274, in _parse_xml_string_to_dom
parser.feed(xml_string)
TypeError: must be string or read-only buffer, not None
could it be caused by the concurrency?
has anyone encountered such behavior?
We are using tornado 4.2.1, botocore 0.65.0 and tonado-botocore 0.1.6
problem solved once i removed the #tornado.gen.engine decorator from the method