I'm trying to use s3fs in Python to connect to an S3 bucket. The associated credentials are saved in a profile called 'pete' in ~/.aws/credentials:
[default]
aws_access_key_id=****
aws_secret_access_key=****
[pete]
aws_access_key_id=****
aws_secret_access_key=****
This seems to work in AWS CLI (on Windows):
$>aws s3 ls s3://my-bucket/ --profile pete
PRE other-test-folder/
PRE test-folder/
But I get a permission denied error when I use what should be equivalent code with the s3fs package in Python:
import s3fs
import requests
s3 = s3fs.core.S3FileSystem(profile='pete')
s3.ls('my-bucket')
I get this error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 504, in _lsdir
async for i in it:
File "C:\ProgramData\Anaconda3\lib\site-packages\aiobotocore\paginate.py", line 32, in __anext__
response = await self._make_request(current_kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\aiobotocore\client.py", line 154, in _make_api_call
raise error_class(parsed_response, operation_name)
ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<ipython-input-9-4627a44a7ac3>", line 5, in <module>
s3.ls('ma-baseball')
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 993, in ls
files = maybe_sync(self._ls, self, path, refresh=refresh)
File "C:\ProgramData\Anaconda3\lib\site-packages\fsspec\asyn.py", line 97, in maybe_sync
return sync(loop, func, *args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\fsspec\asyn.py", line 68, in sync
raise exc.with_traceback(tb)
File "C:\ProgramData\Anaconda3\lib\site-packages\fsspec\asyn.py", line 52, in f
result[0] = await future
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 676, in _ls
return await self._lsdir(path, refresh)
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 527, in _lsdir
raise translate_boto_error(e) from e
PermissionError: Access Denied
I have to assume it's not a config issue within s3 because I can access s3 through the CLI. So something must be off with my s3fs code, but I can't find a whole lot of documentation on profiles in s3fs to figure out what's going on. Any help is of course appreciated.
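For reference, here's a minimal sketch of two ways I'd expect the profile to be picked up; it assumes a reasonably recent s3fs (which accepts a profile keyword) and relies on botocore's standard AWS_PROFILE environment variable, with the bucket name being the placeholder from above:
import os
import s3fs

# Option 1 (assumption: botocore honors AWS_PROFILE): select the profile via the environment
os.environ["AWS_PROFILE"] = "pete"
fs_env = s3fs.S3FileSystem()

# Option 2 (assumption: this s3fs version accepts a profile keyword)
fs_kw = s3fs.S3FileSystem(profile="pete")

print(fs_kw.ls("my-bucket"))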
My S3 bucket and AWS Rekognition model are both in us-east-1. My Lambda, which is also in us-east-1, is triggered by an upload into my S3 bucket. I pasted the auto-generated code from the model (for Python) into my Lambda function. I have even tried giving full access to my S3 bucket (allowing public access with full permissions), but when I trigger the Lambda I get this exception:
[ERROR] InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the DetectCustomLabels operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 80, in lambda_handler
label_count=show_custom_labels(model,bucket,photo, min_confidence)
File "/var/task/lambda_function.py", line 59, in show_custom_labels
response = client.detect_custom_labels(Image={'S3Object': {'Bucket': bucket, 'Name': photo}},
File "/var/runtime/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 676, in _make_api_call
raise error_class(parsed_response, operation_name)
I am printing out my bucket name and key in the logs and they are fine. My key has a folder path included (folder1/folder2/image.jpg).
How can I get over this error?
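One thing worth checking, as a hedged aside rather than a confirmed fix: the object key delivered in an S3 event notification is URL-encoded, and an encoded key is a common cause of this exact "Unable to get object metadata" error. A rough sketch of the handler with that decoding step; the model ARN is a placeholder and the rest of the original function is assumed:
import urllib.parse
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

def lambda_handler(event, context):
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # S3 event keys are URL-encoded, e.g. spaces arrive as '+'
    photo = urllib.parse.unquote_plus(record["object"]["key"])
    response = rekognition.detect_custom_labels(
        ProjectVersionArn="arn:aws:rekognition:us-east-1:111111111111:project/placeholder/version/placeholder/1",  # placeholder ARN
        Image={"S3Object": {"Bucket": bucket, "Name": photo}},
        MinConfidence=50,
    )
    return {"label_count": len(response["CustomLabels"])}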
I have a simple pipeline that worked fine three weeks ago. I've gone back to the code to enhance it, but when I tried to run it, it returned the following error:
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 10.280192286716229 seconds before retrying exists because we caught exception: TypeError: string indices must be integers
Traceback for above exception (most recent call last):
I am running the Dataflow script via Cloud Shell on Google Cloud Platform, simply by executing python3 <dataflow.py>.
The code is as follows, and it used to submit the job to Dataflow without issue:
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam import coders

pipeline_args = [
    '--runner=DataflowRunner',
    '--job_name=my-job-name',
    '--project=my-project-id',
    '--region=europe-west2',
    '--temp_location=gs://mybucket/temp',
    '--staging_location=gs://mybucket/staging'
]

options = PipelineOptions(pipeline_args)
p = beam.Pipeline(options=options)

rows = (
    p | 'Read daily Spot File' >> beam.io.ReadFromText(
        file_pattern='gs://bucket/filename.gz',
        compression_type='gzip',
        coder=coders.BytesCoder(),
        skip_header_lines=0))

p.run()
Any advice as to why this has started happening would be great to know.
Thanks in advance.
Not sure about the original issue, but I can speak to Usman's post, which seems to describe an issue I ran into myself.
Python doesn't authenticate through gcloud auth; it uses the GOOGLE_APPLICATION_CREDENTIALS environment variable instead. So before you run the Python command that launches the Dataflow job, you need to set that environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key"
More info on setting up the environment variable: https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable
Then you'll have to make sure that the account you set up has the necessary permissions in your GCP project.
Permissions and service accounts:
User service account or user account: it needs the Dataflow Admin role at the project level and must be able to act as the worker service account (source: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#worker_service_account).
Worker service account: there is one worker service account per Dataflow pipeline. This account needs the Dataflow Worker role at the project level, plus the permissions required by the resources the pipeline accesses (source: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#worker_service_account). Example: if the pipeline's input is a Pub/Sub topic and its output is a BigQuery table, the worker service account needs read access to the topic as well as write permission on the BQ table.
Dataflow service account: this is the account that gets created automatically when you enable the Dataflow API in a project. It automatically gets the Dataflow Service Agent role at the project level (source: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#service_account).
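To tie that back to the pipeline code above, here is a hedged sketch of how the credentials and worker service account can be wired in from Python; the key path and service account email are placeholders, and --service_account_email is only needed if you don't want the workers to run as the default Compute Engine service account:
import os

# Assumption: a service-account key exists at this placeholder path
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/key.json"

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project-id',
    '--region=europe-west2',
    '--temp_location=gs://mybucket/temp',
    # Placeholder: worker service account; it needs the Dataflow Worker role
    '--service_account_email=my-worker-sa@my-project-id.iam.gserviceaccount.com',
])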
I found the same kind of error following this [tutorial][1]:
python3 PubSubToGCS.py \
--project=mango-s-11-a81277cf \
--region=us-east1 \
--input_topic=projects/mango-s-11-a81277cf/topics/pubsubtest \
--output_path=gs://mangobucket001/samples/output \
--runner=DataflowRunner \
--window_size=2 \
--num_shards=2 \
--temp_location=gs://mangobucket001/temp
INFO:apache_beam.runners.portability.stager:Downloading source distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/bin/python3', '-m', 'pip', 'download', '--dest', '/tmp/tmpaelpxw64', 'apache-beam==2.33.0', '--no-deps', '--no-binary', ':all:']
INFO:apache_beam.runners.portability.stager:Staging SDK sources from PyPI: dataflow_python_sdk.tar
INFO:apache_beam.runners.portability.stager:Downloading binary distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/bin/python3', '-m', 'pip', 'download', '--dest', '/tmp/tmpaelpxw64', 'apache-beam==2.33.0', '--no-deps', '--only-binary', ':all:', '--python-version', '38', '--implementation', 'cp', '--abi', 'cp38', '--platform', 'manylinux1_x86_64']
INFO:apache_beam.runners.portability.stager:Staging binary distribution of the SDK from PyPI: apache_beam-2.33.0-cp38-cp38-manylinux1_x86_64.whl
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.8 interpreter.
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.33.0
INFO:root:Using provided Python SDK container image: gcr.io/cloud-dataflow/v1beta3/python38-fnapi:2.33.0
INFO:root:Python SDK container image set to "gcr.io/cloud-dataflow/v1beta3/python38-fnapi:2.33.0" for Docker environment
INFO:apache_beam.runners.dataflow.internal.apiclient:Defaulting to the temp_location as staging_location: gs://mangobucket001/temp
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658/pickled_main_session...
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 3.054787201157167 seconds before retrying _gcs_file_copy because we caught exception: OSError: Could not upload to GCS path gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658: access denied. Please verify that credentials are valid and that you have write access to the specified path.
Traceback for above exception (most recent call last):
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
return fun(*args, **kwargs)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 559, in _gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 642, in stage_file
raise IOError((
(The same GCS upload attempt, access-denied warning, and traceback then repeat several more times, with the retry backoff growing from about 5 seconds up to about 122 seconds.)
[1]: https://cloud.google.com/pubsub/docs/pubsub-dataflow
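A quick, hedged sanity check I've found useful in this situation (assuming the google-cloud-storage client library is installed): try writing a small object to the temp path with the same credentials the launcher uses, so you know whether the "access denied" is really coming from the launching account:
from google.cloud import storage

# Uses the same application default credentials / GOOGLE_APPLICATION_CREDENTIALS
# as the Dataflow launcher; the bucket name is taken from the logs above.
client = storage.Client()
bucket = client.bucket("mangobucket001")
blob = bucket.blob("temp/credentials-smoke-test.txt")
blob.upload_from_string("ok")  # raises a 403 Forbidden if write access is missing
print("write access to gs://mangobucket001/temp confirmed")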
When I try to run a very simple Python script to get an object from an S3 bucket:
import boto3
s3 = boto3.resource('s3',
region_name="eu-east-1",
verify=False,
aws_access_key_id="QxxxxxxxxxxxxxxxxxxxxxxxxFY=",
aws_secret_access_key="c1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxYw==")
obj = s3.Object('3gxxxxxxxxxxs7', 'dk5xxxxxxxxxxn94')
result = obj.get()['Body'].read().decode('utf-8')
print(result)
I got an error:
$ python3 script.py
Traceback (most recent call last):
File "script.py", line 7, in <module>
result = obj.get()['Body'].read().decode('utf-8')
File "//anaconda3/lib/python3.7/site-packages/boto3/resources/factory.py", line 520, in do_action
response = action(self, *args, **kwargs)
File "//anaconda3/lib/python3.7/site-packages/boto3/resources/action.py", line 83, in __call__
response = getattr(parent.meta.client, operation_name)(**params)
File "//anaconda3/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "//anaconda3/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError:
An error occurred (AuthorizationHeaderMalformed)
when calling the GetObject operation:
The authorization header is malformed; the authorization component
"Credential=QUtJxxxxxxxxxxxxxxxxxlZPUFY=/20191005/us-east-1/s3/aws4_request"
is malformed.
I'm not sure what could be causing it. Worth adding:
I don't know the bucket's region (don't ask why), but I tried connecting to all of them manually (by changing the default region name in the command to every region), without success.
I don't have access to the bucket configuration, or to anything in the AWS console. I just have the key ID, secret, bucket name and object name.
An AWS access key ID always begins with AKIA for IAM users or ASIA for temporary credentials from the Security Token Service, as noted in IAM Identifiers in the AWS Identity and Access Management User Guide.
The value you're using does not appear to be either of these, since it starts with QUtJ..., so it isn't the value you should be using here. You appear to be using something that isn't an AWS access key ID.
Not long ago I had a similar problem, because I'm 90% sure this is a task from a recruitment interview ;)
For all future travelers applying to this company: these credentials are encrypted; unfortunately I forgot the encryption type, but it's surely a very common one.
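For what it's worth, a hedged sketch of that idea: "QUtJ" is exactly what "AKI" looks like after base64 encoding, so if these interview credentials really are encoded that way, decoding them before handing them to boto3 should yield a normal-looking access key. The masked strings below are just the placeholders from the question, not real values:
import base64
import boto3

# Assumption: the supplied values are base64-encoded; these are the question's placeholders
access_key = base64.b64decode("QxxxxxxxxxxxxxxxxxxxxxxxxFY=").decode("utf-8")
secret_key = base64.b64decode("c1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxYw==").decode("utf-8")

s3 = boto3.resource(
    "s3",
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
)
obj = s3.Object("3gxxxxxxxxxxs7", "dk5xxxxxxxxxxn94")
print(obj.get()["Body"].read().decode("utf-8"))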
I would like to ask whether it is currently possible to use the spark-ec2 script https://spark.apache.org/docs/latest/ec2-scripts.html with credentials that consist not only of aws_access_key_id and aws_secret_access_key, but also contain an aws_security_token.
When I try to run the script I get the following error message:
ERROR:boto:Caught exception reading instance data
Traceback (most recent call last):
File "/Users/zikes/opensource/spark/ec2/lib/boto-2.34.0/boto/utils.py", line 210, in retry_url
r = opener.open(req, timeout=timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 64] Host is down>
ERROR:boto:Unable to read instance data, giving up
No handler was ready to authenticate. 1 handlers were checked. ['QuerySignatureV2AuthHandler'] Check your credentials
Does anyone have an idea what could be wrong? Is aws_security_token the problem?
It seems to me more like a boto problem than a Spark problem.
I have tried both:
1) setting the credentials in ~/.aws/credentials and ~/.aws/config
2) setting the credentials with these commands:
export aws_access_key_id=<my_aws_access_key>
export aws_secret_access_key=<my_aws_seecret_key>
export aws_security_token=<my_aws_security_token>
My launch command is:
./spark-ec2 -k my_key -i my_key.pem --additional-tags "mytag:tag1,mytag2:tag2" --instance-profile-name "profile1" -s 1 launch test
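As an aside on the environment-variable attempt above: as far as I know, boto 2.x looks for the uppercase names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SECURITY_TOKEN), not the lowercase ones, so a quick sanity check in Python (values are placeholders) could look like this:
import os
import boto

# Placeholders; boto 2.x reads the uppercase variable names
os.environ["AWS_ACCESS_KEY_ID"] = "<my_aws_access_key>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<my_aws_secret_key>"
os.environ["AWS_SECURITY_TOKEN"] = "<my_aws_security_token>"

conn = boto.connect_ec2()
print(conn.get_all_regions())  # succeeds only if the temporary credentials are accepted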
You can set up your credentials & config using the command aws configure.
I had the same issue, but in my case my AWS_SECRET_ACCESS_KEY had a slash in it. I regenerated the key until there was no slash, and it worked.
The problem was that I did not use a profile called default; after renaming it, everything worked well.
I created a new config file:
$ sudo vi ~/.boto
There I pasted my credentials (as described in the boto readthedocs):
[Credentials]
aws_access_key_id = YOURACCESSKEY
aws_secret_access_key = YOURSECRETKEY
I'm trying to check the connection:
import boto
boto.set_stream_logger('boto')
s3 = boto.connect_s3("us-east-1")
and this is the output:
2014-11-26 14:05:49,532 boto [DEBUG]:Using access key provided by client.
2014-11-26 14:05:49,532 boto [DEBUG]:Retrieving credentials from metadata server.
2014-11-26 14:05:50,539 boto [ERROR]:Caught exception reading instance data
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/boto/utils.py", line 210, in retry_url
r = opener.open(req, timeout=timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
URLError: <urlopen error timed out>
2014-11-26 14:05:50,540 boto [ERROR]:Unable to read instance data, giving up
Traceback (most recent call last):
File "/Users/user/PycharmProjects/project/untitled.py", line 8, in <module>
s3 = boto.connect_s3("us-east-1")
File "/Library/Python/2.7/site-packages/boto/__init__.py", line 141, in connect_s3
return S3Connection(aws_access_key_id, aws_secret_access_key, **kwargs)
File "/Library/Python/2.7/site-packages/boto/s3/connection.py", line 190, in __init__
validate_certs=validate_certs, profile_name=profile_name)
File "/Library/Python/2.7/site-packages/boto/connection.py", line 569, in __init__
host, config, self.provider, self._required_auth_capability())
File "/Library/Python/2.7/site-packages/boto/auth.py", line 975, in get_auth_handler
'Check your credentials' % (len(names), str(names)))
boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials
Why doesn't it find the credentials?
Is there something I did wrong?
Your issue is:
The region string ('us-east-1') you provide as the first argument is treated as the aws_access_key_id.
What you want is:
First create a connection; note that a connection has no region or location info in it.
conn = boto.connect_s3('your_access_key', 'your_secret_key')
Then, when you want to do something with the bucket, pass the region info as an argument.
from boto.s3.connection import Location
conn.create_bucket('mybucket', location=Location.USWest)
or:
conn.create_bucket('mybucket', location='us-west-1')
By default, the location is the empty string which is interpreted as the US Classic Region, the original S3 region. However, by specifying another location at the time the bucket is created, you can instruct S3 to create the bucket in that location.
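And once the [Credentials] section in ~/.boto is in place, you shouldn't need to pass anything at all; a minimal check would be:
import boto

# Picks the keys up from ~/.boto automatically
conn = boto.connect_s3()
for bucket in conn.get_all_buckets():
    print(bucket.name)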