I have a Spark 2.4.7 running on GCP Dataproc and task to read some files from AWS S3.
Having AWS credentials, namely access key and secret access key, appended to files /etc/profile.d/spark_config.sh /etc/*bashrc /usr/lib/spark/conf/spark-env.sh at cluster creation, I can read SOME of the files from my S3 bucket, while reading all of them gives
Traceback (most recent call last):
File "<stdin>", line 7, in <module>
File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 274, in json
return self._df(self._jreader.json(self._spark._sc._jvm.PythonUtils.toSeq(path)))
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o202.json.
: java.nio.file.AccessDeniedException: s3a://my_bucket/part-00000-a5adf948-1f65-4068-ae68-76c3708b4cf1-c000.txt.lzo: getFileStatus on s3a://my_bucket/part-00000-a5adf948-1f65-4068-ae68-76c3708b4cf1-c000.txt.lzo: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: Q3TR40YDDCWFKESZ; S3 Extended Request ID: MeZc+rmfDXDL2DEXWS8zpfrn1s3dbthI31pErLpVT14kTGjpaxUYEmBG53D9f1cWwVRVL1oCzeM=), S3 Extended Request ID: MeZc+rmfDXDL2DEXWS8zpfrn1s3dbthI31pErLpVT14kTGjpaxUYEmBG53D9f1cWwVRVL1oCzeM=
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:174)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:117)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1923)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1877)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1812)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1632)
at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2631)
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:575)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:559)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:411)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
The code I use for reading:
spark.read \
.option("io.compression.codecs", "com.hadoop.compression.lzo.LzoCodec") \
.schema(StructType().add("id", StringType())) \
.json("s3a://my_bucket/*.lzo")
Files in the location are inserted at different times using the same mechanism. Examining one of the "corrupted" files using AWS CLI, I see the error:
aws s3 cp s3://my_bucket/part-00000-499c7394-c5cf-4045-8a54-0e1d5bc87897-c000.txt.lzo - | head
download failed: s3://my_bucket/part-00000-499c7394-c5cf-4045-8a54-0e1d5bc87897-c000.txt.lzo to - An error occurred (403) when calling the HeadObject operation: Forbidden
My question is how to further trace down the failed reads of some of the files. Thank you!
UPD:
Somehow the problematic files are ununsually small compared to the rest: ~4KiB in size compared to 30-50MiB other items.
Related
I have been trying to set up an Upsert job in AWS Glue, which uses pyspark to create and update tables at the data lake catalog database (in Lakeformation). This job also applied LFTags to the resources (tables and columns).
The error I keep receiving is always about Insufficient Permissions.
E.g.:
022-01-18 21:11:58,381 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
File "/tmp/driver.py", line 186, in <module>
run()
File "/tmp/driver.py", line 182, in run
main(sys.argv)
File "/tmp/driver.py", line 173, in main
cf.process_spark(spark)
File "/tmp/spark_pyutil-0.0.1-py3-none-any.zip/spark_pyutil/conf_process.py", line 772, in process_spark
self.__write_output_spark(spark, df_udf)
File "/tmp/spark_pyutil-0.0.1-py3-none-any.zip/spark_pyutil/conf_process.py", line 543, in __write_output_spark
r = lk.add_lf_tags_to_resource(Resource=table_resource, LFTags = lf_tags)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 705, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.AccessDeniedException: An error occurred (AccessDeniedException) when calling the AddLFTagsToResource operation: Insufficient Glue permissions to access table gene_genesymbol
I also tried unchecking the options
> Use only IAM access control for new databases Use only IAM access
> control for new tables in new databases
And add Associate and Describe permissions to the role in all LFTags in Lakeformation.
In another scenario, I tried adding different policies to the IAM role, checking theIAM access control options, and it also results in Insufficient permissions.
Does anybody see something off in what I'm doing?
I'm trying to use s3fs in python to connect to an s3 bucket. The associated credentials are saved in a profile called 'pete' in ~/.aws/credentials:
[default]
aws_access_key_id=****
aws_secret_access_key=****
[pete]
aws_access_key_id=****
aws_secret_access_key=****
This seems to work in AWS CLI (on Windows):
$>aws s3 ls s3://my-bucket/ --profile pete
PRE other-test-folder/
PRE test-folder/
But I get a permission denied error when I use what should be equivalent code using the s3fs package in python:
import s3fs
import requests
s3 = s3fs.core.S3FileSystem(profile = 'pete')
s3.ls('my-bucket')
I get this error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 504, in _lsdir
async for i in it:
File "C:\ProgramData\Anaconda3\lib\site-packages\aiobotocore\paginate.py", line 32, in __anext__
response = await self._make_request(current_kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\aiobotocore\client.py", line 154, in _make_api_call
raise error_class(parsed_response, operation_name)
ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<ipython-input-9-4627a44a7ac3>", line 5, in <module>
s3.ls('ma-baseball')
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 993, in ls
files = maybe_sync(self._ls, self, path, refresh=refresh)
File "C:\ProgramData\Anaconda3\lib\site-packages\fsspec\asyn.py", line 97, in maybe_sync
return sync(loop, func, *args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\fsspec\asyn.py", line 68, in sync
raise exc.with_traceback(tb)
File "C:\ProgramData\Anaconda3\lib\site-packages\fsspec\asyn.py", line 52, in f
result[0] = await future
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 676, in _ls
return await self._lsdir(path, refresh)
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 527, in _lsdir
raise translate_boto_error(e) from e
PermissionError: Access Denied
I have to assume it's not a config issue within s3 because I can access s3 through the CLI. So something must be off with my s3fs code, but I can't find a whole lot of documentation on profiles in s3fs to figure out what's going on. Any help is of course appreciated.
I have a simple pipeline, which 3 weeks ago would work fine but I've gone back to the code to enhance it and when I tried to run the code it returned the following error:
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 10.280192286716229 seconds before retrying exists because we caught exception: TypeError: string indices must be integers
Traceback for above exception (most recent call last):
I am running the dataflow script via Cloud Shell on the Google Cloud Platform. By simply executing Python3 <dataflow.py>
The code is as follows, and this used to submit the job to dataflow without issue
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam import coders
pipeline_args=[
'--runner=DataflowRunner',
'--job_name=my-job-name',
'--project=my-project-id',
'--region=europe-west2',
'--temp_location=gs://mybucket/temp',
'--staging_location=gs://mybucket/staging'
]
options = PipelineOptions (pipeline_args)
p = beam.Pipeline(options=options)
rows = (
p | 'Read daily Spot File' >> beam.io.ReadFromText(
file_pattern='gs://bucket/filename.gz',
compression_type='gzip',
coder=coders.BytesCoder(),
skip_header_lines=0))
p.run()
Any advice as to why this has started happening would be great to know.
Thanks in advance.
Not sure about the original issue but I can speak to Usman's post which seems to describe an issue I ran into myself.
Python doesn't use gcloud auth to authenticate but it uses the environment variable GOOGLE_APPLICATION_CREDENTIALS. So before you run the python command to launch the Dataflow job, you will need to set that environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key"
More info on setting up the environment variable: https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable
Then you'll have to make sure that the account you set up has the necessary permissions in your GCP project.
Permissions and service accounts:
User service account or user account: it needs the Dataflow Admin
role at the project level and to be able to act as the worker service
account (source:
https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#worker_service_account).
Worker service account: it will be one worker service account per
Dataflow pipeline. This account will need the Dataflow Worker role at
the project level plus the necessary permissions to the resources
accessed by the Dataflow pipeline (source:
https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#worker_service_account).
Example: if Dataflow pipeline’s input is Pub/Sub topic and output is
BigQuery table, the worker service account will need read access to
the topic as well as write permission to the BQ table.
Dataflow service account: this is the account that gets automatically
created when you enable the Dataflow API in a project. It
automatically gets the Dataflow Service Agent role at the project
level (source:
https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#service_account).
I found same kind of error following this [tutorial][1]
python3 PubSubToGCS.py \
--project=mango-s-11-a81277cf \
--region=us-east1 \
--input_topic=projects/mango-s-11-a81277cf/topics/pubsubtest \
--output_path=gs://mangobucket001/samples/output \
--runner=DataflowRunner \
--window_size=2 \
--num_shards=2 \
--temp_location=gs://mangobucket001/temp
a81277cf/topics/pubsubtest --output_path=gs://mangobucket001/ --runner=DataflowRunner --window_size=2 --num_shards=2 --temp_location=gs://mangobucket001/temp
INFO:apache_beam.runners.portability.stager:Downloading source distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/bin/python3', '-m', 'pip', 'download', '--dest', '/tmp/tmpaelpxw64', 'apache-beam==2.33.0', '--no-deps', '--no-binary', ':all:']
INFO:apache_beam.runners.portability.stager:Staging SDK sources from PyPI: dataflow_python_sdk.tar
INFO:apache_beam.runners.portability.stager:Downloading binary distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/bin/python3', '-m', 'pip', 'download', '--dest', '/tmp/tmpaelpxw64', 'apache-beam==2.33.0', '--no-deps', '--only-binary', ':all:', '--python-version', '38', '--implementation', 'cp', '--abi', 'cp38', '--platform', 'manylinux1_x86_64']
INFO:apache_beam.runners.portability.stager:Staging binary distribution of the SDK from PyPI: apache_beam-2.33.0-cp38-cp38-manylinux1_x86_64.whl
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.8 interpreter.
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.33.0
INFO:root:Using provided Python SDK container image: gcr.io/cloud-dataflow/v1beta3/python38-fnapi:2.33.0
INFO:root:Python SDK container image set to "gcr.io/cloud-dataflow/v1beta3/python38-fnapi:2.33.0" for Docker environment
INFO:apache_beam.runners.dataflow.internal.apiclient:Defaulting to the temp_location as staging_location: gs://mangobucket001/temp
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658/pickled_main_session...
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 3.054787201157167 seconds before retrying _gcs_file_copy because we caught exception: OSError: Could not upload to GCS path gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658: access denied. Please verify that credentials are valid and that you have write access to the specified path.
Traceback for above exception (most recent call last):
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
return fun(*args, **kwargs)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 559, in _gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 642, in stage_file
raise IOError((
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658/pickled_main_session...
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 5.22755214575824 seconds before retrying _gcs_file_copy because we caught exception: OSError: Could not upload to GCS path gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658: access denied. Please verify that credentials are valid and that you have write access to the specified path.
Traceback for above exception (most recent call last):
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
return fun(*args, **kwargs)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 559, in _gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 642, in stage_file
raise IOError((
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658/pickled_main_session...
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 19.20582351652022 seconds before retrying _gcs_file_copy because we caught exception: OSError: Could not upload to GCS path gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658: access denied. Please verify that credentials are valid and that you have write access to the specified path.
Traceback for above exception (most recent call last):
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
return fun(*args, **kwargs)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 559, in _gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 642, in stage_file
raise IOError((
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658/pickled_main_session...
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 39.51226214946517 seconds before retrying _gcs_file_copy because we caught exception: OSError: Could not upload to GCS path gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658: access denied. Please verify that credentials are valid and that you have write access to the specified path.
Traceback for above exception (most recent call last):
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
return fun(*args, **kwargs)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 559, in _gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 642, in stage_file
raise IOError((
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658/pickled_main_session...
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 61.15550203875504 seconds before retrying _gcs_file_copy because we caught exception: OSError: Could not upload to GCS path gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658: access denied. Please verify that credentials are valid and that you have write access to the specified path.
Traceback for above exception (most recent call last):
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
return fun(*args, **kwargs)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 559, in _gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 642, in stage_file
raise IOError((
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658/pickled_main_session...
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 122.46460946846845 seconds before retrying _gcs_file_copy because we caught exception: OSError: Could not upload to GCS path gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658: access denied. Please verify that credentials are valid and that you have write access to the specified path.
Traceback for above exception (most recent call last):
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
return fun(*args, **kwargs)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 559, in _gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 642, in stage_file
[1]: https://cloud.google.com/pubsub/docs/pubsub-dataflow
for bucket in boto3.resource('s3').buckets.all():
print(bucket.name)
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Users\icode\PycharmProjects\AWS\venv\lib\site-packages\boto3\resources\collection.py", line 83, in __iter__
for page in self.pages():
File "C:\Users\icode\PycharmProjects\AWS\venv\lib\site-packages\boto3\resources\collection.py", line 161, in pages
pages = [getattr(client, self._py_operation_name)(**params)]
File "C:\Users\icode\PycharmProjects\AWS\venv\lib\site-packages\botocore\client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "C:\Users\icode\PycharmProjects\AWS\venv\lib\site-packages\botocore\client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (NotSignedUp) when calling the ListBuckets operation: Your account is not signed up for the S3 service. You must sign up before you can use S3.```
You've not signed up for S3. You'll need to visit the AWS Console first, sign up, and wait for the activation email. See the documentation for more details.
take a look at your error message, on the last line it says your are not signed up
botocore.exceptions.ClientError: An error occurred (NotSignedUp) when calling the ListBuckets operation:
Your account is not signed up for the S3 service. You must sign up before you can use S3.```
what you need to do is:
from s3 documentation
I ssh to my EC2 instance. I can run these commands and they work perfectly:
aws sqs list-queues
aws s3 ls
I have a small Python script that pulls data from a database, formats it as XML, and then uploads the file to S3. This upload fails with this error:
Traceback (most recent call last):
File "./data_test/data_analytics/lexisnexis/async2.py", line 289, in <module>
insert_parallel(engine, qy, Create_Temp.profile_id, nworkers)
File "./data_test/data_analytics/lexisnexis/async2.py", line 241, in insert_parallel
s3upload(bucketname, keyname, f)
File "./data_test/data_analytics/lexisnexis/async2.py", line 89, in s3upload
bucket = conn.get_bucket(bucketname)
File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 506, in get_bucket
return self.head_bucket(bucket_name, headers=headers)
File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 525, in head_bucket
response = self.make_request('HEAD', bucket_name, headers=headers)
File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 668, in make_request
retry_handler=retry_handler
File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 1071, in make_request
retry_handler=retry_handler)
File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 1030, in _mexe
raise ex
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)
How can I have a script that dies, even when aws cli works?
To be clear, I'm running the Python script as the same user, from the same EC2 instance, as I run the aws cli commands.
aws --version
aws-cli/1.11.176 Python/2.7.12 Linux/4.9.43-17.38.amzn1.x86_64 botocore/1.7.34
The last line of your error messages tells you the problem:
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)
Your issue could be one of the following:
1) There is an error with the certificate with the server that you are connecting to.
2) The certificate chain is incomplete for the server that you are connecting to.
3) You are missing "cacert.pem". Do a Google search on "cacert.pem". This is a common problem and there is a lot of information on downloading and installing this file.
Certificate verification in Python