I have an AWS Glue Spark job that fails with the following error:
An error occurred while calling o362.cache. com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ...; S3 Extended Request ID: ...; Proxy: null), S3 Extended Request ID: ...
I believe the error is thrown at the line where the Spark persist() method is called on a DataFrame. The Glue job is assigned an IAM role that has full S3 access (all locations and operations allowed), yet I'm still getting the S3 exception. I tried setting the "Temporary path" for the Glue job in the AWS Console to a specific S3 bucket with full access. I also tried setting the Spark temporary directory to a specific S3 bucket with full access via:
import pyspark
from pyspark import SparkContext

conf = pyspark.SparkConf()
conf.set('spark.local.dir', 's3://...')  # bucket with full access; path truncated here
self.sc = SparkContext(conf=conf)
which didn't help. It's very strange that the job fails even with full S3 access. I'm not sure what to try next; any help would be really appreciated. Thank you!
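As a sanity check (and since, as far as I understand, spark.local.dir is meant for local scratch space rather than an S3 path), my next step is to verify that the job's role can actually read the source objects outside of Spark. A rough boto3 sketch, with placeholder bucket and key names:

import boto3
from botocore.exceptions import ClientError

# Run inside the Glue job so boto3 picks up the job's IAM role.
# Placeholder names -- substitute the bucket/key the DataFrame is actually read from.
bucket = "my-input-bucket"
key = "input/part-00000.snappy.parquet"

s3 = boto3.client("s3")
try:
    # HeadObject requires s3:GetObject; a 403 here reproduces the error outside Spark.
    s3.head_object(Bucket=bucket, Key=key)
    print("read access OK")
except ClientError as e:
    print("access check failed:", e.response["Error"]["Code"])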
I am trying to create a new dataset in AWS QuickSight and connect it to AWS Athena, but the validation is failing with the following error.
[Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. You have attempted to create more buckets than allowed (Service: Amazon S3; Status Code: 400; Error Code: TooManyBuckets;
Does QuickSight create a new bucket in S3 when creating a new dataset?
If so, my bucket limit is already reached (there are already 100 buckets in S3).
Is there any workaround for this?
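For reference, my understanding is that the default S3 quota is 100 buckets per account, and that it can be raised through a limit-increase (Service Quotas) request. A quick boto3 sketch to count the buckets currently in the account:

import boto3

# Count the buckets currently in the account to confirm the quota is the problem.
s3 = boto3.client("s3")
buckets = s3.list_buckets()["Buckets"]
print(f"{len(buckets)} buckets in this account")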
Description
Synced the data from another account with rclone, with 'acl=bucket-owner-full-control' enabled.
rclone sync 607562784642://cdh-bba-itdata-sub-cmdb-src-lt7g 162611943124://bbatest
When I cataloged the bucket data into the Glue Data Catalog with a Crawler, the Crawler raised the following error:
[49b1d1bd-d3f0-4801-9668-04f8651b06f4] ERROR : Not all read errors will be logged. com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: CD0062EA0B2D0AA7; S3 Extended Request ID: k0oHoKviPcWAs8yrn+9daImiTZ0Fx6sssbGiPF/7YwTjxUwITSDQHd2uTgh3K6QAcxDkvzHREJA=), S3 Extended Request ID: k0oHoKviPcWAs8yrn+9daImiTZ0Fx6sssbGiPF/7YwTjxUwITSDQHd2uTgh3K6QAcxDkvzHREJA=
Official Check list
I have checked the items as per the official checklist:
bucket owner ID
object owner ID
Both of them were the same, and there was no additional bucket policy.
vpc endpoints
bucket policy
IAM policy
None of these policies blocked Glue from accessing the S3 bucket.
The Crawler cataloged data from other buckets successfully, so the Glue configuration itself was correct.
The bucket had a customer managed KMS key enabled, but I had forgotten to grant the Glue role access to that key.
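In case it helps anyone else: with SSE-KMS on the bucket, the crawler's role needs kms:Decrypt on the customer managed key, and the key itself has to allow that role. A rough boto3 sketch of the grant-based fix (key ARN and role ARN are placeholders):

import boto3

# Placeholders -- substitute the real CMK ARN and the Glue crawler's role ARN.
key_arn = "arn:aws:kms:us-east-1:111111111111:key/00000000-0000-0000-0000-000000000000"
glue_role_arn = "arn:aws:iam::111111111111:role/my-glue-crawler-role"

kms = boto3.client("kms")
# Allow the crawler's role to decrypt objects encrypted with this CMK.
kms.create_grant(
    KeyId=key_arn,
    GranteePrincipal=glue_role_arn,
    Operations=["Decrypt", "DescribeKey"],
)

Adding the role to the key policy instead of creating a grant works just as well.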
I'm trying to query data in a requester-pays enabled S3 bucket, but I get the following 403 error. The Redshift IAM user has permission for the bucket. How can I read the data using Redshift Spectrum and pass the requester-pays parameter?
ERROR:
[XX000][500310] [Amazon](500310) Invalid operation: S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid...
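For context, at the S3 API level a requester-pays bucket only serves requests that explicitly accept the charges, e.g. via the RequestPayer parameter; a minimal boto3 sketch with placeholder names (whether and how Spectrum can pass the equivalent is exactly what I'm trying to find out):

import boto3

# Placeholders -- requester-pays reads must explicitly accept the data-transfer charges.
s3 = boto3.client("s3")
obj = s3.get_object(
    Bucket="their-requester-pays-bucket",
    Key="data/part-00000.gz",
    RequestPayer="requester",  # without this, S3 returns 403 AccessDenied
)
print(obj["ContentLength"])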
I am getting the error below while reading data from a Redshift table using Spark.
Below is the code:
Dataset<Row> dfread = sql.read()
.format("com.databricks.spark.redshift")
.option("url", url)
//.option("query","select * from TESTSPARK")
.option("dbtable", "TESTSPARK")
.option("forward_spark_s3_credentials", true)
.option("tempdir","s3n://test/Redshift/temp/")
.option("sse", true)
.option("region", "us-east-1")
.load();
error:
Exception in thread "main" java.sql.SQLException: [Amazon](500310) Invalid operation: Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid=,CanRetry 1
Details:
error: Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 6FC2B3FD56DA0EAC,ExtRid I,CanRetry 1
code: 9012
context: s3://jd-us01-cis-machine-telematics-devl-data-processed/Redshift/temp/f06bc4b2-494d-49b0-a100-2246818e22cf/manifest
query: 44179
Can anyone please help?
You're getting a permission error from S3 when Redshift tries to access the files you're telling it to load.
Have you configured the access keys for S3 access before calling load()?
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "ASDFGHJKLQWERTYUIOP")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "QaZWSxEDC/rfgyuTGBYHY&UKEFGBTHNMYJ")
You should be able to check which access key id was used from the Redshift side by querying the stl_query table.
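If the connector version in use supports it, forwarding an IAM role instead of raw keys is another option; a rough sketch in PySpark, assuming the aws_iam_role parameter of the spark-redshift connector (URL, table, role ARN and tempdir below are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-read").getOrCreate()

# Sketch only: all values below are placeholders.
df = (
    spark.read.format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://example-cluster:5439/dev?user=USER&password=PASS")
    .option("dbtable", "TESTSPARK")
    .option("aws_iam_role", "arn:aws:iam::111111111111:role/my-redshift-s3-role")
    .option("tempdir", "s3n://test/Redshift/temp/")
    .load()
)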
From the error "S3ServiceException:Access Denied", it seems the permissions are not set for Redshift to access the S3 files. Please follow the steps below:
1. Add a bucket policy to that bucket that allows the Redshift account access.
2. Create an IAM role in the Redshift account that Redshift can assume.
3. Grant the newly created role permission to access the S3 bucket.
4. Associate the role with the Redshift cluster.
5. Run COPY statements.
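A minimal boto3 sketch of the role-association step (cluster identifier and role ARN are placeholders); the COPY statements would then reference the same role via an IAM_ROLE clause:

import boto3

# Attach the newly created role to the cluster so Redshift can assume it for S3 access.
redshift = boto3.client("redshift")
redshift.modify_cluster_iam_roles(
    ClusterIdentifier="my-redshift-cluster",
    AddIamRoles=["arn:aws:iam::111111111111:role/my-redshift-s3-role"],
)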