I'm trying to run a Glue job that loads data from a Data Catalog table I created previously into Redshift, and it's throwing this error:
An error occurred while calling o151.pyWriteDynamicFrame. com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket;
Notes:
I have a role with PowerUser access, so I have permission
The bucket exists
I have a connection between Glue and Redshift
Everything is in the same region
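One thing worth ruling out: writing a DynamicFrame to Redshift stages the data in the S3 temp directory configured for the job (the TempDir / redshift_tmp_dir setting), so the 404 usually refers to that staging bucket rather than the bucket behind the catalog table. Below is a minimal check, assuming the AWS SDK for Java v1 is on the classpath; the bucket name and region are placeholders.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class CheckTempBucket {
    public static void main(String[] args) {
        // Use the same region as the Glue job and the Redshift cluster
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion("us-east-1")
                .build();
        // Placeholder: the bucket named in the job's TempDir / redshift_tmp_dir
        String tempBucket = "my-glue-temp-bucket";
        System.out.println("Staging bucket exists: " + s3.doesBucketExistV2(tempBucket));
    }
}

If the bucket named in that setting does not exist (for example because of a typo), the write fails with exactly this NoSuchBucket error even though the source bucket is fine.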
I'm querying data from the Glue catalog. For some tables I can see the data, and for other tables I get the error below:
Error opening Hive split s3://test/sample/run-1-part-r-03 (offset=0, length=1156) using org.apache.hadoop.mapred.TextInputFormat: Permission denied on S3 path: s3://test/sample/run-1-part-r-03
I have given full access to Athena.
Amazon Athena adopts the permissions of the user when accessing Amazon S3.
If the user can access the objects in Amazon S3, then they can access them via Amazon Athena.
Does the user who ran the command have access to those objects?
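To see where the failure comes from, you can request the object's metadata directly with the same credentials the query runs under; a 403 here means S3 itself is denying the read, independent of Athena. A minimal sketch, assuming the AWS SDK for Java v1 and using the bucket/key from the failing split above:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AmazonS3Exception;

public class CheckSplitAccess {
    public static void main(String[] args) {
        // Build the client with the same credentials/role that runs the Athena query
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        try {
            // Bucket and key taken from the failing Hive split in the error message
            s3.getObjectMetadata("test", "sample/run-1-part-r-03");
            System.out.println("The caller can read this object");
        } catch (AmazonS3Exception e) {
            System.out.println("S3 returned " + e.getStatusCode() + " / " + e.getErrorCode());
        }
    }
}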
I am trying to create a new dataset in AWS QuickSight and connect it to AWS Athena, but the validation is failing with the following error.
[Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. You have attempted to create more buckets than allowed (Service: Amazon S3; Status Code: 400; Error Code: TooManyBuckets;
Does QuickSight create a new bucket in S3 when creating a new dataset?
If yes, then my bucket quota is already used up (there are already 100 buckets in S3).
Is there any workaround for this?
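The error suggests the Athena client used by QuickSight tried to create a default query results bucket in S3, and the call failed because the account is already at its bucket quota (100 buckets by default; the limit can be raised through Service Quotas). Deleting an unused bucket or requesting a quota increase are the usual workarounds. A quick way to confirm the count, as a sketch assuming the AWS SDK for Java v1:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class CountBuckets {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // The default S3 quota is 100 buckets per account
        System.out.println("Buckets in this account: " + s3.listBuckets().size());
    }
}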
Description
I synced the data from another account with rclone, with 'acl=bucket-owner-full-control' enabled.
rclone sync 607562784642://cdh-bba-itdata-sub-cmdb-src-lt7g 162611943124://bbatest
When I cataloged the bucket data into the Glue catalog with a Crawler, the Glue Crawler raised the following error:
[49b1d1bd-d3f0-4801-9668-04f8651b06f4] ERROR : Not all read errors will be logged. com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: CD0062EA0B2D0AA7; S3 Extended Request ID: k0oHoKviPcWAs8yrn+9daImiTZ0Fx6sssbGiPF/7YwTjxUwITSDQHd2uTgh3K6QAcxDkvzHREJA=), S3 Extended Request ID: k0oHoKviPcWAs8yrn+9daImiTZ0Fx6sssbGiPF/7YwTjxUwITSDQHd2uTgh3K6QAcxDkvzHREJA=
Official checklist
I have checked the items on the official checklist.
bucket owner ID
object owner ID
Both of them were the same, and there was no additional bucket policy.
vpc endpoints
bucket policy
IAM policy
None of these policies blocked Glue from accessing the S3 bucket.
The Crawler cataloged data from other buckets successfully, so the Glue configuration was correct.
The bucket has a customer managed KMS key enabled, but I forgot to grant the Glue role access to the key.
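That turned out to be the cause: when the objects are encrypted with a customer managed KMS key, the crawler's role also needs permission to use that key (at least kms:Decrypt), otherwise S3 returns 403 even though the bucket and IAM policies look fine. One way to add that access is a KMS grant for the Glue role; the sketch below assumes the AWS SDK for Java v1, and the key ARN and role ARN are placeholders (adding the role to the key policy in the console works just as well).

import com.amazonaws.services.kms.AWSKMS;
import com.amazonaws.services.kms.AWSKMSClientBuilder;
import com.amazonaws.services.kms.model.CreateGrantRequest;
import com.amazonaws.services.kms.model.GrantOperation;

public class GrantGlueRoleOnKey {
    public static void main(String[] args) {
        AWSKMS kms = AWSKMSClientBuilder.defaultClient();
        CreateGrantRequest grant = new CreateGrantRequest()
                // Placeholder ARNs: the customer managed key and the crawler's IAM role
                .withKeyId("arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID")
                .withGranteePrincipal("arn:aws:iam::111122223333:role/MyGlueCrawlerRole")
                .withOperations(GrantOperation.Decrypt, GrantOperation.DescribeKey);
        System.out.println("Created grant: " + kms.createGrant(grant).getGrantId());
    }
}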
I am getting the error below while reading data from a Redshift table using Spark.
Below is the code:
// "sql" here is assumed to be the SparkSession (or SQLContext); the spark-redshift
// package and the Redshift JDBC driver need to be on the classpath.
Dataset<Row> dfread = sql.read()
    .format("com.databricks.spark.redshift")
    .option("url", url)
    //.option("query", "select * from TESTSPARK")
    .option("dbtable", "TESTSPARK")
    .option("forward_spark_s3_credentials", true)   // forward Spark's S3 credentials to Redshift for the UNLOAD
    .option("tempdir", "s3n://test/Redshift/temp/") // staging location shared by Spark and Redshift
    .option("sse", true)
    .option("region", "us-east-1")
    .load();
error:
Exception in thread "main" java.sql.SQLException: [Amazon](500310) Invalid operation: Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid=,CanRetry 1
Details:
error: Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 6FC2B3FD56DA0EAC,ExtRid I,CanRetry 1
code: 9012
context: s3://jd-us01-cis-machine-telematics-devl-data-processed/Redshift/temp/f06bc4b2-494d-49b0-a100-2246818e22cf/manifest
query: 44179
Can anyone please help?
You're getting a permission error from S3 when Redshift tries to access the files you're telling it to load.
Have you configured the access keys for S3 access before calling the load()?
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "ASDFGHJKLQWERTYUIOP")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "QaZWSxEDC/rfgyuTGBYHY&UKEFGBTHNMYJ")
You should be able to check which access key id was used from the Redshift side by querying the stl_query table.
From the error "S3ServiceException:Access Denied", it seems the permissions are not set for Redshift to access the S3 files. Please follow the steps below:
1. Add a bucket policy to that bucket that allows the Redshift account access.
2. Create an IAM role in the Redshift account that Redshift can assume.
3. Grant the newly created role permission to access the S3 bucket.
4. Associate the role with the Redshift cluster.
5. Run the COPY statements (a sketch follows below).
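For the last step, here is a minimal sketch of running COPY with the role that was associated with the cluster, assuming the Redshift JDBC driver is on the classpath; the JDBC URL, credentials, table, manifest path, and role ARN are all placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RunCopyWithIamRole {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "awsuser", "password");
             Statement stmt = conn.createStatement()) {
            // COPY loads the files listed in the manifest using the cluster's attached role,
            // so no access keys need to be embedded in the statement
            stmt.execute(
                "COPY testspark "
              + "FROM 's3://my-bucket/Redshift/temp/manifest' "
              + "IAM_ROLE 'arn:aws:iam::111122223333:role/MyRedshiftS3Role' "
              + "MANIFEST");
        }
    }
}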