I'm trying to run a Glue job that writes from a Data Catalog table I created earlier to Redshift, and it's throwing this error:
An error occurred while calling o151.pyWriteDynamicFrame. com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket;
Notes:
I have the PowerUser access role, so I should have permission
The bucket exists
I have a connection between glue and Redshift
It's in the same region
Description
I synced the data from another account with rclone, with 'acl=bucket-owner-full-control' enabled.
rclone sync 607562784642://cdh-bba-itdata-sub-cmdb-src-lt7g 162611943124://bbatest
When I cataloged the bucket data into the Glue Data Catalog with a crawler, the crawler raised the following error:
[49b1d1bd-d3f0-4801-9668-04f8651b06f4] ERROR : Not all read errors will be logged. com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: CD0062EA0B2D0AA7; S3 Extended Request ID: k0oHoKviPcWAs8yrn+9daImiTZ0Fx6sssbGiPF/7YwTjxUwITSDQHd2uTgh3K6QAcxDkvzHREJA=), S3 Extended Request ID: k0oHoKviPcWAs8yrn+9daImiTZ0Fx6sssbGiPF/7YwTjxUwITSDQHd2uTgh3K6QAcxDkvzHREJA=
Official checklist
I checked the following items from the official checklist.
bucket owner ID
object owner ID
Both were the same. There was no additional bucket policy.
vpc endpoints
bucket policy
IAM policy
None of these policies blocked Glue from accessing the S3 bucket.
The crawler cataloged data in other buckets successfully, so the Glue configuration was correct.
The bucket is encrypted with a customer managed KMS key.
But I had forgotten to give the Glue role access to that KMS key.
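For anyone hitting the same 403 on an SSE-KMS bucket, here is a rough sketch of the kind of key-policy statement that was missing. The role ARN, the Sid, and the exact list of KMS actions are my assumptions for illustration, not the statement I actually used; adjust them to your setup.

```python
import json


def kms_statement_for_role(role_arn: str) -> dict:
    """Build a key-policy statement that lets one IAM role use the key."""
    return {
        "Sid": "AllowGlueRoleUseOfKey",  # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        # Decrypt is needed to read SSE-KMS objects; GenerateDataKey to write them.
        "Action": ["kms:Decrypt", "kms:GenerateDataKey", "kms:DescribeKey"],
        # Inside a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


# Hypothetical role ARN, for illustration only.
statement = kms_statement_for_role("arn:aws:iam::111122223333:role/MyGlueCrawlerRole")
print(json.dumps(statement, indent=2))
```

The printed statement would be appended to the existing key policy's Statement list rather than replacing it.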
I'm trying to create a Glue DynamicFrame from an S3 object using the following command:
df = glue_context.create_dynamic_frame.from_catalog(
    database="s3_bucket_name",
    table_name="s3_object_name",
)
This gives an InvalidInputException with the reason "Cannot vend credentials for buckets in multiple regions".
This worked earlier in a similar setup, but it fails in the new account I tried. Any clue what could be wrong here?
Full error
com.amazonaws.services.gluejobexecutor.model.InvalidInputException: Cannot vend credentials for buckets in multiple regions (Service: AWSLakeFormation; Status Code: 400; Error Code: InvalidInputException
I am trying to use an AWS Glue crawler on an S3 bucket to populate a Glue database. I run the Create Crawler wizard, select my datasource (the S3 bucket with the avro files), have it create the IAM role, and run it, and I get the following error:
Database does not exist or principal is not authorized to create tables. (Database name: zzz-db, Table name: avroavro_all) (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: 78fc18e4-c383-11e9-a86f-736a16f57a42). For more information, see Setting up IAM Permissions in the Developer Guide (http://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html).
I tried to create this table in a new blank database (as opposed to an existing one with tables), I tried prefixing the names, I tried sourcing different schemas, and I tried using an existing role with Admin access. I thought the latter would work, but I keep getting the same error and have no idea why.
To be explicit, the service role I created has several policies that I assume are permissive enough to create tables:
The logs are vanilla:
19:52:52
[10cb3191-9785-49dc-8935-fb02dcbd69a3] BENCHMARK : Running Start Crawl for Crawler avro
19:53:22
[10cb3191-9785-49dc-8935-fb02dcbd69a3] BENCHMARK : Classification complete, writing results to database zzz-db
19:53:22
[10cb3191-9785-49dc-8935-fb02dcbd69a3] INFO : Crawler configured with SchemaChangePolicy {"UpdateBehavior":"UPDATE_IN_DATABASE","DeleteBehavior":"DEPRECATE_IN_DATABASE"}.
19:53:34
[10cb3191-9785-49dc-8935-fb02dcbd69a3] ERROR : Insufficient Lake Formation permission(s) on s3://zzz-data/avro-all/ (Database name: zzz-db, Table name: avroavro_all) (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: 31481e7e-c384-11e9-a6e1-e78dc8223fae). For more information, see Setting up IAM Permissions in the Developer Guide (http://docs.aws.amazon.com/glu
19:54:44
[10cb3191-9785-49dc-8935-fb02dcbd69a3] BENCHMARK : Crawler has finished running and is in state READY
I had the same problem when I set up and ran a new AWS Glue crawler after enabling Lake Formation (in the same AWS account). I had been running Glue crawlers for a long time and was stumped when I saw this new error.
After some trial and error, I found that the root cause is that enabling Lake Formation adds an additional layer of permissions on new Glue databases created via a Glue crawler, and on any resource (Glue catalog, S3, etc.) that you add to the Lake Formation service.
To fix this problem, you have to grant the crawler's IAM role a proper set of Lake Formation permissions (CRUD) on the database.
You can manage these permissions in the AWS Lake Formation console under the Permissions > Data permissions section, or via the AWS CLI lakeformation commands.
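If you prefer scripting the grant, a minimal sketch with boto3 could look like the following. It assumes boto3 is installed and credentials are configured; the role ARN is a placeholder, and the permission list is only my reading of "CRUD" at the database level, so trim it to what your crawler actually needs.

```python
def database_grant_request(role_arn: str, database: str) -> dict:
    """Build the arguments for Lake Formation's GrantPermissions call."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Database": {"Name": database}},
        # Assumed mapping of "CRUD" onto database-level permissions.
        "Permissions": ["CREATE_TABLE", "ALTER", "DROP", "DESCRIBE"],
    }


def grant(request: dict) -> None:
    """Send the grant; boto3 is imported lazily so the builder works without it."""
    import boto3

    boto3.client("lakeformation").grant_permissions(**request)


# Placeholder role ARN; the database name is the one from the question.
request = database_grant_request(
    "arn:aws:iam::111122223333:role/MyGlueCrawlerRole", "zzz-db"
)
print(request)
# grant(request)  # uncomment to apply the grant for real
```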
I solved this problem by adding a grant in AWS Lake Formation -> Permissions -> Data locations. (Do not forget to add a forward slash (/) after the bucket name.)
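Roughly the same data location grant expressed as a request for boto3's lakeformation grant_permissions call, as a sketch only. The role ARN is a placeholder, and the trailing slash in the ARN simply mirrors the console advice above; I have not verified whether the API requires it.

```python
def data_location_grant_request(role_arn: str, bucket: str, prefix: str = "") -> dict:
    """Build a DATA_LOCATION_ACCESS grant for an S3 location."""
    # The slash after the bucket name is always kept, as suggested above.
    resource_arn = f"arn:aws:s3:::{bucket}/{prefix}"
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"DataLocation": {"ResourceArn": resource_arn}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }


# Placeholder role ARN; bucket and prefix are taken from the crawler log in the question.
location_request = data_location_grant_request(
    "arn:aws:iam::111122223333:role/MyGlueCrawlerRole", "zzz-data", "avro-all/"
)
print(location_request["Resource"]["DataLocation"]["ResourceArn"])
```

The resulting dict would be passed as keyword arguments to the lakeformation client's grant_permissions method.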
I had to add the custom role I created for Glue to the "Data lake Administrators" grantees:
(Note: I'm only saying that this resolves the crawler's access-denied error. There may be a way to do it with fewer privileges...)
Make sure you gave the necessary permissions to your crawler's IAM role in this path:
Lake Formation -> Permissions -> Data lake permissions
(Grant related Glue Database permissions to your crawler's IAM role)
I am having trouble integrating EMR with S3, i.e. implementing EMRFS.
EMR Version: emr-5.4.0
When I run hdfs dfs -ls s3://pathto/bucket/ I get the following error:
ls: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: XXXX),
S3 Extended Request ID: XXXXX
Please advise what this error means and what I am missing.
I have done the following steps:
Created a KMS key for EMR
Added EMR_EC2_DefaultRole as a key user on the newly created KMS key
Created an S3 server-side encryption security configuration for EMR
Created a new inline policy for EMR_EC2_DefaultRole and EMR_DefaultRole for S3 bucket access
Created an EMR cluster manually with the new EMR security configuration and the following configuration classification
"fs.s3.enableServerSideEncryption": "true",
"fs.s3.serverSideEncryption.kms.keyId":"KEYID"
EMR, by default, uses the instance profile credentials (EMR_EC2_DefaultRole) to access your S3 bucket. The error means this role does not have the necessary permissions to access the S3 bucket.
You will need to verify that the IAM policy of that role allows the necessary S3 actions on both the bucket and its objects (like s3:List*). Also check whether you have any explicit Deny statements.
http://docs.aws.amazon.com/AmazonS3/latest/dev/using-with-s3-actions.html
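As a rough illustration of the bucket-versus-object split, an inline policy for the instance profile role could be built like this. The bucket name and the exact action list are assumptions for the example; grant only what your jobs need.

```python
import json


def s3_access_policy(bucket: str) -> dict:
    """Build an IAM policy covering bucket-level and object-level S3 actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions apply to keys under the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Hypothetical bucket name, for illustration only.
policy = s3_access_policy("my-emr-data-bucket")
print(json.dumps(policy, indent=2))
```

A common cause of this 403 is attaching s3:ListBucket to the object ARN (bucket/*) instead of the bucket ARN, which is why the two statements are kept separate here.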
Access could also be denied because of a bucket policy set on the S3 bucket you are trying to access.
http://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html
https://aws.amazon.com/blogs/security/iam-policies-and-bucket-policies-and-acls-oh-my-controlling-access-to-s3-resources/
Your EMR cluster could be using a VPC endpoint for S3 rather than going through the Internet/NAT. In that case, you'll need to verify the VPC endpoint policies as well.
https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html#vpc-endpoints-policies-s3