Glue can't read S3 bucket - amazon-web-services

Description
I synced the data from another account with rclone, with 'acl=bucket-owner-full-control' enabled.
rclone sync 607562784642://cdh-bba-itdata-sub-cmdb-src-lt7g 162611943124://bbatest
When I cataloged the bucket data into the Glue Data Catalog with a Crawler, the Crawler raised the following error:
[49b1d1bd-d3f0-4801-9668-04f8651b06f4] ERROR : Not all read errors will be logged. com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: CD0062EA0B2D0AA7; S3 Extended Request ID: k0oHoKviPcWAs8yrn+9daImiTZ0Fx6sssbGiPF/7YwTjxUwITSDQHd2uTgh3K6QAcxDkvzHREJA=), S3 Extended Request ID: k0oHoKviPcWAs8yrn+9daImiTZ0Fx6sssbGiPF/7YwTjxUwITSDQHd2uTgh3K6QAcxDkvzHREJA=
Official checklist
I have checked the items from the official checklist:
bucket owner ID
object owner ID
Both of them were the same, and there was no additional bucket policy.
VPC endpoints
bucket policy
IAM policy
None of these policies blocked Glue from accessing the S3 bucket.
The Crawler cataloged data from other buckets successfully, so the Glue configuration itself was correct.

The bucket was encrypted with a customer managed KMS key, but I had forgotten to grant the Glue role access to that key in KMS.
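A minimal key policy statement for this, assuming a hypothetical role name AWSGlueServiceRole-Crawler in the destination account (a sketch, not the exact policy used), could look like:

{
    "Sid": "AllowGlueCrawlerRoleToDecrypt",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::162611943124:role/AWSGlueServiceRole-Crawler"
    },
    "Action": [
        "kms:Decrypt",
        "kms:DescribeKey"
    ],
    "Resource": "*"
}

Depending on how the key policy is set up, the same permissions can instead be granted through an IAM policy attached to the Glue role, scoped to the key's ARN.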

Related

AWS Glue Job - From S3 bucket to Redshift throws No Such Bucket

I'm trying to run a Glue job that loads a table I previously created in the Data Catalog into Redshift, and it's throwing this error:
An error occurred while calling o151.pyWriteDynamicFrame. com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket;
Notes:
I have the PowerUser access role, so I have permission
The bucket exists
I have a connection between Glue and Redshift
Everything is in the same region

How to write log and data in Druid Deep Storage in AWS S3

We have a Druid cluster set up, and now I am trying to write the indexing logs and data to S3 deep storage.
Following are the details:
druid.storage.type=s3
druid.storage.bucket=bucket-name
druid.storage.baseKey=druid/segments
# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=your-bucket
druid.indexer.logs.s3Prefix=druid/indexing-logs
After running an ingestion task I am getting the below error:
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: HCAFAZBA85QW14Q0; S3 Extended Request ID: 2ICzpVAyFcy/PLrnsUWZBJwEo7dFl/S2lwDTMn+v83uTp71jlEe59Q4/vFhwJU5/WGMYramdSIs=; Proxy: null)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753) ~[aws-java-sdk-core-1.12.37.jar:?]
I tried granting the IAM instance role access at the bucket level, and the same role is attached to the EC2 instances where the Druid services are running.
Can someone please guide me on the steps I am missing here?
I got it done!
I created a new IAM role and a policy that grants permission to the S3 bucket and its subfolder.
NOTE: permission on the bucket itself is a must.
Example: if the bucket name is "Bucket_1" and the subfolder where deep storage is configured is "deep_storage",
then make sure permission is granted like this:
"arn:aws:s3:::Bucket_1"
"arn:aws:s3:::Bucket_1/*"
My mistake was skipping the bucket-level permission and trying to grant permission only at the subfolder level.
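As a minimal sketch, using the example bucket name above (the exact action list may vary), the policy could look like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::Bucket_1"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::Bucket_1/*"
        }
    ]
}

The first statement is the bucket-level permission that was missing; the second covers the objects under the deep storage prefix.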
Also remove or comment out the parameters below in the common.runtime.properties file on each server of your Druid cluster:
druid.s3.accessKey=
druid.s3.secretKey=
After this change, the data is written successfully to S3 deep storage using the IAM role rather than the access and secret keys.

Not able to connect AWS QuickSight to AWS Athena: "TooManyBuckets"

I am trying to create a new dataset in AWS QuickSight and connect it to AWS Athena, but the validation is failing with the following error:
[Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. You have attempted to create more buckets than allowed (Service: Amazon S3; Status Code: 400; Error Code: TooManyBuckets;
Does QuickSight create a new bucket in S3 when creating a new dataset?
If so, my bucket quota is already used up (there are already 100 buckets in S3).
Is there any workaround for this?

Failed to edit persistent store "s3" - Spinnaker storage with S3 configuration

I am trying to configure Spinnaker with hal, following the Hello Deployment codelab (https://www.spinnaker.io/guides/tutorials/codelabs/hello-deployment/).
While configuring S3 storage in AWS I am getting the error below.
Can someone please guide me to resolve this issue?
hal config storage s3 edit --access-key-id xxxx --secret-access-key --region us-west-2
Problems in default.persistentStorage.s3:
! ERROR Failed to ensure the required bucket
"spin-1889a6d7-dd17-4896-9ef9-e07cc2ab5b2a" exists: Forbidden
(Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden;
Request ID: xxxx; S3 Extended Request ID: xxx
Failed to edit persistent store "s3".
You need proper (read/write) permissions on S3.
Check the permissions of the credentials you used in the IAM console.
Make sure they grant read and write access to S3.
More on IAM policies: https://aws.amazon.com/blogs/security/writing-iam-policies-how-to-grant-access-to-an-amazon-s3-bucket/
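As a hedged sketch, the credentials hal uses would need roughly the following on the spin-* bucket from the error above (the exact action list is an assumption):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:CreateBucket", "s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::spin-1889a6d7-dd17-4896-9ef9-e07cc2ab5b2a"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::spin-1889a6d7-dd17-4896-9ef9-e07cc2ab5b2a/*"
        }
    ]
}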

Trouble integrating EMR with S3

I am having trouble integrating EMR with S3, i.e. implementing EMRFS.
EMR Version: emr-5.4.0
When I run hdfs dfs -ls s3://pathto/bucket/ I get the following error:
ls: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: XXXX),
S3 Extended Request ID: XXXXX
Please guide me: what does this mean, and what am I missing?
I have done the following steps:
Created a KMS key for EMR
Added EMR_EC2_DefaultRole as a key user on the newly created KMS key
Created an S3 server-side encryption security configuration for EMR
Created a new inline policy for role/EMR_EC2_DefaultRole and EMR_DefaultRole for S3 bucket access
Created an EMR cluster manually with the new EMR security configuration and the following configuration classification:
"fs.s3.enableServerSideEncryption": "true",
"fs.s3.serverSideEncryption.kms.keyId":"KEYID"
EMR, by default, uses instance profile credentials (EMR_EC2_DefaultRole) to access your S3 bucket. The error means this role does not have the necessary permissions to access the S3 bucket.
You will need to verify that role's IAM policy and allow the necessary S3 actions on both the bucket and its objects (like s3:List*). Also check whether you have any explicit Denys.
http://docs.aws.amazon.com/AmazonS3/latest/dev/using-with-s3-actions.html
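As a hedged sketch, an inline policy on EMR_EC2_DefaultRole along these lines would cover the basics (your-bucket is a placeholder for the actual bucket name):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:List*", "s3:Get*", "s3:Put*"],
            "Resource": [
                "arn:aws:s3:::your-bucket",
                "arn:aws:s3:::your-bucket/*"
            ]
        }
    ]
}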
Access could also be denied because of a bucket policy set on the S3 bucket you are trying to access.
http://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html
https://aws.amazon.com/blogs/security/iam-policies-and-bucket-policies-and-acls-oh-my-controlling-access-to-s3-resources/
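If there is a bucket policy, it needs to allow (and not explicitly deny) the instance profile role. A hedged sketch, with the account ID and bucket name as placeholders:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole"
            },
            "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
            "Resource": [
                "arn:aws:s3:::your-bucket",
                "arn:aws:s3:::your-bucket/*"
            ]
        }
    ]
}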
Your EMR cluster could be using a VPC endpoint for S3 rather than the Internet/NAT gateway to reach S3. In that case, you'll need to verify the VPC endpoint policy as well.
https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html#vpc-endpoints-policies-s3
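A hedged example of an endpoint policy that would permit this access (placeholders again; many endpoints simply keep the default full-access policy):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::your-bucket",
                "arn:aws:s3:::your-bucket/*"
            ]
        }
    ]
}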