AWS Glue cannot create database from crawler: permission denied

I am trying to use an AWS Glue crawler on an S3 bucket to populate a Glue database. I run the Create Crawler wizard, select my data source (the S3 bucket with the Avro files), have it create the IAM role, and run it. I then get the following error:
Database does not exist or principal is not authorized to create tables. (Database name: zzz-db, Table name: avroavro_all) (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: 78fc18e4-c383-11e9-a86f-736a16f57a42). For more information, see Setting up IAM Permissions in the Developer Guide (http://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html).
I tried creating this table in a new blank database (as opposed to an existing one with tables), prefixing the names, sourcing different schemas, and using an existing role with Admin access. I thought the latter would work, but I keep getting the same error and have no idea why.
To be explicit, the service role I created has several policies that I assume are permissive enough to create tables:
The logs are vanilla:

19:52:52
[10cb3191-9785-49dc-8935-fb02dcbd69a3] BENCHMARK : Running Start Crawl for Crawler avro
19:53:22
[10cb3191-9785-49dc-8935-fb02dcbd69a3] BENCHMARK : Classification complete, writing results to database zzz-db
19:53:22
[10cb3191-9785-49dc-8935-fb02dcbd69a3] INFO : Crawler configured with SchemaChangePolicy {"UpdateBehavior":"UPDATE_IN_DATABASE","DeleteBehavior":"DEPRECATE_IN_DATABASE"}.
19:53:34
[10cb3191-9785-49dc-8935-fb02dcbd69a3] ERROR : Insufficient Lake Formation permission(s) on s3://zzz-data/avro-all/ (Database name: zzz-db, Table name: avroavro_all) (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: 31481e7e-c384-11e9-a6e1-e78dc8223fae). For more information, see Setting up IAM Permissions in the Developer Guide (http://docs.aws.amazon.com/glu
19:54:44
[10cb3191-9785-49dc-8935-fb02dcbd69a3] BENCHMARK : Crawler has finished running and is in state READY

I had the same problem when I set up and ran a new AWS Glue crawler after enabling Lake Formation (in the same AWS account). I had been running Glue crawlers for a long time and was stumped when I saw this new error.
After some trial and error, I found the root cause: when you enable Lake Formation, it adds an additional layer of permissions on new Glue databases created via a Glue crawler, and on any resource (Glue catalog, S3, etc.) that you add to the Lake Formation service.
To fix this problem, grant the crawler's IAM role a proper set of Lake Formation permissions (CRUD) on the database.
You can manage these permissions in the AWS Lake Formation console under Permissions > Data permissions, or via the AWS CLI Lake Formation commands.
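
For anyone who prefers scripting this over clicking through the console, here is a minimal sketch of the same grant using boto3's Lake Formation client. The role ARN and account ID are hypothetical placeholders, and the permission list (CREATE_TABLE, ALTER, DROP) is my assumption of what "CRUD" maps to at the database level; adjust it to what your crawler actually needs.

```python
import json


def build_database_grant(role_arn, database_name):
    """Build the keyword arguments for Lake Formation's GrantPermissions call."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Database": {"Name": database_name}},
        # Assumed set of database-level permissions for a crawler that
        # creates and updates tables; not taken from the original answer.
        "Permissions": ["CREATE_TABLE", "ALTER", "DROP"],
    }


def grant_crawler_access(role_arn, database_name):
    """Send the grant. Needs boto3 and credentials allowed to grant in Lake Formation."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("lakeformation")
    return client.grant_permissions(**build_database_grant(role_arn, database_name))


if __name__ == "__main__":
    # Hypothetical role ARN; zzz-db is the database name from the question.
    request = build_database_grant(
        "arn:aws:iam::123456789012:role/my-glue-crawler-role", "zzz-db"
    )
    print(json.dumps(request, indent=2))
```

Running the script only prints the request; call grant_crawler_access(...) with your own role ARN once the printed request looks right.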

I solved this problem by adding a grant in AWS Lake Formation -> Permissions -> Data locations. (Do not forget to add a forward slash (/) after the bucket name.)
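
A rough scripted equivalent of that data location grant, assuming boto3 and assuming the S3 location is already registered with Lake Formation. The role ARN is a hypothetical placeholder, and the helper keeps the path exactly as you type it (including the trailing slash); I have not verified how the API treats a path with or without it.

```python
def s3_uri_to_arn(s3_uri):
    """Convert s3://bucket/prefix/ into arn:aws:s3:::bucket/prefix/."""
    scheme = "s3://"
    if not s3_uri.startswith(scheme):
        raise ValueError("expected an s3:// URI, got: " + s3_uri)
    return "arn:aws:s3:::" + s3_uri[len(scheme):]


def build_location_grant(role_arn, s3_uri):
    """Build the keyword arguments for a data location grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"DataLocation": {"ResourceArn": s3_uri_to_arn(s3_uri)}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }


if __name__ == "__main__":
    # Hypothetical role ARN; the S3 path is the one from the crawler log.
    print(build_location_grant(
        "arn:aws:iam::123456789012:role/my-glue-crawler-role",
        "s3://zzz-data/avro-all/",
    ))
```

The resulting dictionary can be passed as boto3.client("lakeformation").grant_permissions(**request).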

I had to add the custom role I created for Glue to the "Data lake Administrators" grantees:
(Note: I am only saying that this resolves the crawler's access-denied error. There may be a way to do it with lesser privileges.)

Make sure you gave the necessary permissions to your crawler's IAM role in this path:
Lake Formation -> Permissions -> Data lake permissions
(Grant related Glue Database permissions to your crawler's IAM role)

Related

Unable to connect to S3 while creating Elasticsearch snapshot repository

I am trying to register a repository on AWS S3 to store Elasticsearch snapshots.
I am following the guide and ran the very first command listed in the doc.
But I am getting the error Access Denied while executing that command.
The role that is being used to perform operations on S3 is the AmazonEKSNodeRole.
I have assigned the appropriate permissions to the role to perform operations on the S3 bucket.
Also, here is another doc which suggests using Kibana for Elasticsearch versions > 7.2, but I am doing the same via cURL requests.
Below is the trust policy of the role through which I am making the request to register the repository in the S3 bucket.
Also, below are the screenshots of the permissions of the trusting and trusted accounts respectively -

How to write log and data in Druid Deep Storage in AWS S3

We have a Druid cluster set up, and now I am trying to write the indexing logs and data to S3 deep storage.
The details are as follows:
druid.storage.type=s3
druid.storage.bucket=bucket-name
druid.storage.baseKey=druid/segments
# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=your-bucket
druid.indexer.logs.s3Prefix=druid/indexing-logs
After running an ingestion task I get the error below:
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: HCAFAZBA85QW14Q0; S3 Extended Request ID: 2ICzpVAyFcy/PLrnsUWZBJwEo7dFl/S2lwDTMn+v83uTp71jlEe59Q4/vFhwJU5/WGMYramdSIs=; Proxy: null)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779) ~[aws-java-sdk-core-1.12.37.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753) ~[aws-java-sdk-core-1.12.37.jar:?]
I tried attaching the IAM role at the bucket level, and the same role is attached to the EC2 instances where the Druid services are running.
Can someone please point out which steps I am missing here?
I got it done!
I created a new IAM role and a policy that grants permission on both the S3 bucket and the subfolder.
NOTE: Permission on the S3 bucket itself is a must.
Example: if the bucket name is "Bucket_1" and the subfolder where deep storage is configured is "deep_storage", make sure to grant permission on both of these resources:
"arn:aws:s3:::Bucket_1"
"arn:aws:s3:::Bucket_1/*"
My mistake was granting permission only at the subfolder level and not at the bucket level.
Also remove or comment out the parameters below in the common.runtime.properties file on each server of your Druid cluster:
druid.s3.accessKey=
druid.s3.secretKey=
After this change I can see data being written successfully to S3 deep storage using the IAM role rather than the secret and access keys.
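
To make the two-ARN point concrete, here is a small sketch that generates a policy document with both the bucket-level and the object-level resource. The action lists are my assumption (the answer above only names the resources), so extend or trim them for your setup.

```python
import json


def deep_storage_policy(bucket):
    """Build an IAM policy document covering the bucket and the objects in it."""
    bucket_arn = "arn:aws:s3:::" + bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level statement: this is the part that was missing.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],  # assumed action
                "Resource": [bucket_arn],
            },
            {
                # Object-level statement for segments and indexing logs.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],  # assumed actions
                "Resource": [bucket_arn + "/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Bucket_1 is the example bucket name used in the answer above.
    print(json.dumps(deep_storage_policy("Bucket_1"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the role that the Druid EC2 instances run under.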

Permission bigquery.tables.get denied or it may not exist

I am using the AWS Glue connector for BigQuery. My Glue jobs were running fine in multiple AWS accounts, but they suddenly started failing in all accounts at once with the response below:
Access Denied: Table common-infra-services:detailedcost.gcp_billing_export_resource_v1_01E8AD_3E792E_BB0E5D: Permission bigquery.tables.get denied on table common-infra-services:detailedcost.gcp_billing_export_resource_v1_01E8AD_3E792E_BB0E5D (or it may not exist).", "reason": "accessDenied"
Please review and let me know what could be causing this problem.
I am using a GCP IAM service account role to run queries from Glue against BigQuery with the following set of permissions:
bigquery.jobs.create
bigquery.tables.getData
bigquery.tables.list
And with these permissions, all jobs were running fine till yesterday.
Based on that error message I'd check if table common-infra-services:detailedcost.gcp_billing_export_resource_v1_01E8AD_3E792E_BB0E5D exists. If it does you might need to add permission bigquery.tables.get to your service account.

Error in AWS SageMaker Ground Truth labeled job creation

I'm using AWS SageMaker Ground Truth for labeling images. I have uploaded the data into an S3 bucket and created an IAM role to access 'S3,SageMaker,Groundtruth, and IAM'. When I try to create a labeling job, it gives me this error:
NetworkingError: Network Failure - The S3 bucket 'sm-gt-s3-enron' you entered in Input dataset location cannot be reached. Either the bucket does not exist, or you do not have permission to access it. If the bucket does not exist, update Input dataset location with a new S3 URI. If the bucket exists, give the IAM entity you are using to create this labeling job permission to read and write to this S3 bucket, and try your request again.
From the error, it looks like either you:
have not created the bucket in the same region where you are running the labeling job.
OR
have not provided correct IAM permissions for the execution role attached to this labeling job.
Is the role info you shared in the question for your logged-in IAM role, or for the execution role attached to the labeling job?
Can you try accessing the S3 bucket from your local CLI, or from an EC2 instance in the same region?
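
If you want to script the region check, something along these lines should work. It is only a sketch: it assumes boto3 and that your credentials are allowed to call GetBucketLocation, and the region you compare against is whatever region you create the labeling job in.

```python
def bucket_region(s3_client, bucket):
    """Return the region a bucket lives in, based on GetBucketLocation."""
    response = s3_client.get_bucket_location(Bucket=bucket)
    constraint = response.get("LocationConstraint")
    if constraint is None:
        # S3 reports us-east-1 buckets with a null LocationConstraint.
        return "us-east-1"
    if constraint == "EU":
        # Legacy value that some older eu-west-1 buckets still return.
        return "eu-west-1"
    return constraint


def same_region(s3_client, bucket, job_region):
    """True if the bucket is in the region where the labeling job is created."""
    return bucket_region(s3_client, bucket) == job_region


def check(bucket, job_region):
    """Run the check against real AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    return same_region(boto3.client("s3"), bucket, job_region)
```

For example, check("sm-gt-s3-enron", "us-east-1") with the region replaced by your own. If the call itself fails with an access-denied error, that points at the permissions cause rather than the region one.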

error 403 while creating emr cluster using my reducer and mapper?

I am trying to use my bucket to supply the arguments for EMR to create a cluster, but it gives me "All access to this object has been disabled (Service: Amazon S3; Status Code: 403; Error Code: AllAccessDisabled;"
I have used my Reducer and Mapper Python files, and my bucket's permissions are public too.
Is there something wrong with my mapper and reducer files, or am I missing a trick here?
Make sure you've assigned your EMR cluster an IAM role that has adequate S3 access permissions. IAM enables you to grant permissions to users, groups, or resources (like your EMR cluster, in this instance) to be able to access other services or resources in AWS (like S3, which is currently giving you an access denied error).
To do this through EMRFS:
Navigate to the EMR console
Click Security configurations (on the left menu)
Scroll down to IAM roles for EMRFS
Enable Use IAM roles for EMRFS requests to Amazon S3
Add role mapping
Select desired IAM role (Admin)
Select whatever basis for access you prefer (User, group, or S3 bucket name prefix)
More on this available in the docs here: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html
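
The console steps above can also be expressed as a security configuration document. This is a sketch based on my reading of the EMRFS role mapping format in the docs linked above; the role ARN and the bucket prefix are hypothetical placeholders.

```python
import json

# Bases for access that a role mapping can use (user, group, or S3 prefix).
IDENTIFIER_TYPES = {"User", "Group", "Prefix"}


def emrfs_security_configuration(role_arn, identifier_type, identifiers):
    """Build a security configuration with a single EMRFS role mapping."""
    if identifier_type not in IDENTIFIER_TYPES:
        raise ValueError("identifier_type must be one of " + str(sorted(IDENTIFIER_TYPES)))
    return {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {
                "RoleMappings": [
                    {
                        "Role": role_arn,
                        "IdentifierType": identifier_type,
                        "Identifiers": list(identifiers),
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical role ARN and bucket prefix, for illustration only.
    config = emrfs_security_configuration(
        "arn:aws:iam::123456789012:role/my-emrfs-s3-role",
        "Prefix",
        ["s3://my-bucket/"],
    )
    print(json.dumps(config, indent=2))
```

With boto3, the JSON string can be passed to boto3.client("emr").create_security_configuration(Name=..., SecurityConfiguration=json.dumps(config)), and the named configuration is then selected when creating the cluster.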