Amazon Athena error opening Hive split s3 path and Access Denied - amazon-web-services

I'm querying data from the Glue catalog. For some tables I can see the data, but for other tables I get the error below:
Error opening Hive split s3://test/sample/run-1-part-r-03 (offset=0, length=1156) using org.apache.hadoop.mapred.TextInputFormat: Permission denied on S3 path: s3://test/sample/run-1-part-r-03
I have given full access to Athena.

Amazon Athena adopts the permissions of the user when accessing Amazon S3.
If the user can access the objects in Amazon S3, then they can access them via Amazon Athena.
Does the user who ran the command have access to those objects?
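One quick way to check is a minimal boto3 sketch like the following, run under the same credentials used for the Athena query; the bucket and key are taken from the error message above:

import boto3
from botocore.exceptions import ClientError

# Use the same credentials/profile that the Athena query runs under.
s3 = boto3.client("s3")

try:
    # Bucket and key taken from the error message above.
    s3.head_object(Bucket="test", Key="sample/run-1-part-r-03")
    print("Caller can read the object; look elsewhere for the problem.")
except ClientError as e:
    # A 403 here means the caller itself lacks s3:GetObject on this object.
    print("Access check failed:", e.response["Error"]["Code"])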

Related

The same query works on Athena but not on Lambda - S3 permissions

I'm trying to query a partitioned table that is based on an S3 bucket from Lambda
and I get the following error:
But when I run the same query via the Athena console, it works well.
My Lambda role includes full S3 permissions for all resources.
By the way, I was granted access to another S3 bucket (in another account); it is not my bucket, but I have read and list permissions, and using Lambda I am able to create the partitioned table on their bucket.
Using Lambda, this query works:
ALTER TABLE access_Partition ADD PARTITION
(year = '2022', month = '03',day= '15' ,hour = '01') LOCATION 's3://sddds/2022/03/15/01/';
But a SELECT query on the above table (after creation) gets a permission error.
(When I open the executed query in the Athena console it is marked as failed, but I can run it there successfully.)
select * from access_Partition
Please advise!!!
Amazon Athena uses the permissions of the entity making the call to access Amazon S3. So, when you run an Athena query in the console, it is using permissions from your IAM User. When it is run from Lambda, it uses the permissions from the IAM Role associated with the Lambda function.
When this command is run:
ALTER TABLE access_Partition ADD PARTITION
(year = '2022', month = '03',day= '15' ,hour = '01') LOCATION 's3://sddds/2022/03/15/01/';
it is updating information (metadata) in the data catalog used in Athena in your own account. It is not actually accessing the bucket until a query is run.
The fact that the query fails when it is run suggests that the IAM Role does not have permission to access the bucket in the other AWS Account.
You should add a Bucket Policy on the S3 bucket in the other account that grants access permission for the IAM Role used by the Lambda function.
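A minimal sketch of such a bucket policy applied with boto3, to be run with credentials from the bucket-owning account; the bucket name comes from the question and the Lambda role ARN is a placeholder:

import json
import boto3

# Placeholders: the other account's bucket and the Lambda function's role ARN.
BUCKET = "sddds"
LAMBDA_ROLE_ARN = "arn:aws:iam::111122223333:role/my-lambda-role"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowLambdaRoleToReadForAthena",
            "Effect": "Allow",
            "Principal": {"AWS": LAMBDA_ROLE_ARN},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }
    ],
}

# Must be executed by a principal in the bucket-owning account.
boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))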

Is there a way to crawl an S3 bucket with AWS Glue that requires Requester Pays?

I need to create a crawler in AWS Glue to catalogue some tables that I usually query from the CLI using something like this:
$ aws s3 ls s3://bucket/path/ --request-payer requester
but when creating a crawler I can't figure out where I need to configure the requester pays option, so I'm getting this error log:
ERROR : User does not have access to target
Any thoughts?
I'm using the AWS console for that.
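For reference, the boto3 equivalent of that listing looks roughly like this (same placeholder bucket and path as in the CLI command above):

import boto3

s3 = boto3.client("s3")

# Equivalent of: aws s3 ls s3://bucket/path/ --request-payer requester
resp = s3.list_objects_v2(
    Bucket="bucket",           # placeholder bucket name from the CLI example
    Prefix="path/",            # placeholder prefix from the CLI example
    RequestPayer="requester",  # the caller agrees to pay the request costs
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])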

AWS Glue cannot create database from crawler: permission denied

I am trying to use an AWS Glue crawler on an S3 bucket to populate a Glue database. I run the Create Crawler wizard, select my data source (the S3 bucket with the Avro files), have it create the IAM role, run it, and I get the following error:
Database does not exist or principal is not authorized to create tables. (Database name: zzz-db, Table name: avroavro_all) (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: 78fc18e4-c383-11e9-a86f-736a16f57a42). For more information, see Setting up IAM Permissions in the Developer Guide (http://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html).
I tried to create this table in a new blank database (as opposed to an existing one with tables), I tried prefixing the names, I tried sourcing different schemas, and I tried using an existing role with Admin access. I thought the latter would work, but I keep getting the same error and have no idea why.
To be explicit, the service role I created has several policies that I assume are permissive enough to create tables:
The logs are vanilla:

19:52:52 [10cb3191-9785-49dc-8935-fb02dcbd69a3] BENCHMARK : Running Start Crawl for Crawler avro
19:53:22 [10cb3191-9785-49dc-8935-fb02dcbd69a3] BENCHMARK : Classification complete, writing results to database zzz-db
19:53:22 [10cb3191-9785-49dc-8935-fb02dcbd69a3] INFO : Crawler configured with SchemaChangePolicy {"UpdateBehavior":"UPDATE_IN_DATABASE","DeleteBehavior":"DEPRECATE_IN_DATABASE"}.
19:53:34 [10cb3191-9785-49dc-8935-fb02dcbd69a3] ERROR : Insufficient Lake Formation permission(s) on s3://zzz-data/avro-all/ (Database name: zzz-db, Table name: avroavro_all) (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: 31481e7e-c384-11e9-a6e1-e78dc8223fae). For more information, see Setting up IAM Permissions in the Developer Guide (http://docs.aws.amazon.com/glu
19:54:44 [10cb3191-9785-49dc-8935-fb02dcbd69a3] BENCHMARK : Crawler has finished running and is in state READY
I had the same problem when I set up and ran a new AWS crawler after enabling Lake Formation (in the same AWS account). I had been running Glue crawlers for a long time and was stumped when I saw this new error.
After some trial and error, I found that the root cause of the problem is that when you enable Lake Formation, it adds an additional layer of permissions on new Glue databases created via the Glue crawler and on any resource (Glue catalog, S3, etc.) that you add to the Lake Formation service.
To fix this problem, you have to grant the crawler's IAM role a proper set of Lake Formation permissions (CRUD) on the database.
You can manage these permissions in the AWS Lake Formation console (UI) under Permissions > Data permissions, or via the AWS CLI lakeformation commands.
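For example, a minimal boto3 sketch of such a grant; the database name is taken from the logs above, and the role ARN is a placeholder for the crawler's IAM role:

import boto3

lf = boto3.client("lakeformation")

# Grant the crawler's role the Lake Formation permissions it needs on the database.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AWSGlueServiceRole-crawler"},
    Resource={"Database": {"Name": "zzz-db"}},
    Permissions=["CREATE_TABLE", "ALTER", "DROP", "DESCRIBE"],
)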
I solved this problem by adding a grant in AWS Lake Formation -> Permissions -> Data locations. (Do not forget to add a trailing forward slash (/) after the bucket name.)
I had to add the custom role I created for Glue to the "Data lake administrators" grantees:
(Note: this is just what resolved the crawler's access denial; there may be a lesser-privileged way to do it.)
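A rough boto3 equivalent of that data-location grant; the S3 path is the one from the error log, and the role ARN is a placeholder:

import boto3

lf = boto3.client("lakeformation")

# Grant data location access on the S3 path (note the trailing slash, as mentioned above).
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AWSGlueServiceRole-crawler"},
    Resource={"DataLocation": {"ResourceArn": "arn:aws:s3:::zzz-data/avro-all/"}},
    Permissions=["DATA_LOCATION_ACCESS"],
)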
Make sure you grant the necessary permissions to your crawler's IAM role in this path:
Lake Formation -> Permissions -> Data lake permissions
(Grant related Glue Database permissions to your crawler's IAM role)

Access denied when accessing Athena in SQLalchemy

Using pyathena and SQLAlchemy, I connect to AWS Athena.
If I use the keys of an AWS admin, everything works fine and I can query data.
If I use the keys of an AWS user that has the AmazonAthenaFullAccess and AWSQuicksightAthenaAccess policies, I get access denied.
I have permission to the output S3 bucket, and Athena accesses a public dataset S3 bucket.
What permissions am I missing?
Thanks
The AmazonAthenaFullAccess policy provides access to S3 buckets such as "arn:aws:s3:::aws-athena-query-results-*" and "arn:aws:s3:::athena-examples*".
You have two options:
Create a new policy with the content of the AmazonAthenaFullAccess policy, but with different S3 resources (a sketch follows below).
Add the AmazonS3FullAccess policy to your user, which grants permissions on all your S3 buckets.
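As an illustration of the first option, a minimal boto3 sketch that attaches an inline policy scoped to specific buckets; the user name, policy name, and bucket names are placeholders you would replace with your query-results bucket and the dataset bucket you query:

import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket",
                       "s3:PutObject", "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"],
            "Resource": [
                "arn:aws:s3:::my-athena-query-results",
                "arn:aws:s3:::my-athena-query-results/*",
                "arn:aws:s3:::my-public-dataset-bucket",
                "arn:aws:s3:::my-public-dataset-bucket/*",
            ],
        }
    ],
}

# Attach the scoped policy inline to the non-admin user.
iam.put_user_policy(
    UserName="my-athena-user",
    PolicyName="AthenaS3Access",
    PolicyDocument=json.dumps(policy),
)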

Spark Redshift: error while reading Redshift tables using Spark

I am getting the error below while reading data from a Redshift table using Spark.
Below is the code:
Dataset<Row> dfread = sql.read()
.format("com.databricks.spark.redshift")
.option("url", url)
//.option("query","select * from TESTSPARK")
.option("dbtable", "TESTSPARK")
.option("forward_spark_s3_credentials", true)
.option("tempdir","s3n://test/Redshift/temp/")
.option("sse", true)
.option("region", "us-east-1")
.load();
error:
Exception in thread "main" java.sql.SQLException: [Amazon](500310) Invalid operation: Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid=,CanRetry 1
Details:
error: Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 6FC2B3FD56DA0EAC,ExtRid I,CanRetry 1
code: 9012
context: s3://jd-us01-cis-machine-telematics-devl-data-processed/Redshift/temp/f06bc4b2-494d-49b0-a100-2246818e22cf/manifest
query: 44179
Can anyone please help?
You're getting a permission error from S3 when Redshift tries to access the files you're telling it to load.
Have you configured the access keys for S3 access before calling the load()?
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "ASDFGHJKLQWERTYUIOP")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "QaZWSxEDC/rfgyuTGBYHY&UKEFGBTHNMYJ")
You should be able to check which access key id was used from the Redshift side by querying the stl_query table.
From the error "S3ServiceException:Access Denied",
it seems the permissions are not set up for Redshift to access the S3 files. Please follow the steps below (a rough boto3 sketch of steps 2-4 follows the list):
1. Add a bucket policy to that bucket that allows the Redshift account access.
2. Create an IAM role in the Redshift account that Redshift can assume.
3. Grant permissions to access the S3 bucket to the newly created role.
4. Associate the role with the Redshift cluster.
5. Run COPY statements.
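A rough boto3 sketch of steps 2-4; the role name, bucket name, and cluster identifier are placeholders, and the policies are abbreviated:

import json
import boto3

iam = boto3.client("iam")
redshift = boto3.client("redshift")

# Step 2: a role that Redshift can assume (trust policy for redshift.amazonaws.com).
trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "redshift.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
role = iam.create_role(RoleName="redshift-s3-access", AssumeRolePolicyDocument=json.dumps(trust))

# Step 3: grant the role access to the temp/staging bucket (placeholder bucket name).
access = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket", "s3:PutObject"],
        "Resource": ["arn:aws:s3:::my-temp-bucket", "arn:aws:s3:::my-temp-bucket/*"],
    }],
}
iam.put_role_policy(RoleName="redshift-s3-access", PolicyName="s3-temp-access",
                    PolicyDocument=json.dumps(access))

# Step 4: associate the role with the Redshift cluster so COPY/UNLOAD can use it.
redshift.modify_cluster_iam_roles(
    ClusterIdentifier="my-cluster",        # placeholder cluster identifier
    AddIamRoles=[role["Role"]["Arn"]],
)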