I'm trying to connect my local Presto to AWS Glue for metadata and S3 for data. I'm able to connect to Glue and run show tables; and desc <table>;. However, I get the following error when I run select * from <table>;
Query <query id> failed: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: <id>; S3 Extended Request ID: <id>)
My hive.properties file looks like this
connector.name=hive-hadoop2
hive.metastore=glue
hive.metastore.glue.region=<region>
hive.s3.use-instance-credentials=false
hive.s3.aws-access-key=<access key>
hive.s3.aws-secret-key=<secret key>
The error means S3 does not recognize the access key you configured. Since you can connect to Glue, your environment or ~/.aws already holds valid credentials, and you should be able to use those same credentials for S3 access as well.
For this, make sure you are using Presto 332 or later and remove hive.s3.use-instance-credentials, hive.s3.aws-access-key, and hive.s3.aws-secret-key from your settings, so the connector picks up the same credentials for S3 that it already uses for Glue.
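With those three settings removed, the hive.properties above reduces to something like this (a sketch, assuming the connector then resolves S3 credentials the same way it does for Glue):
connector.name=hive-hadoop2
hive.metastore=glue
hive.metastore.glue.region=<region>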
Related
I am trying to add external storage to my Nextcloud, namely an AWS S3 bucket. However, this is not working, because I get the following error message:
Exception: Creation of bucket "nextcloud-modul346" failed. Error executing "CreateBucket" on "http://nextcloud-modul346.s3.eu-west-1.amazonaws.com/"; AWS HTTP error: Client error: `PUT http://nextcloud-modul346.s3.eu-west-1.amazonaws.com/` resulted in a `403 Forbidden` response:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided (truncated...)
InvalidAccessKeyId (client): The AWS Access Key Id you provided does not exist in our records. - <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>ASIARERFVIEWRBG5WD63</AWSAccessKeyId><RequestId>M6BN3MC6F0214DQM</RequestId><HostId>gVf0nUVJXQDL2VV50pP0qSzbTi+N+8OMbgvj4nUMv10pg/T5VVccb4IstfopzzhuxuUCtY+1E58=</HostId></Error>
However, I cannot use IAM users or groups, as this is blocked by my organization. Also, I work with the AWS Learner Lab and I have to use S3.
As credentials I have specified in Nextcloud the aws_access_key_id and aws_secret_access_key from the Learner Lab. However, I cannot connect. This post hasn't helped either.
Does anyone know a solution to this problem which does not involve IAM?
Thanks for any help!
I have an AWS Glue Spark job that fails with the following error:
An error occurred while calling o362.cache. com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ...; S3 Extended Request ID: ...; Proxy: null), S3 Extended Request ID: ...
I believe the error is thrown at the line where the Spark persist() method is called on a DataFrame. The Glue job is assigned an IAM role that has full S3 access (all locations/operations allowed), yet I'm still getting the S3 exception. I tried setting the "Temporary path" for the Glue job in the AWS Console to a specific S3 bucket with full access, and I also tried setting the Spark temporary directory to a specific S3 bucket with full access via:
import pyspark
from pyspark import SparkContext

conf = pyspark.SparkConf()
conf.set('spark.local.dir', 's3://...')  # attempt to point Spark's temporary directory at a bucket with full access
self.sc = SparkContext(conf=conf)        # runs inside the job class, hence self.sc
which didn't help. It's very strange that the job is failing even with full S3 access. I'm not sure what to try next; any help would be really appreciated. Thank you!
Background
I am attempting to upload a file to an AWS S3 bucket in Jenkins. I am using the steps/closures provided by the AWS Steps plugin, with an Access Key ID and an Access Key Secret stored as a username and password, respectively, in Credential Manager.
Code
Below is the code I am using in a declarative pipeline script
sh('echo "test" > someFile')
withAWS(credentials:'AwsS3', region:'us-east-1') {
    s3Upload(file:'someFile', bucket:'ec-sis-integration-test', acl:'BucketOwnerFullControl')
}
sh('rm -f someFile')
Here is a screenshot of the credentials as they are stored globally in Credential Manager.
Issue
Whenever I run the pipeline, I get the following error:
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 5N9VEJBY5MDZ2K0W; S3 Extended Request ID: AJmuP635cME8m035nA6rQVltCCJqHDPXsjVk+sLziTyuAiSN23Q1j5RtoQwfHCDXAOexPVVecA4=; Proxy: null), S3 Extended Request ID: AJmuP635cME8m035nA6rQVltCCJqHDPXsjVk+sLziTyuAiSN23Q1j5RtoQwfHCDXAOexPVVecA4=
Does anyone know why this isn't working?
Troubleshooting
I have verified that the Access Key ID and Access Key Secret combination works by testing it in a small Java application I wrote. Additionally, I set the ID/secret via Java system properties (through the script console), but I still get the same error.
System.setProperty("aws.accessKeyId", "<KEY_ID>")
System.setProperty("aws.secretKey", "<KEY_SECRET>")
I also tried changing the credential type from username/password to AWS credentials, as seen below. It made no difference.
It might be a bucket and object ownership issue. Check whether the credentials you are using actually allow you to upload to the bucket ec-sis-integration-test.
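One quick way to check (a sketch using the AWS SDK for Java v1 from Scala, with the same key pair Jenkins has stored under the 'AwsS3' ID; the object name and class name are illustrative) is to print which principal the key pair resolves to and then attempt the same upload the pipeline performs:
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder
import com.amazonaws.services.securitytoken.model.GetCallerIdentityRequest

object BucketAccessCheck extends App {
  // Same access key / secret that Jenkins has stored under the 'AwsS3' credentials ID.
  val creds = new AWSStaticCredentialsProvider(new BasicAWSCredentials("<KEY_ID>", "<KEY_SECRET>"))

  // Print which account/user the key pair actually belongs to.
  val sts = AWSSecurityTokenServiceClientBuilder.standard()
    .withRegion("us-east-1").withCredentials(creds).build()
  println(sts.getCallerIdentity(new GetCallerIdentityRequest()).getArn)

  // Attempt the same upload the pipeline does; a 403 here reproduces the Jenkins failure.
  val s3 = AmazonS3ClientBuilder.standard()
    .withRegion("us-east-1").withCredentials(creds).build()
  s3.putObject("ec-sis-integration-test", "someFile", "test")
}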
I am getting an access denied error when I try to run an Athena query from the root account. What am I doing wrong?
I have tried creating IAM users/roles, but I'm not sure I'm doing it right. I just wanted to do a quick test:
Create S3 bucket -> upload CSV -> go to Athena -> pull data from S3 -> run query
Error that I am getting is:
Your query has the following error(s):
Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: BF8CDA860116C79B; S3 Extended Request ID: 825bTOZNiWP1bUJUGV3Bg5NSzy3ywqZdoBtwYItrxQfr8kqDpGP1RBIHR6NFIBySgO/qIKA8/Cw=)
This query ran against the "sampledb" database, unless qualified by the query. Please post the error message on our forum or contact customer support with
Query Id: c08c11e6-e049-46f1-a671-0746da7e7c84.
If you are executing the query from the AWS Athena web console, ensure you have access to the S3 location of the table. You can extract the location from the output of the SHOW CREATE TABLE command.
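For example, once SHOW CREATE TABLE shows the table's LOCATION, you can try listing that prefix with the same identity (a sketch using the AWS SDK for Java v1 from Scala; the bucket and prefix are placeholders, and the default credential chain is assumed). An AccessDenied here points at the S3 permissions rather than Athena itself:
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import scala.collection.JavaConverters._

object TableLocationCheck extends App {
  // Uses the default credential chain, i.e. whatever identity you normally query with.
  val s3 = AmazonS3ClientBuilder.defaultClient()
  // Replace with the bucket/prefix from the table's LOCATION clause.
  val listing = s3.listObjectsV2("my-table-bucket", "path/to/table/")
  listing.getObjectSummaries.asScala.take(5).foreach(o => println(o.getKey))
}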
I'm running Spark 2.4 on an EC2 instance. I am assuming an IAM role and setting the key / secret key / session token in sparkSession.sparkContext.hadoopConfiguration, along with the credentials provider "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider".
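In other words, the setup amounts to something like this (a sketch of the configuration just described; the property names are the standard s3a ones, and the key/secret/token values come from the assumed role):
val hadoopConf = sparkSession.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoopConf.set("fs.s3a.access.key", "<accessKey>")
hadoopConf.set("fs.s3a.secret.key", "<secretKey>")
hadoopConf.set("fs.s3a.session.token", "<sessionToken>")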
When I try to read a dataset from s3 (using s3a, which is also set in the hadoop config), I get an error that says
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 7376FE009AD36330, AWS Error Code: null, AWS Error Message: Forbidden
read command:
val myData = sparkSession.read.parquet("s3a://myBucket/myKey")
I've repeatedly checked the S3 path and it's correct. My assumed IAM role has the right privileges on the S3 bucket. The only thing I can figure at this point is that Spark has some sort of hidden credential chain ordering, and even though I have set the credentials in the Hadoop config, it is still grabbing credentials from somewhere else (my instance profile?). But I have no way to diagnose that.
Any help is appreciated. Happy to provide any more details.
spark-submit will pick up your environment variables and set them as the fs.s3a access, secret, and session keys, overwriting any you've already set.
If you only want to use the IAM credentials, just set fs.s3a.aws.credentials.provider to com.amazonaws.auth.InstanceProfileCredentialsProvider; it will then be the only provider used.
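A minimal sketch of that change, applied to a session like the one in the question (assuming the EC2 instance profile should be the only credentials source):
// Force s3a to use only the instance profile, so env vars and other providers are never consulted.
sparkSession.sparkContext.hadoopConfiguration.set(
  "fs.s3a.aws.credentials.provider",
  "com.amazonaws.auth.InstanceProfileCredentialsProvider")
val myData = sparkSession.read.parquet("s3a://myBucket/myKey")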
Further Reading: Troubleshooting S3A