Spark credential chain ordering - S3 Exception Forbidden

I'm running Spark 2.4 on an EC2 instance. I assume an IAM role and set the access key, secret key, and session token in sparkSession.sparkContext.hadoopConfiguration, along with the credentials provider "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider".
When I try to read a dataset from S3 (using s3a, which is also set in the Hadoop config), I get an error that says:
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 7376FE009AD36330, AWS Error Code: null, AWS Error Message: Forbidden
read command:
val myData = sparkSession.read.parquet("s3a://myBucket/myKey")
I've repeatedly checked the S3 path and it's correct. My assumed IAM role has the right privileges on the S3 bucket. The only thing I can figure at this point is that Spark has some hidden credential chain ordering, and even though I have set the credentials in the Hadoop config, it is still grabbing credentials from somewhere else (my instance profile???). But I have no way to diagnose that.
Any help is appreciated. Happy to provide any more details.

spark-submit will pick up the AWS_* environment variables and set them as the fs.s3a access, secret, and session keys, overwriting any you've already set.
If you only want to use the IAM credentials, just set fs.s3a.aws.credentials.provider to com.amazonaws.auth.InstanceProfileCredentialsProvider; it'll be the only one used.
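A minimal sketch of that, reusing the question's placeholder bucket (assumes the hadoop-aws and AWS SDK JARs are on the classpath):
val hadoopConf = sparkSession.sparkContext.hadoopConfiguration
// Make the instance profile the only provider S3A consults, so stray
// environment variables or leftover hadoop-config keys can't shadow it.
hadoopConf.set(
  "fs.s3a.aws.credentials.provider",
  "com.amazonaws.auth.InstanceProfileCredentialsProvider")
val myData = sparkSession.read.parquet("s3a://myBucket/myKey")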
Further Reading: Troubleshooting S3A

Related

Unable to connect to S3 while creating Elasticsearch snapshot repository

I am trying to register a repository on AWS S3 to store Elasticsearch snapshots.
I am following the guide and ran the very first command listed in the doc.
But I am getting an Access Denied error while executing that command.
The role that is being used to perform operations on S3 is the AmazonEKSNodeRole.
I have assigned the appropriate permissions to the role to perform operations on the S3 bucket.
Also, here is another doc which suggests using Kibana for Elasticsearch versions > 7.2, but I am doing the same via cURL requests.
Below is the trust policy of the role through which I am making the request to register the repository in the S3 bucket.
Also, below are the screenshots of the permissions of the trusting and trusted accounts, respectively.

AWS Sagemaker on local machine: Invalid security token included in the request

I am trying to get AWS SageMaker to run locally. I found this Jupyter notebook:
https://gitlab.com/juliensimon/aim410/-/blob/master/local_training.ipynb
I logged into AWS via saml2aws and hence have valid credentials, entered my specific region as well as the SageMaker Execution Role ARN, and specified the image I want to pull.
However, when calling .fit() I get the following ClientError:
ClientError: An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid.
Can someone give me a hint or suggestion on how to solve this issue?
Thanks!
Try to verify that your AWS credentials are set up properly, bypassing Boto3, by running a cell with something like:
!aws sagemaker list-endpoints
If this fails, then your AWS CLI credentials aren't set up correctly, your saml2aws process is broken, or your role has no SageMaker permissions.
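If you'd rather reproduce the failing call itself (the error above comes from STS GetCallerIdentity), here is a minimal JVM-side sketch with the AWS Java SDK v1, assuming the SDK is on your classpath:
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder
import com.amazonaws.services.securitytoken.model.GetCallerIdentityRequest

// Uses the default credential chain; this throws InvalidClientTokenId when
// the saml2aws session has expired, mirroring the notebook error.
val sts = AWSSecurityTokenServiceClientBuilder.defaultClient()
println(sts.getCallerIdentity(new GetCallerIdentityRequest()).getArn)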

spark read from different account s3 and write to my account s3

I have a Spark job which needs to read data from S3 in another account (Data Account) and process it.
Once it's processed, it should write back to S3 in my account.
So I configured the access and secret keys of the Data Account in my Spark session like below:
val hadoopConf=sc.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key","DataAccountKey")
hadoopConf.set("fs.s3a.secret.key","DataAccountSecretKey")
hadoopConf.set("fs.s3a.endpoint", "s3.ap-northeast-2.amazonaws.com")
System.setProperty("com.amazonaws.services.s3.enableV4", "true")
val df = spark.read.json("s3a://DataAccountS/path")
/* Reading succeeds */
df.limit(3).write.json("s3a://myaccount/test/")
With this, reading is fine, but I get the error below when writing:
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 301, AWS Service: Amazon S3, AWS Request ID: A5E574113745D6A0, AWS Error Code: PermanentRedirect, AWS Error Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
But if I don't configure the Data Account details and just write some dummy data to my S3 from Spark, it works.
So how should I configure things so that both reading from the other account's S3 and writing to my account's S3 work?
If your Spark classpath has the Hadoop 2.7 JARs on it, you can use secrets-in-the-path as the technique, with a URL like s3a://DataAccountKey:DataAccountSecretKey@DataAccount/path. Be aware this will log the secrets everywhere.
Hadoop 2.8+ JARs will tell you off for logging your secrets everywhere, but add per-bucket binding:
spark.hadoop.fs.s3a.bucket.DataAccount.access.key DataAccountKey
spark.hadoop.fs.s3a.bucket.DataAccount.secret.key DataAccountSecretKey
spark.hadoop.fs.s3a.bucket.DataAccount.endpoint s3.ap-northeast-2.amazonaws.com
Then, for all interaction with that bucket, these per-bucket options will override the main settings.
Note: if you want to use this, don't think dropping hadoop-aws-2.8.jar into your classpath will work; you'll only get classpath errors. All the hadoop-* JARs need to move to 2.8, and the aws-sdk JAR must be updated too.
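For illustration, the same per-bucket binding set through the Spark builder; a sketch where the bucket names and endpoint mirror the question:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  // These keys apply only to s3a://DataAccount; any other bucket falls
  // back to the default credential chain (e.g. your own account's keys).
  .config("spark.hadoop.fs.s3a.bucket.DataAccount.access.key", "DataAccountKey")
  .config("spark.hadoop.fs.s3a.bucket.DataAccount.secret.key", "DataAccountSecretKey")
  .config("spark.hadoop.fs.s3a.bucket.DataAccount.endpoint", "s3.ap-northeast-2.amazonaws.com")
  .getOrCreate()

val df = spark.read.json("s3a://DataAccount/path")  // read with Data Account keys
df.limit(3).write.json("s3a://myaccount/test/")     // write with your own credentials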

The AWS Access Key Id does not exist in our records

I created a new Access Key and configured that in the AWS CLI with aws configure. It created the .ini file in ~/.aws/config. When I run aws s3 ls it gives:
A client error (InvalidAccessKeyId) occurred when calling the ListBuckets operation: The AWS Access Key Id you provided does not exist in our records.
AmazonS3FullAccess policy is also attached to the user. How to fix this?
It might be that you have old keys exported via environment variables (bash_profile), and since environment variables take precedence over the credentials file, you get the error "the access key id does not exist".
Remove the old keys from bash_profile and you should be good to go.
This happened to me once when I forgot I had credentials in bash_profile; it gave me a headache for quite some time :)
It looks like values have already been set for the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
If so, you will see values when executing the commands below:
echo $AWS_SECRET_ACCESS_KEY
echo $AWS_ACCESS_KEY_ID
You need to reset these variables if you are using aws configure.
To reset them, execute the commands below:
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
You need to add aws_session_token to the credentials file, along with aws_access_key_id and aws_secret_access_key.
None of the upvoted answers worked for me. Finally, I passed the credentials inside the Python script, using the client API:
import boto3

client = boto3.client(
    's3',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN)
Note that the aws_session_token argument is optional. Hard-coding credentials is not recommended for anything public, but it makes life easier for a simple trial.
For me, I was relying on IAM EC2 roles to give our machines access to specific resources.
I didn't even know there was a credentials file at ~/.aws/credentials until I rotated/removed some of our access keys in the IAM console to tighten our security, and that suddenly made one of the scripts stop working on a single machine.
Deleting that credentials file fixed it for me.
I made the mistake of setting my variables with quotation marks like this:
AWS_ACCESS_KEY_ID="..."
You may have configured AWS credentials correctly, but using these credentials, you may be connecting to some specific S3 endpoint (as was the case with me).
Instead of using:
aws s3 ls
try using:
aws --endpoint-url=https://<your_s3_endpoint_url> s3 ls
Hope this helps those facing a similar problem.
You can configure profiles in the ~/.aws/credentials file using:
[profile_name]
aws_access_key_id = <access_key>
aws_secret_access_key = <access_key_secret>
If you are using multiple profiles, then use:
aws s3 ls --profile <profile_name>
You may need to set the AWS_DEFAULT_REGION environment variable.
In my case, I was trying to provision a new bucket in Hong Kong region, which is not enabled by default, according to this:
https://docs.aws.amazon.com/general/latest/gr/s3.html
It's not directly related to the OP's question, but it is related to the topic, so in case anyone else like me gets trapped by this edge case:
I had to enable that region manually, before operating on that AWS s3 region, following this guide: https://docs.aws.amazon.com/general/latest/gr/rande-manage.html
I have been looking for information about this problem and found this post. I know it is old, but I would like to leave this here in case anyone else has the same problem.
I had installed the AWS CLI and opened the configuration.
It seems that you need to run aws configure to add the current credentials. Once I changed them, I could access S3 again.
Looks like ~/.aws/credentials was not created. Try creating it manually with this content:
[default]
aws_access_key_id = sdfesdwedwedwrdf
aws_secret_access_key = wedfwedwerf3erfweaefdaefafefqaewfqewfqw
(On my test box, if I run an aws command without having a credentials file, the error is Unable to locate credentials. You can configure credentials by running "aws configure".)
Can you try running these two commands from the same shell in which you are trying to run aws:
$ export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
$ export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
and then try aws command.
Another thing that can cause this, even if everything is set up correctly, is running the command from a Makefile. For example, I had a rule:
awssetup:
    aws configure
    aws s3 sync s3://mybucket.whatever .
When I ran make awssetup I got the error: fatal error: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records. But running the commands directly from the command line worked.
Adding one more answer, since none of the above cases worked for me.
In the AWS console, check your credentials (My Security Credentials) and see if you have entered the right ones.
Thanks to this discussion:
https://forums.aws.amazon.com/message.jspa?messageID=771815
This can happen because there's an issue with your AWS Secret Access Key. After messing around with AWS Amplify, I ran into this issue. The quickest way out is to create a new pair of AWS Access Key ID and AWS Secret Access Key and run aws configure again.
It worked for me. I hope this helps.
To those of you who run aws s3 ls and get this exception: make sure you have permissions for all regions under the given AWS account. When running aws s3 ls you try to list all the S3 buckets under the account, so if you don't have permissions for all regions, you'll get this exception: An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: The AWS Access Key Id you provided does not exist in our records.
See Describing your Regions using the AWS CLI for more info.
I had the same problem on Windows using the JavaScript aws-sdk module. I had changed my IAM credentials, and the problem persisted even when I supplied the new credentials through the update method, like this:
s3.config.update({
  accessKeyId: 'ACCESS_KEY_ID',
  secretAccessKey: 'SECRET_ACCESS_KEY',
  region: 'REGION',
});
After a while, I found that the aws-sdk module had created a file on Windows at C:\Users\User\.aws\credentials. The credentials inside this file take precedence over the data passed through the update method.
The solution for me was to write the new credentials to C:\Users\User\.aws\credentials rather than passing them via s3.config.update.
Export the variables below, taking the values from the credentials file in the .aws/ directory (filename: credentials):
export AWS_ACCESS_KEY_ID=AK###########GW
export AWS_SECRET_ACCESS_KEY=g#############################J
Hopefully this saves others from hours of frustration: call AWS.config.update({...}) before initializing the S3 client.
const AWS = require('aws-sdk');
AWS.config.update({
  accessKeyId: 'AKIAW...',
  secretAccessKey: 'ptUGSHS....'
});
const s3 = new AWS.S3();
Credits to this answer:
https://stackoverflow.com/a/61914974/11110509
I tried the steps below and it worked:
1. cd ~
2. cd .aws
3. vi credentials
4. Delete the lines
aws_access_key_id =
aws_secret_access_key =
by placing the cursor on each line and pressing dd (the vi command to delete a line). Delete both lines and check again.
If you have an AWS Educate account and you get this problem:
An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: The AWS Access Key Id you provided does not exist in our records.
The solution is here:
Go to your C:/ drive and search for the .aws folder inside your main user folder on Windows.
Inside that folder you'll find the "credentials" file; open it with Notepad.
Paste the whole key credential from your AWS account into that file and save it.
Now you are ready to use your AWS Educate account.
Assuming you have already checked your Access Key ID and Secret... you might want to check the file team-provider-info.json, which can be found under the amplify/ folder:
"awscloudformation": {
"AuthRoleName": "<role identifier>",
"UnauthRoleArn": "arn:aws:iam::<specific to your account and role>",
"AuthRoleArn": "arn:aws:iam::<specific to your account and role>",
"Region": "us-east-1",
"DeploymentBucketName": "<role identifier>",
"UnauthRoleName": "<role identifier>",
"StackName": "amplify-test-dev",
"StackId": "arn:aws:cloudformation:<stack identifier>",
"AmplifyAppId": "<id>"
}
The IAM role referred to here should be active in the IAM console.
If you get this error in an Amplify project, check that "awsConfigFilePath" is not configured in amplify/.config/local-aws-info.json
In my case I had to remove it, so my environment looked like the following:
{
  // **INCORRECT**
  // This will not use your profile in ~/.aws/credentials, but instead the
  // specified config file path
  // "dev": {
  //   "configLevel": "project",
  //   "useProfile": false,
  //   "awsConfigFilePath": "/Users/dev1/.amplify/awscloudformation/cEclTB7ddy"
  // },
  // **CORRECT**
  "dev": {
    "configLevel": "project",
    "useProfile": true,
    "profileName": "default"
  }
}
Maybe you need to activate your API keys in the web console; I just saw that mine were inactive for some reason...
Thanks, everyone. This helped me solve it.
Something somehow changed the keys, and I didn't realize it since everything was working fine until I connected to S3 from Spark. Then the error started appearing on the command line too, even for aws s3 ls.
Steps to solve:
Run aws configure to check if keys are set up (verify the last 4 characters and just keep pressing Enter).
AWS console -> Users -> click on the user -> go to Security credentials -> check whether the key is the same one showing up in aws configure.
If they are not the same, generate a new key and download the CSV.
Run aws configure and set up the new keys.
Try aws s3 ls now.
Change the keys everywhere; in my case it was the configs in Cloudera.
I couldn't figure out how to get the system to accept my Vocareum credentials so I took advantage of the fact that if you configure your instance to use IAM roles, the SDK automatically selects the IAM credentials for your application, eliminating the need to manually provide credentials.
Once a role with appropriate permissions was applied to the EC2 instance, I didn't need to provide any credentials.
Open the ~/.bash_profile file and edit the info with the new values that you received at the time of creating the new user:
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_DEFAULT_REGION=us-east-1
Afterward, run the command:
source ~/.bash_profile
This will enable the new keys for the local machine. Now we need to configure the info in the terminal as well. Run the command:
aws configure
Provide the new values as requested and you are good to go.
In my case, I was using aws configure, but I had hand-edited the .aws/config file to export the key ID and secret key as environment variables.
This apparently caused a silent error, and I saw the error listed above.
I solved it by deleting the .aws directory and running aws configure again.
I have encountered this issue when trying to export RDS Postgres data to S3 following this official guide.
TL;DR Troubleshooting tips:
Reset RDS credentials using:
DROP EXTENSION aws_s3 CASCADE;
DROP EXTENSION aws_commons CASCADE;
CREATE EXTENSION aws_s3 CASCADE;
Delete and re-add the DB instance role used for the s3Export feature. Optionally, reset the RDS credentials (previous step) once again after that.
Below you will find more details on my case.
In particular, I have encountered:
[XX000] ERROR: could not upload to Amazon S3
Details: Amazon S3 client returned 'The AWS Access Key Id you provided does not exist in our records.'.
To be able to export to S3, the RDS DB instance must be configured to assume a role with permission to write to the S3 bucket; the guide describes these steps.
The reason for the error was the aws_s3.query_export_to_s3 Postgres procedure using some (cached?) invalid assumed credentials. I am still not sure which credentials it had been using, but I managed to reproduce the same behaviour using the AWS CLI:
I assumed a role (aws sts assume-role),
and then tried to perform another action (aws s3 cp, in particular) with these credentials without the session token (only AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, without AWS_SESSION_TOKEN).
This resulted in the same error from the AWS CLI: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.
In short: hard resetting RDS credentials helped.
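For comparison, temporary (assumed-role) credentials only work when all three parts travel together. A minimal sketch with the AWS Java SDK v1 (all values are placeholders):
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicSessionCredentials}
import com.amazonaws.services.s3.AmazonS3ClientBuilder

// All three values come from the same sts:AssumeRole response; supplying
// only the key pair without the token is what triggers InvalidAccessKeyId.
val creds = new BasicSessionCredentials("ASIA...", "<secret-key>", "<session-token>")
val s3 = AmazonS3ClientBuilder.standard()
  .withRegion("us-east-1")  // placeholder region
  .withCredentials(new AWSStaticCredentialsProvider(creds))
  .build()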
I just found another cause/remedy for this error/situation. I was getting the error running a PowerShell script; it happened on an execution of Write-S3Object. I have been working with AWS for a while now and have run this script with success, but had not run it in a while.
My usual method of setting AWS credentials is:
Set-AWSCredential -ProfileName <THE_PROFILE_NAME>
I tried the "aws configure" command and every other recommendation in this forum post. No luck.
Well, I am aware of the .aws\credentials file, so I took a look in there. I have only three profiles, one being [default]. Everything looked good, but then I noticed a new element, present in all 3 profiles, that I had not seen before:
toolkit_artifact_guid=64GUID3-GUID-GUID-GUID-004GUID236
(GUID redacting added by me)
Then I noticed that this element differed between the profile I was running with and the [default] profile, which were otherwise identical.
On a hunch, I changed the toolkit_artifact_guid in [default] to match my target profile, and no more error. I have no idea why.

Error During downloading files from S3 to EC2

While using the wget command to download files from Amazon S3 to an Amazon EC2 instance,
I get the following message and the file does not get downloaded.
How do I solve this issue?
Command :->
"wget https://s3.amazonaws.com/docsbucket/intro.doc"
Error Message :->
"Resolving s3.amazonaws.com... 207.171.163.225
Connecting to s3.amazonaws.com|207.171.163.225|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2013-03-20 13:06:00 ERROR 403: Forbidden."
You should launch your EC2 instance with permission to read from your S3 buckets.
The easiest way to do this is using roles. You simply create, in the IAM (Identity and Access Management) service of AWS, a role that can read from S3, then launch your instance with this role. AWS takes care of getting the right credentials onto the instance, and you can fetch your S3 objects using the S3 CLI tools.
You can use the same "trick" to access other resources and other actions on these resources.
You can read more about it in AWS documentations: http://docs.aws.amazon.com/IAM/latest/UserGuide/role-usecase-ec2app.html
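A sketch of the same role-based access from code, if you'd rather not shell out to the CLI (AWS Java SDK v1; bucket and key mirror the question's URL):
import com.amazonaws.services.s3.AmazonS3ClientBuilder

// With an IAM role attached to the instance, the SDK's default credential
// chain picks up the role's temporary credentials; no keys appear in code.
val s3 = AmazonS3ClientBuilder.defaultClient()
val obj = s3.getObject("docsbucket", "intro.doc")
println(obj.getObjectMetadata.getContentLength)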
Unless the file is public, you will need to authenticate with keys to download the file. This is probably easiest done with a tool like s3cmd.
This worked after I gave Read permission on the file to Everyone:
Go to the Permissions tab -> Public access -> click Everyone -> then grant Read permission.