I am pretty baffled and I don't know what is going on with this one.
I'm using DuckDB to query parquet files in an s3 bucket.
import pandas as pd
import duckdb

query = """
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-west-2';
SET s3_access_key_id='key';
SET s3_secret_access_key='secret';
SELECT *
FROM read_parquet('s3://bucket/folder/file.parquet');
"""

cursor = duckdb.connect()
cursor.execute(query).df()
I have an IAM user with admin access. I am able to query this parquet file with programmatic access keys. I also have a role that I want to use in an application, which I have also given admin access just for testing purposes.
When I assume the role, create temporary credentials, and input those into the code above
export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
$(aws sts assume-role \
--role-arn arn:aws:iam::<account-id>:role/<role-name> \
--role-session-name test-session \
--query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
--output text))
I get the error
duckdb.Error: Invalid Error: Unable to connect to URL
"s3://bucket/folder/file.parquet": 403 (Forbidden)
However, when I use my IAM user, I am able to access this s3 object and query the data just fine. Is there something I am missing about the difference between roles and IAM users?
If it helps, what I am trying to do is create a role for a Lambda function and then read the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY with os.getenv() in the code above. I believe that if I can get the role working by writing in the temporary credentials, it should also work when I use os.getenv() in the Lambda function.
I had a very similar issue; it worked after I also set the session token via SET s3_session_token='sessiontoken';
The code would be changed to:
import pandas as pd
import duckdb

query = """
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-west-2';
SET s3_access_key_id='key';
SET s3_secret_access_key='secret';
SET s3_session_token='session-token';
SELECT *
FROM read_parquet('s3://bucket/folder/file.parquet');
"""

cursor = duckdb.connect()
cursor.execute(query).df()
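For the Lambda use case from the question, a minimal sketch that builds the same query from the standard environment variables instead of hard-coded values might look like this (the bucket path is the same placeholder as above, and the variable names are the ones an assumed role or Lambda execution environment normally populates):

import os
import duckdb

# Temporary credentials from the assumed role / execution environment
access_key = os.getenv("AWS_ACCESS_KEY_ID")
secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
session_token = os.getenv("AWS_SESSION_TOKEN")

query = f"""
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-west-2';
SET s3_access_key_id='{access_key}';
SET s3_secret_access_key='{secret_key}';
SET s3_session_token='{session_token}';
SELECT *
FROM read_parquet('s3://bucket/folder/file.parquet');
"""

cursor = duckdb.connect()
df = cursor.execute(query).df()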
Related
I have an ECS task using a task role to access a DynamoDB table in the same account A. It also requires access to a DynamoDB table in a different account B, which is granted through assuming an IAM role.
My understanding is that after assuming the role, the task now has a set of temporary credentials for each role. This allows the task to use the new credentials to make requests to account B's table, while still using the original credentials to make requests to account A's table.
Assuming this is correct, how are the creds used for a given request determined? Does it only use the cross account role for making account B requests, and the original creds for the account A requests?
What if access to account B's S3 buckets is also required, and those permissions were granted to account A and then given to the original task role? After assuming the cross-account role, does the cross-account S3 request fail because the assumed role doesn't have S3 permissions, even though the original task role does?
AWS resources cannot just assume a role by themselves. They have to be told to do so, using the SDK of your choice (or the CLI). As soon as you understand how that works, it becomes a lot clearer how this behaves. Since you mentioned an EC2 instance, I'll use the CLI to show this:
AcctCredentials=($(aws sts assume-role --role-arn "$1" --role-session-name TheSessionName --query '[Credentials.AccessKeyId,Credentials.SecretAccessKey,Credentials.SessionToken]' --output text))
unset AWS_SECURITY_TOKEN
echo "Security Tokens for Cross Account Access received"
export AWS_ACCESS_KEY_ID=${AcctCredentials[0]}
echo $AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=${AcctCredentials[1]}
export AWS_SESSION_TOKEN=${AcctCredentials[2]}
export AWS_SECURITY_TOKEN=${AcctCredentials[2]}
Doing that, you are setting your environment variables on the EC2 instance to these new credentials. This means any other CLI command run, or any script launched from the same shell as this one, will use these credentials.
If you need to go back to the credentials from before, you will either need to reset/save the previous credentials, or exit the shell this command was run in and return to your default credentials.
If this were in a Lambda, for instance, you might be using Python and Boto3 to do something very similar; it would replace the tokens there.
It is also entirely possible to save your tokens as a profile that the commands can use, and then specify that profile per command.
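For the Lambda/Boto3 case mentioned above, a rough sketch of the same pattern (the role ARN and session name are placeholders, and the DynamoDB client is just an example of a cross-account call):

import boto3

sts = boto3.client("sts")

# Equivalent of `aws sts assume-role` from the CLI example above
resp = sts.assume_role(
    RoleArn="arn:aws:iam::<account-id>:role/<role-name>",
    RoleSessionName="TheSessionName",
)
creds = resp["Credentials"]

# A session built from the temporary credentials; clients created from it
# call the other account, while the default session keeps the task's
# original credentials.
assumed_session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
cross_account_dynamodb = assumed_session.client("dynamodb")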
I want to get the bucket policy for various buckets. I tried the following code snippet (picked from the boto3 documentation):
import boto3

conn = boto3.resource('s3')
bucket_policy = conn.BucketPolicy('demo-bucket-py')
print(bucket_policy)
But here's the output I get :
s3.BucketPolicy(bucket_name='demo-bucket-py')
What should I rectify here? Or is there another way to get the access policy for S3?
Try print(bucket_policy.policy). More information on that here.
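In other words, a minimal version of the snippet from the question with that change would be:

import boto3

conn = boto3.resource('s3')
bucket_policy = conn.BucketPolicy('demo-bucket-py')
# .policy is the policy document itself (a JSON string), not the resource object
print(bucket_policy.policy)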
This worked for me:
import boto3
# Create an S3 client
s3 = boto3.client('s3')
# Call to S3 to retrieve the policy for the given bucket
result = s3.get_bucket_policy(Bucket='my-bucket')
print(result)
To perform this you need to configure or pass your keys explicitly, like this: s3 = boto3.client("s3", aws_access_key_id=access_key_id, aws_secret_access_key=secret_key). BUT there is a much better way to do this: use the aws configure command and enter your credentials once. Once you set that up you won't need to enter your keys again in your code; boto3 or the AWS CLI will automatically fetch them behind the scenes. Docs for setting it up: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
You can even set up different profiles to work with different accounts.
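For example, a quick sketch of using named profiles with Boto3 (the profile names here are made up):

import boto3

# Each profile in ~/.aws/credentials or ~/.aws/config maps to its own account or role
dev_session = boto3.Session(profile_name="dev")
prod_session = boto3.Session(profile_name="prod")

dev_s3 = dev_session.client("s3")
prod_s3 = prod_session.client("s3")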
I need to activate Transfer Acceleration on one of my buckets in S3, but I can't because of an account limitation.
I've tried so far:
creating a user with IAM, giving it AdministratorAccess, creating a bucket, and enabling Transfer Acceleration, which got me this (via the CLI):
An error occurred (AccessDenied) when calling the PutBucketAccelerateConfiguration operation: Access Denied
Same thing via the console: same error.
Same thing with the root account (I guess with the root account I have all the permissions).
Still relevant? If so, from the official docs:
Add a named profile for the administrator user in the AWS CLI config file. You use this profile when executing the AWS CLI commands.
[adminuser]
aws_access_key_id = adminuser access key ID
aws_secret_access_key = adminuser secret access key
region = aws-region
Does this work for you?
aws s3 ls --profile adminuser
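If the CLI still returns AccessDenied with that profile, a rough Boto3 check of the same call may help isolate whether it's a credentials problem or an account limitation (the bucket name is a placeholder; "adminuser" is the profile from above):

import boto3

session = boto3.Session(profile_name="adminuser")
s3 = session.client("s3")

# The same operation the CLI error refers to: PutBucketAccelerateConfiguration
s3.put_bucket_accelerate_configuration(
    Bucket="my-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Confirm the setting
print(s3.get_bucket_accelerate_configuration(Bucket="my-bucket"))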
I want to access S3 from Spark. I don't want to configure any secret and access keys; I want to access it by configuring an IAM role, so I followed the steps given in s3-spark.
But it is still not working from my EC2 instance (which is running standalone Spark).
It works when I test with the AWS CLI:
[ec2-user@ip-172-31-17-146 bin]$ aws s3 ls s3://testmys3/
2019-01-16 17:32:38 130 e.json
But it did not work when I tried it like below:
scala> val df = spark.read.json("s3a://testmys3/*")
I am getting the below error
19/01/16 18:23:06 WARN FileStreamSink: Error while looking for metadata directory.
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: E295957C21AFAC37, AWS Error Code: null, AWS Error Message: Bad Request
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:616)
This config worked:
./spark-shell \
--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 \
--conf spark.hadoop.fs.s3a.endpoint=s3.us-east-2.amazonaws.com \
--conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider \
--conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
--conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
"400 Bad Request" is fairly unhelpful, and not only does S3 not provide much, the S3A connector doesn't date print much related to auth either. There's a big section on troubleshooting the error
The fact it got as far as making a request means that it has some credentials, only the far end doesn't like them
Possibilities
your IAM role doesn't have the permissions for s3:ListBucket. See IAM role permissions for working with s3a
your bucket name is wrong
There are some settings in fs.s3a or the AWS_ env vars which get priority over the IAM role, and they are wrong.
You should automatically have IAM auth as an authentication mechanism with the S3A connector; it's the one which is checked last, after config and env vars.
Have a look at what is set in fs.s3a.aws.credentials.provider - it must be unset or contain the option com.amazonaws.auth.InstanceProfileCredentialsProvider.
Assuming you also have hadoop on the command line, grab storediag:
hadoop jar cloudstore-0.1-SNAPSHOT.jar storediag s3a://testmys3/
it should dump what it is up to regarding authentication.
Update
As the original poster has commented, it was due to v4 authentication being required on the specific S3 endpoint. This can be enabled on the 2.7.x version of the s3a client, but only via Java system properties. For 2.8+ there are some fs.s3a. options you can set instead.
Step 1: configure the cluster framework (e.g. YARN) via core-site.xml with the following properties, then restart YARN.
fs.s3a.aws.credentials.provider = com.cloudera.com.amazonaws.auth.InstanceProfileCredentialsProvider
fs.s3a.endpoint = s3-ap-northeast-2.amazonaws.com
fs.s3.impl = org.apache.hadoop.fs.s3a.S3AFileSystem
Step 2: test from the spark shell as follows.
val rdd=sc.textFile("s3a://path/file")
rdd.count()
rdd.take(10).foreach(println)
It works for me
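If you'd rather set those properties from code instead of core-site.xml, a rough PySpark sketch of the same idea (this assumes the hadoop-aws and AWS SDK jars are on the classpath, e.g. via --packages as shown earlier; on Cloudera you may need the shaded com.cloudera.com.amazonaws... provider class from step 1 instead of the stock one):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3a-iam-role-test")
    # spark.hadoop.* entries are copied into the Hadoop configuration
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "com.amazonaws.auth.InstanceProfileCredentialsProvider")
    .config("spark.hadoop.fs.s3a.endpoint", "s3-ap-northeast-2.amazonaws.com")
    .getOrCreate()
)

rdd = spark.sparkContext.textFile("s3a://path/file")
print(rdd.count())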
I'm constantly getting this error:
An error occurred (InvalidClientTokenId) when calling the AssumeRole operation: The security token included in the request is invalid.
when I run this Assume Role command:
aws sts assume-role --role-arn <arn role i want to assume> --role-session-name dev --serial-number <my arn> --token-code <keyed in token code>
This was working previously, so I'm not sure what could have changed, and I'm at a loss for how to debug this.
Any suggestions?
I had the same problem. You may need to unset your AWS env variables before running the sts command:
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset AWS_SECRET_KEY
unset AWS_SESSION_TOKEN
and then your command:
aws sts assume-role --role-arn <arn role i want to assume> --role-session-name dev --serial-number <my arn> --token-code <keyed in token code>
Here you'll get new credentials. Then run the exports again:
export AWS_ACCESS_KEY_ID=<access key>
export AWS_SECRET_ACCESS_KEY=<secret access key>
export AWS_SESSION_TOKEN=<session token>
I hope it helps!
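If you'd rather do the assume-role step from Python, a rough Boto3 equivalent of the same command (the ARNs and token code are the same placeholders as in the question, and the client must be built from credentials that are allowed to call sts:AssumeRole):

import boto3

sts = boto3.client("sts")

resp = sts.assume_role(
    RoleArn="<arn role i want to assume>",
    RoleSessionName="dev",
    SerialNumber="<my arn>",            # MFA device ARN
    TokenCode="<keyed in token code>",
)

creds = resp["Credentials"]
print(creds["AccessKeyId"], creds["SecretAccessKey"], creds["SessionToken"])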
Check your aws_access_key_id and aws_secret_access_key are correct in the ~/.aws/credentials file.
If they are, then check whether the ~/.aws/credentials file contains an aws_session_token; if so, delete only that line in the file, save your changes, and re-run your command.
Worked for me.
You need to run aws configure and set the AWS access key and secret key in the environment where you are running the STS command, if it's the first time you are running it there. The STS command verifies your identity using that data and checks whether you have permission to perform sts:AssumeRole.
If you already have an existing access key and secret key configured, it is possible that those have expired, so you might need to generate new keys for the user in IAM and configure them in the environment you are running in.
I noticed that when I had to change my AWS IAM password, my access keys were also erased.
I had to generate a new access key and replace the aws_access_key_id and aws_secret_access_key stored in the ~/.aws/credentials (on mac) file.
This cleared the error
Old post, but might still be useful.
Can you try setting the following in the env and retry:
export AWS_ACCESS_KEY_ID='your access key id here';
export AWS_SECRET_KEY='your secret key here'
Here you need to reset your AWS secret key and ID, like:
export AWS_ACCESS_KEY_ID='ACCESS_KEYID';
export AWS_SECRET_KEY='SECRET_KEY'
In my case, I ran the aws configure command in the terminal for the CLI; it asks for the access key ID, secret key, region, and output format.
TLDR; The IAM user's key/secret key is one set of credentials. The session token + session's key/secret key are a different set of credentials, with different values.
--
I did not know the aws sts command created a session token and a new AWS key/secret key. These keys are not the same as your IAM user key and secret key.
I generated a new key, secret key, and token. I updated my credentials file to use the new values. When my token expired the next day, I re-ran the aws sts command. However, the key and secret key in the credentials file were now wrong. I replaced the key and secret key values with my IAM user keys in my credentials file. I also deleted the session token from my credentials file.
Then the aws sts command worked.
Fixed this issue by re-activating the access keys in the IAM user's security credentials section.
They were inactive when I got this issue.
I tried to create an ~/.aws/config file on the fly with multiple profiles to give my CodeBuild project multi-account access. I made a mistake: I first created the ~/.aws/config file with an empty default profile and then tried to assume the role.
When you put a ~/.aws/config file in place with a default profile, that profile determines the identity, not the one that comes with CodeBuild.
July 2022 Update!
Make sure the target AWS region is enabled. Here is how you can enable a region.
August 2022 Update!
Let's not forget the simplest of things. I got this same exact error message when I did not have a default region set. You can set it by running aws configure, or you can pass the region on the sts command like so:
aws sts assume-role --role-arn "your_role_arn" --role-session-name MySessionName --region us-gov-west-1
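For completeness, a rough Boto3 equivalent that pins the region explicitly (the region mirrors the CLI example above and is only an assumption about your setup):

import boto3

# Explicit region avoids relying on a default region being configured
sts = boto3.client("sts", region_name="us-gov-west-1")
resp = sts.assume_role(
    RoleArn="your_role_arn",
    RoleSessionName="MySessionName",
)
print(resp["Credentials"]["AccessKeyId"])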
I got this error when I was trying to connect to my LocalStack instance.
The cause was the missing --endpoint-url http://localhost:4566 argument for the AWS CLI.
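For reference, the Boto3 analogue of that flag is the endpoint_url argument; a minimal sketch assuming the default LocalStack port and dummy credentials:

import boto3

# Point the client at LocalStack instead of the real AWS endpoint
sts = boto3.client(
    "sts",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="test",        # LocalStack typically accepts dummy credentials
    aws_secret_access_key="test",
)
print(sts.get_caller_identity())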
Same error, but with a very specific situation and solution. In my case I have a profile that assumes a role in my .aws/credentials like the following:
[name]
role_arn = arn:aws:iam::ACCOUNT:role/ROLE_NAME
source_profile = default
I was working on the role, and while doing so deleted and recreated it. The AWS CLI caches the token when using this profile method, so the cached token was for the old role, not the recreated one. This can be seen by adding --debug to the CLI command:
2023-01-27 16:22:03,055 - MainThread - botocore.credentials - DEBUG - Credentials for role retrieved from cache.
2023-01-27 16:22:03,055 - MainThread - botocore.credentials - DEBUG - Retrieved credentials will expire at: 2023-01-27 21:46:59+00:00
The cache can be wiped by removing the ~/.aws/cli/cache directory or the specific JSON file for this session found inside that directory.
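If you need to script that cleanup, a tiny sketch (this assumes the default ~/.aws/cli/cache location):

import shutil
from pathlib import Path

# Remove the AWS CLI's cached assume-role credentials so the next call
# fetches a fresh token for the recreated role.
cache_dir = Path.home() / ".aws" / "cli" / "cache"
shutil.rmtree(cache_dir, ignore_errors=True)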