I am trying to get started with Terraform and am using GitLab CI/CD to interact with it. My Runner is unable to assume the IAM Role which has elevated privileges to create AWS resources. My Google-fu on this has failed me.
The error received is:
Error: error configuring Terraform AWS Provider: IAM Role
(my:arn) cannot be assumed. There are a number of possible causes of this - the most common are:
The credentials used in order to assume the role are invalid
The credentials do not have appropriate permission to assume the role
The role ARN is not valid
I have created an access/secret key in IAM and have attempted supplying these as GitLab CI/CD Variables, as environment variables that I directly export in my before_script, and even via the not-recommended approach of hardcoding them into the provider stanza. No matter what, I still get this same error.
What is extra strange is that AWS shows that the key is being used. The "Last Used" column will always reflect a timestamp of the last attempt at running the pipeline. For better or worse, the key is part of my root AWS account - this is a sandbox project and I don't have any IAM Users, so, it's not clear to me how Terraform is unable to use these credentials to assume a Role when, according to AWS, it's able to access my account with them, and my account has root privileges.
Here is my provider.tf:
terraform {
  required_version = ">= 0.14"

  backend "s3" {}
}

provider "aws" {
  region     = "us-east-1"
  access_key = "redacted"
  secret_key = "redacted"

  assume_role {
    role_arn = "arn:aws:iam::redacted:role/gitlab-runner-role"
  }
}
Here is the relevant section of my .gitlab-ci.yml for this stage:
.terraform_init: &terraform_init |-
  terraform init -backend-config="bucket=my-terraform-state" -backend-config="region=us-east-1" -backend-config="key=terraform.tfstate"

tf-plan:
  image:
    name: hashicorp/terraform
    entrypoint: [""]
  stage: plan
  before_script:
    - *terraform_init
  script:
    - terraform plan -out=tfplan.plan
    - terraform show --json tfplan.plan | convert_report > tfplan.json
  needs:
    - job: tf-val
  tags:
    - my-runner
My main.tf only contains a basic aws_instance stanza and my terraform validate stage (omitted above) says it's in ship-shape. These are the only 3 files in my repo.
My gitlab-runner-role only contains one Policy, gitlab-runner-policy, whose JSON is:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::*/*",
        "arn:aws:s3:::my-terraform-state"
      ]
    }
  ]
}
TIA for any advisement... really banging my head up against the wall on this one.
Turns out that assume_role is only needed for cross-account work. I was doing all of the work within my own account, so removing this allowed Terraform to just use the keys to do the work without needing a different IAM Role (or it's able to do what it needs to via the Role that is attached to the Runner as an instance profile). It's not clear to me why specifying assume_role anyway would result in an error, since the access should be there, but removing it has fixed this issue.
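For reference, a working single-account provider configuration looks like the sketch below (my original setup with the assume_role block removed; credentials come from GitLab CI/CD variables rather than being hardcoded):

```hcl
terraform {
  required_version = ">= 0.14"

  backend "s3" {}
}

provider "aws" {
  region = "us-east-1"
  # Credentials are picked up from the AWS_ACCESS_KEY_ID /
  # AWS_SECRET_ACCESS_KEY environment variables set as GitLab CI/CD
  # variables -- no assume_role block needed for same-account work.
}
```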
Summary:
I'm looking to enable EKS nodes to pull images from an ECR registry in a different AWS project. I created an "AllowPull" policy in the desired ECR repository and set the principal of the policy to the ARN of the EKS cluster role, but the node is unable to pull the image.
How should the policy be formulated in order to allow all nodes in an EKS cluster to pull from a cross-account ECR repository?
Attempt Details:
The ECR registry resource name is:
arn:aws:ecr:us-east-2:226427918358:repository/external-pull-test
The EKS cluster that needs to pull the images has the following role attached:
arn:aws:iam::02182452XXXX:role/aws-dev-eks-cluster-crpiXXXX091410594876160000000c
The external ECR registry has the following policy JSON:
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::02182452XXXX:role/aws-dev-eks-cluster-crpiXXXX091410594876160000000c"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:DescribeImages",
        "ecr:DescribeRepositories",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
The pod ImagePullBackOff error specifies that the user attempting to authenticate with the registry is this assumed role:
arn:aws:sts::02182452XXXX:assumed-role/aws-dev-eks-cluster-crpiXXXX091410594876160000000c/i-0ea4f53b6dfdcxxxx
Environment:
Kubernetes: v1.16.15-eks-e1a842
Additional Details:
Using the ARN of my user principal (cross-account) in the policy did allow me to pull images using docker locally. Using the ARN of the assumed role did enable the node to pull the image, but my understanding is that configuring the policy with a particular assumed role won't guarantee that the cluster nodes can consistently pull from the registry.
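One common pattern to avoid pinning the policy to a specific assumed-role session is to grant the whole consuming account (a sketch only, not verified against this exact setup; the account ID placeholder is taken from the question). Granting the account's root principal delegates the decision to that account's own IAM policies, so any node role in it that has ECR permissions can pull:

```json
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowPullFromAccount",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::02182452XXXX:root"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
```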
Another method is to click on the "external-pull-test" repo in the ECR console. In the left panel under "Repositories", click "Permissions", then click "Edit" at the top right. You can add the account ID that needs to pull from this repo under "AWS account IDs". Check the permitted actions in the "Actions" drop-down box at the bottom. "Save", and you should be able to pull.
I have an ECS Fargate task running that has a role attached to it. This role has the S3FullAccess policy (and AssumeRole trusted partnership with ECS service).
However when trying to put an object into a bucket, I get Access Denied errors. I have tried booting an EC2 instance and attaching the same role and can put to the bucket without issue.
To me it seems like the role is not being attached to the task. Is there an important step I'm missing? I can't SSH into the instance as it's Fargate.
UPDATE:
I extracted the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables that are set and used them on my local machine. I am getting the Access Denied issues there too, implying (to me) that none of the policies I have set for that role are being applied to the task.
Anyone that can help with anything is appreciated!
WORKAROUND:
A simple workaround is to create an IAM User with programmatic access and set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables in your task definition.
This works, but does not explain the underlying issue.
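For illustration, the workaround amounts to something like the task-definition fragment below (a sketch; the container name and key values are placeholders). Note that putting long-lived secrets in a task definition is generally discouraged, which is part of why this is only a workaround:

```json
{
  "containerDefinitions": [
    {
      "name": "my-app",
      "environment": [
        { "name": "AWS_ACCESS_KEY_ID",     "value": "AKIA...redacted" },
        { "name": "AWS_SECRET_ACCESS_KEY", "value": "redacted" }
      ]
    }
  ]
}
```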
I've just had a similar issue and I think it's probably due to your program being unable to access the role's credentials that are exposed by the Instance Metadata service.
Specifically, there's an environment variable called AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, and its value is what's needed by the AWS SDKs to use the task role. The ECS Container Agent sets it when your task starts, and it is exposed to the container's main process, which has process ID 1. If your program isn't running as that process, it might not be seeing the env var, which would explain the access denied error.
Depending on how your program is running, there'll be different ways to share the env var.
I had the issue inside ssh login shells (BTW you can ssh into Fargate tasks by running sshd) so in my Docker entrypoint script I inserted somewhere:
# To share the env var with login shells
echo "export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" >> /root/.profile
In other cases it might work to add to your Docker entrypoint script:
# To export the env var for use by child processes
export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
References:
IAM Roles for Tasks - docs explaining the env var relating to the role
AWS Forum post - where someone explains these workarounds in greater detail
Amazon ECS container credentials - loaded from Amazon ECS if the environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is set.
You define the IAM role to use in your task definitions, or you can use a taskRoleArn override when running a task manually with the RunTask API operation. The Amazon ECS agent receives a payload message for starting the task with additional fields that contain the role credentials. The Amazon ECS agent sets a unique task credential ID as an identification token and updates its internal credential cache so that the identification token for the task points to the role credentials that are received in the payload. The Amazon ECS agent populates the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable in the Env object (available with the docker inspect container_id command) for all containers that belong to this task with the following relative URI: /credential_provider_version/credentials?id=task_credential_id.
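As a quick sanity check, the full URL the SDKs call can be reconstructed from that env var (a small Python sketch; 169.254.170.2 is the fixed link-local address the ECS agent serves credentials on):

```python
import os

# Fixed link-local address of the ECS agent's credentials endpoint.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"

def task_credentials_url():
    """Return the full credentials URL the SDK would call, or None if the
    env var is missing (e.g. when not running inside an ECS task)."""
    relative_uri = os.environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative_uri is None:
        return None
    return ECS_CREDENTIALS_HOST + relative_uri
```

From inside the task, fetching that URL (e.g. with curl) should return the temporary role credentials; if the env var is absent in your process, that points to the sharing problem described above.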
Terraform code:
resource "aws_iam_role" "AmazonS3ServiceForECSTask" {
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": [
          "ecs-tasks.amazonaws.com"
        ]
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}
data "aws_iam_policy_document" "bucket_policy" {
  statement {
    principals {
      type        = "AWS"
      identifiers = [aws_iam_role.AmazonS3ServiceForECSTask.arn]
    }
    actions = [
      "s3:ListBucket",
      "s3:GetBucketLocation",
    ]
    resources = [
      "arn:aws:s3:::${var.bucket_name}",
    ]
  }
  statement {
    principals {
      type        = "AWS"
      identifiers = [aws_iam_role.AmazonS3ServiceForECSTask.arn]
    }
    actions = [
      "s3:GetObject",
      "s3:PutObject",
      "s3:PutObjectAcl",
      "s3:ListMultipartUploadParts",
      "s3:AbortMultipartUpload",
    ]
    resources = [
      "arn:aws:s3:::${var.bucket_name}/*",
    ]
  }
}
resource "aws_ecs_task_definition" "my_app_ecs_task_definition" {
  task_role_arn            = aws_iam_role.AmazonS3ServiceForECSTask.arn
  execution_role_arn       = aws_iam_role.ECS-TaskExecution.arn
  family                   = var.family
  network_mode             = var.network_mode[var.launch_type]
  requires_compatibilities = var.requires_compatibilities
  cpu                      = var.task_cpu[terraform.workspace]
  memory                   = var.task_memory[terraform.workspace]
  container_definitions    = module.ecs-container-definition.json
}
I am getting an error when I call get_execution_role() from sagemaker in python.
I have attached the error for the same.
I have added the SagemakerFullAccess Policy to role and user both.
get_execution_role() is a function helper used in the Amazon SageMaker Examples GitHub repository.
These examples were made to be executed from the fully managed Jupyter notebooks that Amazon SageMaker provides.
From inside these notebooks, get_execution_role() will return the IAM role name that was passed in as part of the notebook creation. That allows the notebook examples to be executed without code changes.
From outside these notebooks, get_execution_role() will raise an exception because it cannot determine the role name that SageMaker requires.
To solve this issue, pass the IAM role name instead of using get_execution_role().
Instead of:
role = get_execution_role()
kmeans = KMeans(role=role,
                train_instance_count=2,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                data_location=data_location)
you need to do:
role = 'role_name_with_sagemaker_permissions'
kmeans = KMeans(role=role,
                train_instance_count=2,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                data_location=data_location)
I struggled with this for a while and there are a few different pieces but I believe these are the steps to solve (according to this doc)
You must add a role to your AWS config file. Open it in an editor:
~/.aws/config
and add your own profile:
[profile marketingadmin]
role_arn = arn:aws:iam::123456789012:role/marketingadmin
source_profile = default
Then Edit Trust Relationships in the AWS Dashboard:
add this and update:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com",
        "AWS": "arn:aws:iam::XXXXXXX:user/YOURUSERNAME"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Lastly, I clicked the link that says
Give this link to users who can switch roles in the console
After adding my credentials - it worked.
Thanks for trying out SageMaker!
The exception you are seeing already suggests the reason: the credentials you are using are not role credentials but most likely user credentials.
The format of user credentials looks like:
'arn:aws:iam::accid:user/name' as opposed to a role:
'arn:aws:iam::accid:role/name'
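A quick way to check which kind of principal an ARN refers to (a hypothetical helper, purely to illustrate the user/role distinction above):

```python
def arn_principal_type(arn):
    """Return the resource type of an IAM/STS ARN ('user', 'role',
    'assumed-role', ...), or None if the string doesn't look like an ARN.
    ARN format: arn:partition:service:region:account-id:resource-type/name
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        return None
    # The final field is "resource-type/resource-name".
    return parts[5].split("/", 1)[0]
```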
Hope this helps!
I have a custom docker image uploaded to ECS. I opened up the permissions to try and get through this issue (I will lock it down again once I can get this to work). I am attempting to deploy the docker image to elastic beanstalk. I have a docker enabled elastic beanstalk environment set up. According to the AWS docs, if I am pulling my image from within AWS, I don't need to pass in credentials. So I upload my Dockerrun.aws.json file and attempt to install it. It fails with the error:
Command failed on instance. Return code: 1 Output: Failed to authenticate with ECR for registry '434875166128' in 'us-east-1'. Hook /opt/elasticbeanstalk/hooks/appdeploy/pre/03build.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
The /var/log/eb-activity.log information has nothing useful in it.
Here's my Dockerrun.aws.json file:
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "{id000xxxx}.dkr.ecr.us-east-1.amazonaws.com/my-repo:1.0.0",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "4000"
    }
  ],
  "Logging": "/var/log/app-name"
}
I have also tried adding the authentication with the dockercfg.json file in S3. It didn't work for me either.
Note that I am using a business account instead of a personal account, so there may be some unknown variances as well.
Thanks!
Update: My user has full permissions at the moment too, so there shouldn't be anything permission-wise getting in the way.
I was having the same problem.
Solution:
In AWS -> IAM -> Roles, pick the role your beanstalk is using.
In my case it was set to aws-elasticbeanstalk-ec2-role
Under Permissions for the role, attach policy: AmazonEC2ContainerRegistryReadOnly
In ECR there is no need to give any permissions to this role.
Assuming
You are using Terraform to provision your infrastructure
You have created a sample ElasticBeanstalk app at least once, so that you have the default role created.
The default ElasticBeanstalk role is named: aws-elasticbeanstalk-ec2-role
Then you can comfortably use the following format to add ECR Read Only policy to the role:
data "aws_iam_role" "elastic_beanstalk_role" {
  name = "aws-elasticbeanstalk-ec2-role"
}

resource "aws_iam_policy" "ebs_ecr_policy" {
  name        = "aws-elasticbeanstalk-ec2-ecr-policy"
  description = "Enable elastic-beanstalk to be able to access ECR repository with images"
  policy      = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:DescribeImages",
        "ecr:BatchGetImage",
        "ecr:GetLifecyclePolicy",
        "ecr:GetLifecyclePolicyPreview",
        "ecr:ListTagsForResource",
        "ecr:DescribeImageScanFindings"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_iam_policy_attachment" "ebs_ecr-policy-attach" {
  name       = "ebs-ecr-policy-attachment"
  roles      = [data.aws_iam_role.elastic_beanstalk_role.name]
  policy_arn = aws_iam_policy.ebs_ecr_policy.arn
}
This way you can manage updates to the role and policy from your infrastructure code.
You can initialize the necessary service roles for elastic beanstalk (aws-elasticbeanstalk-ec2-role, aws-elasticbeanstalk-service-role, AWSServiceRoleForECS) by using the new console of Elastic Beanstalk.
You have to do this only one time on each AWS account:
Go to the Elastic beanstalk console.
Accept the "new design": at the top of the console, if you see a message saying "we're testing a new design", opt in to use the new version of the console. Warning: it seems you can't roll back to the old console.
Start the Create New Application wizard, and choose a default sample application for the technology.
Complete all the steps of the wizard up to the review page, and look at the Security panel: you will see the two roles "aws-elasticbeanstalk-ec2-role" and "aws-elasticbeanstalk-service-role". Then finish the wizard to create the sample app.
After a while, the application should be running.
If something goes wrong, go to the IAM console, delete the roles aws-elasticbeanstalk-ec2-role and aws-elasticbeanstalk-service-role, and run the wizard again.
I fixed the "Command failed on instance. Return code: 1 Output: Failed to authenticate with ECR for registry" error, and another strange error ("The AWS Access Key Id you provided does not exist in our records. (ElasticBeanstalk::ManifestDownloadError)"), by using the NEW console. I still had these errors with the old one.
I have files stored on S3 and wrote .ebextensions config to automatically copy the them to new instances. I'm receiving this error in the Elastic Beanstalk console:
[Instance: INSTANCEID Module: AWSEBAutoScalingGroup ConfigSet: null] Command failed on instance. Return code: 1 Output: [CMD-AppDeploy/AppDeployStage0/EbExtensionPreBuild] command failed with error code 1: Error occurred during build: Failed to retrieve https://s3-us-west-1.amazonaws.com/MyBucket/MyFolder/_MyFile.txt: HTTP Error 403 : AccessDenied
My .ebextension config file has this section:
files:
  "/target/file/path" :
    mode: "000777"
    owner: ec2-user
    group: ec2-user
    source: https://s3-us-west-1.amazonaws.com/_MyBucket_/_MyFolder_/_MyFile.txt
In attempting to make this file copying work, I've also relaxed permissions by giving the elastic beanstalk IAM role the standard read only access policy to all of S3. It's policy is this:
{
  "Effect": "Allow",
  "Action": [
    "s3:Get*",
    "s3:List*"
  ],
  "Resource": "*"
}
Yet the prebuild copying step still fails. Did I give the source url in the correct format? Is there another security entity/policy involved? Help please :)
The documentation is very sketchy on the subject (probably an ideal candidate for StackExchange Docs!).
To do this correctly with .ebextensions, you need to allow the Beanstalk instance IAM role in the bucket policy, set up an AWS::CloudFormation::Authentication auth config, and attach that config to the remote sources. This is kind of a hybrid of all the other answers, but all of them failed in one way or another for me.
Assuming your IAM instance role is aws-elasticbeanstalk-ec2-role:
Set your AWS bucket policy to allow the Beanstalk IAM role. Edit the "bucket policy":
{
  "Version": "2012-10-17",
  "Id": "BeanstalkS3Copy",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<beanstalk_iam_role_arn>"
      },
      "Action": [
        "s3:ListBucketVersions",
        "s3:ListBucket",
        "s3:GetObjectVersion",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}
where:
beanstalk_iam_role_arn = the fully qualified instance IAM role ARN. See the "IAM role" associated with a running instance if available, or see the environment configuration. Example: arn:aws:iam::12345689:role/aws-elasticbeanstalk-ec2-role
bucket_name = your bucket name
In your .ebextension/myconfig.config, add an S3 authentication block that uses your IAM instance role:
Resources:
  AWSEBAutoScalingGroup:
    Metadata:
      AWS::CloudFormation::Authentication:
        S3Auth:
          type: "s3"
          buckets: ["bucket_name"]
          roleName:
            "Fn::GetOptionSetting":
              Namespace: "aws:asg:launchconfiguration"
              OptionName: "IamInstanceProfile"
              DefaultValue: "aws-elasticbeanstalk-ec2-role"
Set bucket_name appropriately
Define a remote file and attach the S3 Authentication block:
"/etc/myfile.txt" :
  mode: "000400"
  owner: root
  group: root
  authentication: "S3Auth" # Matches the auth block above.
  source: https://s3-eu-west-1.amazonaws.com/mybucket/myfile.txt
Set your source URL appropriately
Similar to chaseadamsio's answer, you can configure the role given to the EC2 instance with a policy to access S3 resources, then use the pre-installed AWS CLI utilities to move files around.
The way I approached this is to create a role dedicated to the given EB application, then attach a policy similar to:
"Statement": [
  {
    "Sid": "<sid>",
    "Effect": "Allow",
    "Action": [
      "s3:GetObject"
    ],
    "Resource": [
      "arn:aws:s3:::<your_bucket_path>/*"
    ]
  }
]
This gives your instance access, then to get the files, add a 'commands' block to your config such as:
commands:
  01-get-file:
    command: aws s3 cp s3://<your_bucket_path>/your-file.txt /home/ec2-user
  02-execute-actions:
    [unpack, run scripts, etc..]
Obviously you can use other AWS CLI utilities as needed. I found this solved a lot of problems I was having with S3 access and makes deployment a lot easier.
I found a solution to overcome this error. It turns out adding a Resources section to the .ebextensions config file makes it work. The entire file becomes:
files:
  "/target/file/path" :
    mode: "000777"
    owner: ec2-user
    group: ec2-user
    source: https://s3-us-west-1.amazonaws.com/_MyBucket_/_MyFolder_/_MyFile.txt
Resources:
  AWSEBAutoScalingGroup:
    Metadata:
      AWS::CloudFormation::Authentication:
        S3Access:
          type: S3
          roleName: aws-elasticbeanstalk-ec2-role
          buckets: _MyBucket
At this point, I don't know enough to grok why it has to be this way. Hopefully it can help someone who's lost move forward and eventually gain a better understanding. I based my answer on this link https://forums.aws.amazon.com/message.jspa?messageID=541634
An alternative to setting the .ebextensions config would be to set a policy on the aws-elasticbeanstalk-ec2-role within the IAM Manager (or to create a new role specifically for your elastic beanstalk environments to sandbox your autoscaled ec2 instances).
To do so, go to the IAM manager within the web console and click on "Roles" on the left side. You should see your instance role name in the list; clicking on it will take you to the administration page for that particular role. Attach a new role policy to the role under "Permissions" with a policy document matching what you want your ec2 instances to have permission to do (in this case, you'd give it a policy to access an s3 bucket called _MyBucket), and you should no longer need the Resources section in your .ebextensions config.
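Such a role policy might look like the sketch below (using the _MyBucket placeholder from the question; adjust the actions to what your instances actually need):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::_MyBucket",
        "arn:aws:s3:::_MyBucket/*"
      ]
    }
  ]
}
```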
If you have your IAM role for the machine configured to get access to the file you can do the following in .ebextensions
commands:
  01a_copy_file:
    command: aws s3 cp s3://bucket/path/file /destination/