ECS Fargate task not applying role - amazon-web-services

I have an ECS Fargate task running that has a role attached to it. This role has the S3FullAccess policy (and an AssumeRole trust relationship with the ECS service).
However when trying to put an object into a bucket, I get Access Denied errors. I have tried booting an EC2 instance and attaching the same role and can put to the bucket without issue.
To me it seems like the role is not being attached to the task. Is there an important step I'm missing? I can't SSH into the instance as it's Fargate.
UPDATE:
I extracted the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables that are set and used them on my local machine. I am getting the Access Denied issues there too, implying (to me) that none of the policies I have set for that role are being applied to the task.
Anyone that can help with anything is appreciated!
WORKAROUND:
A simple workaround is to create an IAM User with programmatic access and set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables in your task definition.
This works, but does not explain the underlying issue.

I've just had a similar issue and I think it's probably due to your program being unable to access the role's credentials that are exposed through the container credentials endpoint.
Specifically, there's an environment variable called AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, and its value is what the AWS SDKs need in order to use the task role. The ECS Container Agent sets it when your task starts, and it is exposed only to the container's main process, which has process ID 1. If your program isn't running as that process, it might not be seeing the env var, which would explain the access denied error.
Depending on how your program is running, there'll be different ways to share the env var.
I had the issue inside ssh login shells (BTW you can ssh into Fargate tasks by running sshd) so in my Docker entrypoint script I inserted somewhere:
# To share the env var with login shells
echo "export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" >> /root/.profile
In other cases it might work to add to your Docker entrypoint script:
# To export the env var for use by child processes
export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
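To check whether the variable actually reached your program at all, one option is to parse PID 1's environment and compare it with your own. A hypothetical diagnostic in Python (Linux-only, since it reads the /proc environ format; the helper names are mine, not SDK code):

```python
import os

def read_environ(path="/proc/1/environ"):
    """Parse a /proc/<pid>/environ file: NUL-separated KEY=VALUE pairs."""
    with open(path, "rb") as f:
        raw = f.read()
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode()] = value.decode()
    return env

def diagnose(var="AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
    """Report whether PID 1 and the current process can see the variable."""
    in_pid1 = var in read_environ()
    in_self = var in os.environ
    if in_pid1 and not in_self:
        return "set for PID 1 but lost before reaching this process"
    return "visible here" if in_self else "not set anywhere"
```

If the variable shows up for PID 1 but not for your own process, the entrypoint or login shell dropped it on the way, which matches the situation described above.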
References:
IAM Roles for Tasks - docs explaining the env var relating to the role
AWS Forum post - where someone explains these workarounds in greater detail

Amazon ECS container credentials – loaded from Amazon ECS if the environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is set.
You define the IAM role to use in your task definitions, or you can use a taskRoleArn override when running a task manually with the RunTask API operation. The Amazon ECS agent receives a payload message for starting the task with additional fields that contain the role credentials. The Amazon ECS agent sets a unique task credential ID as an identification token and updates its internal credential cache so that the identification token for the task points to the role credentials that are received in the payload. The Amazon ECS agent populates the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable in the Env object (available with the docker inspect container_id command) for all containers that belong to this task with the following relative URI: /credential_provider_version/credentials?id=task_credential_id.
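In other words, the SDK joins that relative URI onto a fixed link-local credential endpoint. A minimal sketch of that resolution (the helper is illustrative, not SDK code; the actual fetch only succeeds from inside a running task):

```python
import os

# Fixed link-local address of the ECS credential endpoint; the agent-set
# env var supplies only the path component.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"

def container_credentials_url():
    """Return the URL an SDK would fetch task-role credentials from,
    or None when the ECS agent never injected the variable."""
    relative_uri = os.environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative_uri is None:
        return None  # the SDK then falls through to the next credential provider
    return ECS_CREDENTIALS_HOST + relative_uri
```

This is why a missing env var produces Access Denied-style failures: the SDK never even reaches the task role and falls back to whatever other credentials (or none) it can find.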
Terraform code:
resource "aws_iam_role" "AmazonS3ServiceForECSTask" {
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": [
          "ecs-tasks.amazonaws.com"
        ]
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

data "aws_iam_policy_document" "bucket_policy" {
  statement {
    principals {
      type        = "AWS"
      identifiers = [aws_iam_role.AmazonS3ServiceForECSTask.arn]
    }
    actions = [
      "s3:ListBucket",
      "s3:GetBucketLocation",
    ]
    resources = [
      "arn:aws:s3:::${var.bucket_name}",
    ]
  }
  statement {
    principals {
      type        = "AWS"
      identifiers = [aws_iam_role.AmazonS3ServiceForECSTask.arn]
    }
    actions = [
      "s3:GetObject",
      "s3:PutObject",
      "s3:PutObjectAcl",
      "s3:ListMultipartUploadParts",
      "s3:AbortMultipartUpload",
    ]
    resources = [
      "arn:aws:s3:::${var.bucket_name}/*",
    ]
  }
}

resource "aws_ecs_task_definition" "my_app_ecs_task_definition" {
  task_role_arn            = aws_iam_role.AmazonS3ServiceForECSTask.arn
  execution_role_arn       = aws_iam_role.ECS-TaskExecution.arn
  family                   = var.family
  network_mode             = var.network_mode[var.launch_type]
  requires_compatibilities = var.requires_compatibilities
  cpu                      = var.task_cpu[terraform.workspace]
  memory                   = var.task_memory[terraform.workspace]
  container_definitions    = module.ecs-container-definition.json
}

Related

EC2 instance unable to access its instance role with awscli and ecs-agent

I'm currently writing a Terraform module for EC2 ASGs with ECS. Everything about starting the instances works, including IAM-dependent actions such as KMS-encrypted volumes. However, the ECS agent fails with:
Error getting ECS instance credentials from default chain: NoCredentialProviders: no valid providers in chain.
Unfortunately, most posts I find about this are about the CLI being run without credentials configured; however, this should of course use the instance profile.
The posts I do find regarding that are about missing IAM policies for the instance role. However, I do have this trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ec2.amazonaws.com",
          "ecs.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
(Added ECS because someone on SO had it in there, I don't think that's right. Also removed some conditions.)
It has these policies attached:
AmazonSSMManagedInstanceCore
AmazonEC2ContainerServiceforEC2Role
When I connect via SSH and run any awscli command, I get the error:
Unable to locate credentials. You can configure credentials by running "aws configure".
But with
curl http://169.254.169.254/latest/meta-data/iam/info
I see the correct instance profile ARN and with
curl http://169.254.169.254/latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance
I see temporary credentials. With these in the awscli configuration,
aws sts get-caller-identity
returns the correct results. There are no iptables rules or routes blocking access to the metadata service, and I've deactivated the IMDSv2 token requirement for the time being.
I'm using the latest stable version of ECS-optimized Amazon Linux 2.
What might be the issue here?
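One detail worth flagging in the curl checks above: the identity-credentials/ec2/security-credentials/ec2-instance document holds an AWS-internal credential set, not the instance role's credentials, so that path working does not prove the role credentials are available. The default credential chain reads from a different path; a small sketch of the URL it queries (the helper is illustrative):

```python
# Base of the EC2 Instance Metadata Service (link-local; reachable only
# from the instance itself).
IMDS_BASE = "http://169.254.169.254/latest/meta-data"

def role_credentials_url(role_name):
    """URL serving the instance role's temporary credentials.

    Note: identity-credentials/ec2/security-credentials/ec2-instance is a
    separate, AWS-internal credential set; the CLI, SDKs, and the ECS agent
    read from the iam/security-credentials path instead.
    """
    return f"{IMDS_BASE}/iam/security-credentials/{role_name}"
```

Curling that path (with the role name from iam/info) is a more direct test of whether the instance profile credentials the agent needs are actually being served.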

Unable to load credentials from any of the providers in the chain

I'm trying to use the S3 client in my ECS app to upload files into an S3 bucket (let's call it bucket1). The S3 client is instantiated as follows:
import software.amazon.awssdk.services.s3.S3Client;

private S3Client getClient() {
    return S3Client.builder().build();
}
Task definition has been configured to use ecsTaskExecutionRole as "Task execution role", so I have added the following permissions policy to it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::bucket1",
        "arn:aws:s3:::bucket1/*"
      ]
    }
  ]
}
The same permissions policy has been added to ecsInstanceRole as well.
When I try to upload something, the following exception is thrown:
software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from any of the providers in the chain AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]) : [SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., EnvironmentVariableCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., WebIdentityTokenCredentialsProvider(): Either the environment variable AWS_WEB_IDENTITY_TOKEN_FILE or the javaproperty aws.webIdentityTokenFile must be set., ProfileCredentialsProvider(): Profile file contained no credentials for profile 'default': ProfileFile(profiles=[]), ContainerCredentialsProvider(): Cannot fetch credentials from container - neither AWS_CONTAINER_CREDENTIALS_FULL_URI or AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variables are set., InstanceProfileCredentialsProvider(): Failed to load credentials from IMDS.]
Kind of lost here - am I supposed to specify AWS credentials in the list of environment variables? How to pass credentials otherwise?
By the way, the "Task role" has been set to "None" in the task definition, if that matters.
In your task definition, you should set the task role parameter to the IAM role that you created earlier; the task role, not the execution role, is what your containers use to access AWS services. The execution role is used by ECS itself, e.g. to pull images and push logs.
Alternatively, you can use environment variables to create clients as well by setting variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html
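The chain named in the exception tries each source in order and returns the first credentials it finds, which is why fixing any single provider (the task role, or the environment variables) makes the error go away. A rough sketch of that resolution logic, not the SDK's actual classes:

```python
def resolve_credentials(providers):
    """Try each (name, provider) pair in order; return the first
    credentials found. Loosely mirrors the SDK's provider-chain
    behaviour: collect per-provider failures, raise if all fail."""
    errors = []
    for name, provider in providers:
        try:
            creds = provider()
            if creds is not None:
                return creds
        except Exception as e:
            errors.append(f"{name}: {e}")
    raise RuntimeError(
        "Unable to load credentials from any provider: " + "; ".join(errors)
    )
```

The long error message in the question is exactly this: every provider in the chain reporting why it had nothing to offer.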

Terraform unable to assume IAM Role

I am trying to get started with Terraform and am using GitLab CI/CD to interact with it. My Runner is unable to assume the IAM Role which has elevated privileges to create AWS resources. My Google-fu on this has failed me.
The error received is:
Error: error configuring Terraform AWS Provider: IAM Role
(my:arn) cannot be assumed. There are a number of possible causes of this - the most common are:
The credentials used in order to assume the role are invalid
The credentials do not have appropriate permission to assume the role
The role ARN is not valid
I have created an access/secret key in IAM and have attempted supplying these as GitLab CI/CD Variables, environment variables that I directly export in my before_script, and even the not-recommended hardcoding them into the provider stanza. No matter what, I still get this same error.
What is extra strange is that AWS shows that the key is being used. The "Last Used" column will always reflect a timestamp of the last attempt at running the pipeline. For better or worse, the key is part of my root AWS account - this is a sandbox project and I don't have any IAM Users, so, it's not clear to me how Terraform is unable to use these credentials to assume a Role when, according to AWS, it's able to access my account with them, and my account has root privileges.
Here is my provider.tf:
terraform {
  required_version = ">= 0.14"
  backend "s3" {}
}

provider "aws" {
  region     = "us-east-1"
  access_key = "redacted"
  secret_key = "redacted"

  assume_role {
    role_arn = "arn:aws:iam::redacted:role/gitlab-runner-role"
  }
}
Here is the relevant section of my .gitlab-ci.yml for this stage:
.terraform_init: &terraform_init |-
  terraform init -backend-config="bucket=my-terraform-state" -backend-config="region=us-east-1" -backend-config="key=terraform.tfstate"

tf-plan:
  image:
    name: hashicorp/terraform
    entrypoint: [""]
  stage: plan
  before_script:
    - *terraform_init
  script:
    - terraform plan -out=tfplan.plan
    - terraform show --json tfplan.plan | convert_report > tfplan.json
  needs:
    - job: tf-val
  tags:
    - my-runner
My main.tf only contains a basic aws_instance stanza and my terraform validate stage (omitted above) says it's in ship-shape. These are the only 3 files in my repo.
My gitlab-runner-role only contains one Policy, gitlab-runner-policy, whose JSON is:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::*/*",
        "arn:aws:s3:::my-terraform-state"
      ]
    }
  ]
}
TIA for any advisement... really banging my head up against the wall on this one.
Turns out that assume_role is only needed for cross-account work. I was doing all of the work within my own account, so removing this allowed Terraform to just use the keys to do the work without needing a different IAM Role (or it's able to do what it needs to via the Role that is attached to the Runner as an instance profile). It's not clear to me why specifying assume_role anyway would result in an error, since the access should be there, but removing it has fixed this issue.

AWS Docker deployment

I have a custom docker image uploaded to ECR. I opened up the permissions to try and get through this issue (I will lock it down again once I can get this to work). I am attempting to deploy the docker image to Elastic Beanstalk. I have a docker-enabled Elastic Beanstalk environment set up. According to the AWS docs, if I am pulling my image from within AWS, I don't need to pass in credentials. So I upload my Dockerrun.aws.json file and attempt to install it. It fails with the error:
Command failed on instance. Return code: 1 Output: Failed to authenticate with ECR for registry '434875166128' in 'us-east-1'. Hook /opt/elasticbeanstalk/hooks/appdeploy/pre/03build.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
The /var/log/eb-activity.log information has nothing useful in it.
Here's my Dockerrun.aws.json file:
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "{id000xxxx}.dkr.ecr.us-east-1.amazonaws.com/my-repo:1.0.0",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "4000"
    }
  ],
  "Logging": "/var/log/app-name"
}
I have also tried adding the authentication with the dockercfg.json file in S3. It didn't work for me either.
Note that I am using a business account instead of a personal account, so there may be some unknown variances as well.
Thanks!
Update: My user has full permissions at the moment too, so there shouldn't be anything permission-wise getting in the way.
I was having the same problem.
Solution:
In AWS -> IAM -> Roles -> pick the role your beanstalk is using.
In my case it was set to aws-elasticbeanstalk-ec2-role
Under Permissions for the role, attach policy: AmazonEC2ContainerRegistryReadOnly
In ECR there is no need to give any permissions to this role.
Assuming
You are using Terraform to provision your infrastructure
You have created a sample ElasticBeanstalk app at least once, so that you have the default role created.
The default ElasticBeanstalk role is named: aws-elasticbeanstalk-ec2-role
Then you can comfortably use the following format to add ECR Read Only policy to the role:
data "aws_iam_role" "elastic_beanstalk_role" {
  name = "aws-elasticbeanstalk-ec2-role"
}

resource "aws_iam_policy" "ebs_ecr_policy" {
  name        = "aws-elasticbeanstalk-ec2-ecr-policy"
  description = "Enable elastic-beanstalk to be able to access ECR repository with images"
  policy      = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:DescribeImages",
        "ecr:BatchGetImage",
        "ecr:GetLifecyclePolicy",
        "ecr:GetLifecyclePolicyPreview",
        "ecr:ListTagsForResource",
        "ecr:DescribeImageScanFindings"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_iam_policy_attachment" "ebs_ecr-policy-attach" {
  name       = "ebs-ecr-policy-attachment"
  roles      = [data.aws_iam_role.elastic_beanstalk_role.name]
  policy_arn = aws_iam_policy.ebs_ecr_policy.arn
}
This way you can manage updates to the role and policy from your infrastructure code.
You can initialize the necessary service roles for Elastic Beanstalk (aws-elasticbeanstalk-ec2-role, aws-elasticbeanstalk-service-role, AWSServiceRoleForECS) by using the new Elastic Beanstalk console.
You have to do this only once per AWS account:
Go to the Elastic Beanstalk console.
Accept the "new design": at the top of the console, if you see a message "we're testing a new design", opt in to use the new version of the console. Warning: it seems you can't roll back to the old console.
Start the Create New Application wizard, and choose a default sample application as the technology.
Complete all the steps of the wizard up to the summary, and look at the Security panel: you will see the two roles "aws-elasticbeanstalk-ec2-role" and "aws-elasticbeanstalk-service-role". Then finish the wizard to create the sample app.
After a while, the application should be running.
If something goes wrong, go to the IAM console, delete the roles aws-elasticbeanstalk-ec2-role and aws-elasticbeanstalk-service-role, and run the wizard again.
I fixed the "Command failed on instance. Return code: 1 Output: Failed to authenticate with ECR for registry" error and another strange error ("The AWS Access Key Id you provided does not exist in our records. (ElasticBeanstalk::ManifestDownloadError)") by using the NEW console. I still had the error with the old one.

How to deploy and redeploy applications with Terraform?

I'm looking into Terraform and how to use it to set up an AWS environment. So far I have the scripts for setting up a VPC with 3 public subnets, 3 private subnets, an Internet Gateway and 3 NAT Gateways. However, I'm confused as to how one would go about deploying and redeploying applications in private subnets.
In my scenario we build micro-services using Spring Boot. The idea is to move to a state where we can have Elastic Load Balancers attached to the public subnets and host our applications in autoscale groups in the private subnets. However I can't find any good tutorials regarding Terraform that show you how to do this in a way that applications can be redeployed from Jenkins.
So far I've read about Opsworks and Code Deploy so would I need to use Terraform to setup these resources and then trigger the deployment scripts to send artefacts to S3 that are then redeployed?
For deploy/redeploy, you can use another solution by Hashicorp: Nomad. It uses the same language as Terraform to program tasks that you can run on a cluster. Tasks can be anything, for example: redeploy all my web app instances.
I'm using CodeDeploy with Terraform/Chef. The setup I'm using goes something like this:
1) Manually setup the CodeDeploy IAM Roles ahead of time.
2) Setup the CodeDeploy App/Group ahead of time.
3) Setup the Instance Profile using Terraform, like this:
resource "aws_iam_instance_profile" "code_deploy" {
  name  = "CodeDeploy"
  roles = ["${var.codedeploy_instance_role}"]
}
4) Use the Instance Profile and the correct tags (that match your CodeDeploy app) when making an instance, like this:
iam_instance_profile = "${aws_iam_instance_profile.code_deploy.id}"

tags {
  CD = "${var.tag_cd}"
}
5) Use Chef (or whatever your provisioner is) to setup CodeDeploy on the instance.
Then you're good to use CodeDeploy like normal.
Adding this in case someone looking for more information might find it useful.
Building on the solution from Peter, I am setting up the CodeDeploy IAM Roles and the CodeDeploy App/Group from Terraform as well. Here is what I have:
resource "aws_iam_role" "codedeploy_role_name" {
  name               = "codedeploy_role_name"
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "codedeploy.amazonaws.com",
          "ec2.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_codedeploy_app" "analytics_app" {
  name = "analytics_app"
}

resource "aws_codedeploy_deployment_config" "analytics_deployment_config" {
  deployment_config_name = "analytics_deployment_config"

  minimum_healthy_hosts {
    type  = "HOST_COUNT"
    value = 2
  }
}

resource "aws_codedeploy_deployment_group" "analytics_group" {
  app_name               = "${aws_codedeploy_app.analytics_app.name}"
  deployment_group_name  = "analytics_group"
  service_role_arn       = "${aws_iam_role.codedeploy_role_name.arn}"
  deployment_config_name = "analytics_deployment_config"

  ec2_tag_filter {
    key   = "CodeDeploy"
    type  = "KEY_AND_VALUE"
    value = "analytics"
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }
}