AWS Docker deployment - amazon-web-services

I have a custom docker image uploaded to ECS. I opened up the permissions to try and get through this issue (I will lock it down again once I can get this to work). I am attempting to deploy the docker image to elastic beanstalk. I have a docker enabled elastic beanstalk environment set up. According to the AWS docs, if I am pulling my image from within AWS, I don't need to pass in credentials. So I upload my Dockerrun.aws.json file and attempt to install it. It fails with the error:
Command failed on instance. Return code: 1 Output: Failed to authenticate with ECR for registry '434875166128' in 'us-east-1'. Hook /opt/elasticbeanstalk/hooks/appdeploy/pre/03build.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
The /var/log/eb-activity.log information has nothing useful in it.
Here's my Dockerrun.aws.json file:
{
"AWSEBDockerrunVersion": "1",
"Image": {
"Name": "{id000xxxx}.dkr.ecr.us-east-1.amazonaws.com/my-repo:1.0.0",
"Update": "true"
},
"Ports": [
{
"ContainerPort": "4000"
}
],
"Logging": "/var/log/app-name"
}
I have also tried adding the authentication with the dockercfg.json file in S3. It didn't work for me either.
Note that I am using a business account instead of a personal account, so there may be some unknown variances as well.
Thanks!
Update: My user has full permissions at the moment too, so there shouldn't be anything permission-wise getting in the way.

I was having the same problem.
Solution:
In AWS -> IAM -> Roles - > pick the role your beanstalk is using.
In my case it was set to aws-elasticbeanstalk-ec2-role
Under Permissions for the role, attach policy: AmazonEC2ContainerRegistryReadOnly
In ECR there is no need to give any permissions to this role.

Assuming
You are using Terraform to provision your infrastructure
You have created a sample ElasticBeanstalk app at least once, so that you have the default role created.
The default ElasticBeanstalk role is named: aws-elasticbeanstalk-ec2-role
Then you can comfortably use the following format to add ECR Read Only policy to the role:
data "aws_iam_role" "elastic_beanstalk_role" {
name = "aws-elasticbeanstalk-ec2-role"
}
resource "aws_iam_policy" "ebs_ecr_policy" {
name = "aws-elasticbeanstalk-ec2-ecr-policy"
description = "Enable elastic-beanstalk to be able to access ECR repository with images"
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:DescribeImages",
"ecr:BatchGetImage",
"ecr:GetLifecyclePolicy",
"ecr:GetLifecyclePolicyPreview",
"ecr:ListTagsForResource",
"ecr:DescribeImageScanFindings"
],
"Resource": "*"
}
]
}
EOF
}
resource "aws_iam_policy_attachment" "ebs_ecr-policy-attach" {
name = "ebs-ecr-policy-attachment"
roles = [data.aws_iam_role.elastic_beanstalk_role.name]
policy_arn = aws_iam_policy.ebs_ecr_policy.arn
}
This way you can manage updates to the role and policy from your infrastructure code.

You can intialize necessary service roles for elastic beanstalk (aws-elasticbeanstalk-ec2-role , aws-elasticbeanstalk-service-role , AWSServiceRoleForECS ) by using the new console of Elastic Beanstalk.
You have to do this only one time on each AWS account :
Go to the Elastic beanstalk console.
Accept the "new design" : in the top of the console, if see a message "we re testing a new design", optin to accept to use the new version of the console. Warning, it seems you cant rollback to the old console.
Start the Create New Application wizard, and use a default sample application in the technology.
Complete all the step of the wizard until the resume, and look at the Security pannel : you will see the two roles "aws-elasticbeanstalk-ec2-role" and "aws-elasticbeanstalk-service-role". And terminate the wizard to create the sample app.
After a while, the application should be running
In case of emergency, go to the IAM console and delete the roles aws-elasticbeanstalk-ec2-role and aws-elasticbeanstalk-service-role and run the wizard again.
I fixed the "Command failed on instance. Return code: 1 Output: Failed to authenticate with ECR for registry" and an other strange error ("The AWS Access Key Id you provided does not exist in our records. (ElasticBeanstalk::ManifestDownloadError)") by using the NEW console. I still had this error with the old one.

Related

EC2 instance unable to access its instance role with awscli and ecs-agent

I've currently writing a Terraform module for EC2 ASGs with ECS. Everything about starting the instances works, including IAM-dependent actions such as KMS-encrypted volumes. However, the ECS agent fails with:
Error getting ECS instance credentials from default chain: NoCredentialProviders: no valid providers in chain.
Unfortunately, most posts I find about this are about the CLI being run without credentials configured, however this should of course use the instance profile.
The posts I do find regarding that are about missing IAM policies for the instance role. However, I do have this trust relationship
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com",
"ecs.amazonaws.com"
]
},
"Action": "sts:AssumeRole"
}
]
}
(Added ECS because someone on SO had it in there, I don't think that's right. Also removed some conditions.)
It has these policies attached:
AmazonSSMManagedInstanceCore
AmazonEC2ContainerServiceforEC2Role
When I connect via SSH and try to run awscli commands, I get the error
Unable to locate credentials. You can configure credentials by running "aws configure".
running any command.
But with
curl http://169.254.169.254/latest/meta-data/iam/info
I see the correct instance profile ARN and with
curl http://169.254.169.254/latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance
I see temporary credentials. With these in the awscli configuration,
aws sts get-caller-identity
returns the correct results. There are no iptables rules or routes blocking access to the metadata service and I've deactivated IMDSv2 token requirement for the time being.
I'm using the latest stable version of ECS-optimized Amazon Linux 2.
What might be the issue here?

AWS Allow Cross-account EKS Cluster to Pull Images from ECR

Summary:
I'm looking to enable EKS nodes to pull images from an ECR registry from a different AWS project. I created an "AllowPull" policy in the desired ECR repository and set the principal of the policy to the ARN of the EKS cluster role, but node is unable to pull the image.
How should the policy be formulated in order to allow all nodes in an EKS cluster to pull from a cross-account ECR repository?
Attempt Details:
The ECR registry recourse name is:
arn:aws:ecr:us-east-2:226427918358:repository/external-pull-test
The EKS cluster that needs to pull the images has the following role attached:
arn:aws:iam::02182452XXXX:role/aws-dev-eks-cluster-crpiXXXX091410594876160000000c
The external ECR registry has the following policy JSON:
{
"Version": "2008-10-17",
"Statement": [
{
"Sid": "AllowPull",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::02182452XXXX:role/aws-dev-eks-cluster-crpiXXXX091410594876160000000c"
},
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:BatchGetImage",
"ecr:DescribeImages",
"ecr:DescribeRepositories",
"ecr:GetDownloadUrlForLayer"
]
}
]
}
The pod ImagePullBackOff error specifies that the user attempting to authenticate with the registry is this assumed role:
arn:aws:sts::02182452XXXX:assumed-role/aws-dev-eks-cluster-crpiXXXX091410594876160000000c/i-0ea4f53b6dfdcxxxx
Environment:
Kubernetes: v1.16.15-eks-e1a842
Additional Details:
Using the ARN of my user principal (cross-account) in the policy did allow me to pull images using docker locally. Using the ARN of the assumed role did enable the node to pull the image, but my understanding is that configuring the policy with a particular assumed role won't guarentee that the cluster nodes can consistently pull from the registry.
Another method is click on the "external-pull-test" repo on the ECR console, on the left panel under "Repositories" click on "Permissions", then click on "Edit" on the top right. You can add the account ID that needs to pull from this repo at "AWS account IDs". Check the permitted actions at the bottom "Actions" drop down box. "Save" and you should be able to pull.

Terraform unable to assume IAM Role

I am trying to get started with Terraform and am using GitLab CI/CD to interact with it. My Runner is unable to assume the IAM Role which has elevated privileges to create AWS resources. My Google-fu on this has failed me.
The error received is:
Error: error configuring Terraform AWS Provider: IAM Role
(my:arn) cannot be assumed. There are a number of possible causes of this - the most common are:
The credentials used in order to assume the role are invalid
The credentials do not have appropriate permission to assume the role
The role ARN is not valid
I have created an access/secret key in IAM and have attempted supplying these as GitLab CI/CD Variables, environment variables that I directly export in my before_script, and even the not-recommended hardcoding them into the provider stanza. No matter what, I still get this same error.
What is extra strange is that AWS shows that the key is being used. The "Last Used" column will always reflect a timestamp of the last attempt at running the pipeline. For better or worse, the key is part of my root AWS account - this is a sandbox project and I don't have any IAM Users, so, it's not clear to me how Terraform is unable to use these credentials to assume a Role when, according to AWS, it's able to access my account with them, and my account has root privileges.
Here is my provider.tf:
terraform {
required_version = ">= 0.14"
backend "s3" { }
}
provider "aws" {
region = "us-east-1"
access_key = "redacted"
secret_key = "redacted"
assume_role {
role_arn = "arn:aws:iam::redacted:role/gitlab-runner-role"
}
}
Here is the relevant section of my .gitlab-ci.yml for this stage:
.terraform_init: &terraform_init |-
terraform init -backend-config="bucket=my-terraform-state" -backend-config="region=us-east-1" -backend-config="key=terraform.tfstate"
tf-plan:
image:
name: hashicorp/terraform
entrypoint: [""]
stage: plan
before_script:
- *terraform_init
script:
- terraform plan -out=tfplan.plan
- terraform show --json tfplan.plan | convert_report > tfplan.json
needs:
- job: tf-val
tags:
- my-runner
My main.tf only contains a basic aws_instance stanza and my terraform validate stage (omitted above) says it's in ship-shape. These are the only 3 files in my repo.
My gitlab-runner-role only contains one Policy, gitlab-runner-policy, whose JSON is:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::*/*",
"arn:aws:s3:::my-terraform-state"
]
}
]
}
TIA for any advisement... really banging my head up against the wall on this one.
Turns out that assume_role is only needed for cross-account work. I was doing all of the work within my own account, so removing this allowed Terraform to just use the keys to do the work without needing a different IAM Role (or it's able to do what it needs to via the Role that is attached to the Runner as an instance profile). It's not clear to me why specifying assume_role anyway would result in an error, since the access should be there, but removing it has fixed this issue.

ECS Fargate task not applying role

I have an ECS Fargate task running that has a role attached to it. This role has the S3FullAccess policy (and AssumeRole trusted partnership with ECS service).
However when trying to put an object into a bucket, I get Access Denied errors. I have tried booting an EC2 instance and attaching the same role and can put to the bucket without issue.
To me it seems like the role is not being attached to the task. Is there an important step I'm missing? I can't SSH into the instance as it's Fargate.
UPDATE:
I extracted the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables that are set and used them on my local machine. I am getting the Access Denied issues there too, implying (to me) that none of the polices I have set for that role are being applied to the task.
Anyone that can help with anything is appreciated!
WORKAROUND:
A simple workaround is to create an IAM User with programmatic access and set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables in your task definition.
This works, but does not explain the underlying issue.
I've just had a similar issue and I think it's probably due to your program being unable to access the role's credentials that are exposed by the Instance Metadata service.
Specifically, there's an environment variable called AWS_CONTAINER_CREDENTIALS_RELATIVE_URI and its value is what's needed by the AWS SDKs to use the task role. The ECS Container Agent sets it when your task starts, and it is exposed to the container's main process that has process ID 1. If your program isn't running as such, it might not being seeing the env var and so explaining the access denied error.
Depending on how your program is running, there'll be different ways to share the env var.
I had the issue inside ssh login shells (BTW you can ssh into Fargate tasks by running sshd) so in my Docker entrypoint script I inserted somewhere:
# To share the env var with login shells
echo "export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" >> /root/.profile
In other cases it might work to add to your Docker entrypoint script:
# To export the env var for use by child processes
export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
References:
IAM Roles for Tasks - docs explaining the env var relating to the role
AWS Forum post - where someone explains these workarounds in greater detail
Amazon ECS container credentials– loaded from the Amazon ECS if the environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is set.
You define the IAM role to use in your task definitions, or you can use a taskRoleArn override when running a task manually with the RunTask API operation. The Amazon ECS agent receives a payload message for starting the task with additional fields that contain the role credentials. The Amazon ECS agent sets a unique task credential ID as an identification token and updates its internal credential cache so that the identification token for the task points to the role credentials that are received in the payload. The Amazon ECS agent populates the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable in the Env object (available with the docker inspect container_id command) for all containers that belong to this task with the following relative URI: /credential_provider_version/credentials?id=task_credential_id.
Terraform code:
resource "aws_iam_role" "AmazonS3ServiceForECSTask" {
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": [
"ecs-tasks.amazonaws.com"
]
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}
data "aws_iam_policy_document" "bucket_policy" {
statement {
principals {
type = "AWS"
identifiers = [aws_iam_role.AmazonS3ServiceForECSTask.arn]
}
actions = [
"s3:ListBucket",
"s3:GetBucketLocation",
]
resources = [
"arn:aws:s3:::${var.bucket_name}",
]
}
statement {
principals {
type = "AWS"
identifiers = [aws_iam_role.AmazonS3ServiceForECSTask.arn]
}
actions = [
"s3:GetObject",
"s3:PutObject",
"s3:PutObjectAcl",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload",
]
resources = [
"arn:aws:s3:::${var.bucket_name}/*",
]
}
}
resource "aws_ecs_task_definition" "my_app_ecs_task_definition" {
task_role_arn = aws_iam_role.AmazonS3ServiceForECSTask.arn
execution_role_arn = aws_iam_role.ECS-TaskExecution.arn
family = "${var.family}"
network_mode = var.network_mode[var.launch_type]
requires_compatibilities = var.requires_compatibilities
cpu = var.task_cpu[terraform.workspace]
memory = var.task_memory[terraform.workspace]
container_definitions = module.ecs-container-definition.json
}

Amazon cloudwatch agent not working

I'm trying to add aws cloudwatch agent to see additional metrics using tutorial
A brief review of what I did:
Create AIM role and attach to EC2 instance doc (NOTE: I do not use Parameter Store just for communication between EC2 and cloudwatch)
Install Agent using s3 link
Create agent configuration file docs
Run agent using CLI dosc
But it still not working and in agent log, I see errors like
ec2tagger: Unable to initialize EC2 Instance Tags : +NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
While googling I found not much related to cloudwath just only that in AIM role in 'Trust Relationship' config ec2 should be mentioned in service section and it is:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Any ideas, thanks!?
In my case the instance had an IAM role attached, but the role was missing the ec2:DescribeTags permission. Adding that fixed the problem.
"The first procedure creates the IAM role that you must attach to each Amazon EC2 instance that runs the CloudWatch agent. This role provides permissions for reading information from the instance and writing it to CloudWatch." in docs
please attach IAM role that you created to your ec2 instance first,it works for me
The cloudwatch agent process that runs in the ec2 should be able to describe the tags of ec2. The permission required for that is ec2:DescribeTags.
Attaching instance role with the managed policy arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy will resolve the problem.
Check to see if the CloudWatch Agent service is running (started)
I got the same issue, resolve by using below command, refresh routes
Import-Module C:\ProgramData\Amazon\EC2-Windows\Launch\Module\Ec2Launch.psm1; Add-Routes
Solved by running aws configure from inside the instance