How to deploy and redeploy applications with Terraform? - amazon-web-services

I'm looking into Terraform and how to use it to set up an AWS environment. So far I have scripts for setting up a VPC with 3 public subnets, 3 private subnets, an Internet Gateway and 3 NAT Gateways. However, I'm confused about how one would go about deploying and redeploying applications in the private subnets.
In my scenario we build microservices using Spring Boot. The idea is to move to a state where we have Elastic Load Balancers attached to the public subnets and host our applications in Auto Scaling groups in the private subnets. However, I can't find any good Terraform tutorials that show how to do this in a way that lets applications be redeployed from Jenkins.
So far I've read about OpsWorks and CodeDeploy. Would I need to use Terraform to set up these resources and then trigger deployment scripts that send artefacts to S3, from where they are redeployed?

For deploying and redeploying, you can use another HashiCorp product: Nomad. Its job files are written in HCL, the same language as Terraform, and describe tasks that run on a cluster. Tasks can be anything, for example: redeploy all my web app instances.
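To give an idea of what that looks like, here is a minimal Nomad job sketch for one Spring Boot service. The job name, datacenter, artifact URL and resource sizes are hypothetical, and it assumes Nomad clients with a JVM installed so that the java task driver is available. A redeploy is then just a changed artifact version resubmitted to the cluster; the update block makes Nomad replace instances one at a time.

```hcl
job "orders-service" {                  # hypothetical service name
  datacenters = ["dc1"]                 # assumed datacenter name
  type        = "service"

  # Rolling update: replace one allocation at a time and revert
  # automatically if the new version never becomes healthy.
  update {
    max_parallel     = 1
    min_healthy_time = "30s"
    healthy_deadline = "5m"
    auto_revert      = true
  }

  group "web" {
    count = 3

    task "app" {
      driver = "java"

      # Nomad downloads the jar into the task's local/ directory before starting it.
      artifact {
        source      = "https://artifacts.example.com/orders-service-1.0.1.jar" # hypothetical URL
        destination = "local/"
      }

      config {
        jar_path    = "local/orders-service-1.0.1.jar"
        jvm_options = ["-Xmx512m"]
      }

      resources {
        cpu    = 500 # MHz
        memory = 768 # MB
      }
    }
  }
}
```

Bumping the version in source and jar_path and running nomad job run on the file (for example from a Jenkins job) would trigger the rolling update.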

I'm using CodeDeploy with Terraform/Chef. The setup I'm using goes something like this:
1) Manually set up the CodeDeploy IAM roles ahead of time.
2) Set up the CodeDeploy app/deployment group ahead of time.
3) Set up the instance profile using Terraform, like this:
resource "aws_iam_instance_profile" "code_deploy" {
  name = "CodeDeploy"
  role = var.codedeploy_instance_role
}
4) Use the instance profile and the tags that match your CodeDeploy deployment group when creating an instance, like this:
  iam_instance_profile = aws_iam_instance_profile.code_deploy.name
  tags = {
    CD = var.tag_cd
  }
5) Use Chef (or whatever your provisioner is) to install the CodeDeploy agent on the instance.
Then you're good to use CodeDeploy like normal.
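Steps 1 and 3 to 5 can be sketched together in current Terraform syntax as below. This swaps Chef for EC2 user data that installs the CodeDeploy agent; the role name, AMI variable, instance type, tag value and region are hypothetical, and the install commands assume an Amazon Linux 2 AMI.

```hcl
variable "region" {
  default = "eu-west-1" # assumed region
}

variable "ami_id" {} # hypothetical: an Amazon Linux 2 AMI ID

# Instance role that EC2 can assume (step 1, done in Terraform instead of by hand).
resource "aws_iam_role" "code_deploy_instance" {
  name = "codedeploy-instance-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# The agent pulls revision bundles from S3, so read access is needed.
resource "aws_iam_role_policy_attachment" "s3_read" {
  role       = aws_iam_role.code_deploy_instance.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

resource "aws_iam_instance_profile" "code_deploy" {
  name = "CodeDeploy"
  role = aws_iam_role.code_deploy_instance.name
}

resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = var.ami_id
  instance_type = "t3.small"

  iam_instance_profile {
    name = aws_iam_instance_profile.code_deploy.name
  }

  # Tag instances so the deployment group's tag filter matches them.
  tag_specifications {
    resource_type = "instance"
    tags = {
      CD = "app"
    }
  }

  # Install and start the CodeDeploy agent at boot (assumes Amazon Linux 2).
  user_data = base64encode(<<-EOT
    #!/bin/bash
    yum install -y ruby wget
    cd /tmp
    wget https://aws-codedeploy-${var.region}.s3.${var.region}.amazonaws.com/latest/install
    chmod +x ./install
    ./install auto
  EOT
  )
}
```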

Adding this in case someone looking for more information finds it useful.
Building on Peter's solution, I am setting up the CodeDeploy IAM role and the CodeDeploy app/group from Terraform as well. Here is what I have:
resource "aws_iam_role" "codedeploy_role_name" {
  name               = "codedeploy_role_name"
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "codedeploy.amazonaws.com",
          "ec2.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}
resource "aws_codedeploy_app" "analytics_app" {
  name = "analytics_app"
}
resource "aws_codedeploy_deployment_config" "analytics_deployment_config" {
  deployment_config_name = "analytics_deployment_config"
  minimum_healthy_hosts {
    type  = "HOST_COUNT"
    value = 2
  }
}
resource "aws_codedeploy_deployment_group" "analytics_group" {
  app_name              = aws_codedeploy_app.analytics_app.name
  deployment_group_name = "analytics_group"
  service_role_arn      = aws_iam_role.codedeploy_role_name.arn
  # Reference the config resource so Terraform creates it before the group
  deployment_config_name = aws_codedeploy_deployment_config.analytics_deployment_config.deployment_config_name
  ec2_tag_filter {
    key   = "CodeDeploy"
    type  = "KEY_AND_VALUE"
    value = "analytics"
  }
  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }
}
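The service role above still needs permissions that let CodeDeploy act on your instances; AWS provides the managed AWSCodeDeployRole policy for this. For the Auto Scaling group and load balancer layout from the question, a deployment group can also target the group directly instead of using tag filters. This is a sketch: aws_autoscaling_group.app and aws_lb_target_group.app are hypothetical resources assumed to be defined elsewhere.

```hcl
# Grant the CodeDeploy service role the AWS-managed permissions it needs.
resource "aws_iam_role_policy_attachment" "codedeploy_service" {
  role       = aws_iam_role.codedeploy_role_name.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole"
}

# Alternative deployment group that targets an Auto Scaling group behind a load balancer.
resource "aws_codedeploy_deployment_group" "analytics_asg_group" {
  app_name               = aws_codedeploy_app.analytics_app.name
  deployment_group_name  = "analytics_asg_group"
  service_role_arn       = aws_iam_role.codedeploy_role_name.arn
  deployment_config_name = "CodeDeployDefault.OneAtATime"

  # Hypothetical ASG; instances it launches later also receive the current revision.
  autoscaling_groups = [aws_autoscaling_group.app.name]

  # Deregister each instance from the target group while it is being updated.
  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "IN_PLACE"
  }

  load_balancer_info {
    target_group_info {
      name = aws_lb_target_group.app.name # hypothetical target group
    }
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }
}
```

From Jenkins, a build could then upload the bundle to S3 and start a deployment with the AWS CLI command aws deploy create-deployment, passing --application-name, --deployment-group-name and --s3-location.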

Related

ECS - FileSystemNotFound: File system does not exist

I have an ECS service with launch type EC2, owned by AWS account A. Our IT team has created an FSx for Windows File Server file system owned by AWS account B.
When I try to launch tasks, I get this error in the Stopped reason section of the task:
Fsx describing filesystem(s) from the service for [fs-0fd8b05f434cf0e72]:
FileSystemNotFound: File system 'fs-0fd8b05f434cf0e72' does not exist.
I have attached these two policies to the EC2 (container host) instance's role:
AmazonFSxReadOnlyAccess (AWS Managed)
fsx_mount (Customer Managed)
fsx_mount:
{
  "Statement": [
    {
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:secretsmanager:us-west-2:111111111111:secret:dev/rushmore/ad-account-NKOkyh"
    },
    {
      "Action": [
        "fsx:*",
        "ds:DescribeDirectories"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:fsx:eu-west-1:222222222222:file-system/fs-0fd8b05f434cf0e72"
    }
  ],
  "Version": "2012-10-17"
}
Note that the account id of 222222222222 represents AWS Account B.
Terraform aws_ecs_task_definition:
resource "aws_ecs_task_definition" "participants_task" {
  volume {
    name = "FSxStorage"
    fsx_windows_file_server_volume_configuration {
      file_system_id = "fs-0fd8b05f434cf0e72"
      root_directory = "\\data"
      authorization_config {
        credentials_parameter = aws_secretsmanager_secret_version.fsx_account_secret.arn
        domain                = var.domain
      }
    }
  }
  ...
}
I am not sure why ECS cannot "see" the FSx file system. Surely it must be because it is in another AWS account, but I don't know what changes are required to fix this.
From the AWS documentation:
You can access your FSx for Windows File Server file system from
compute instances in a different VPC, AWS account, or AWS Region from
that associated with your file system. To do so, you can use VPC
peering or transit gateways. When you use a VPC peering connection or
transit gateway to connect VPCs, compute instances that are in one VPC
can access Amazon FSx file systems in another VPC. This access is
possible even if the VPCs belong to different accounts, and even if
the VPCs reside in different AWS Regions.
In short, your ECS service and the Amazon FSx for Windows File Server file system need to be either in the same VPC or in VPCs that are connected to each other (via VPC peering or a transit gateway).
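A minimal sketch of the cross-account peering in Terraform, run from account A with a second provider for account B. It covers the network connectivity described in the quoted documentation. The provider alias, the role used to reach account B, the regions, VPC IDs, route table IDs and CIDRs are hypothetical placeholders, and the FSx security group must also allow SMB (TCP 445) from the ECS hosts.

```hcl
provider "aws" {
  region = "us-west-2" # account A (ECS), assumed region
}

provider "aws" {
  alias  = "account_b"
  region = "eu-west-1" # account B (FSx)
  assume_role {
    role_arn = "arn:aws:iam::222222222222:role/peering-admin" # hypothetical role
  }
}

variable "ecs_vpc_id" {}
variable "ecs_route_table_id" {}
variable "ecs_vpc_cidr" {}
variable "fsx_vpc_id" {}
variable "fsx_route_table_id" {}
variable "fsx_vpc_cidr" {}

# Requester side (account A); cross-account peering must be accepted by the peer.
resource "aws_vpc_peering_connection" "ecs_to_fsx" {
  vpc_id        = var.ecs_vpc_id
  peer_vpc_id   = var.fsx_vpc_id
  peer_owner_id = "222222222222"
  peer_region   = "eu-west-1"
  auto_accept   = false
}

# Accepter side (account B).
resource "aws_vpc_peering_connection_accepter" "fsx" {
  provider                  = aws.account_b
  vpc_peering_connection_id = aws_vpc_peering_connection.ecs_to_fsx.id
  auto_accept               = true
}

# Route traffic in each direction through the peering connection once it is active.
resource "aws_route" "ecs_to_fsx" {
  route_table_id            = var.ecs_route_table_id
  destination_cidr_block    = var.fsx_vpc_cidr
  vpc_peering_connection_id = aws_vpc_peering_connection.ecs_to_fsx.id
  depends_on                = [aws_vpc_peering_connection_accepter.fsx]
}

resource "aws_route" "fsx_to_ecs" {
  provider                  = aws.account_b
  route_table_id            = var.fsx_route_table_id
  destination_cidr_block    = var.ecs_vpc_cidr
  vpc_peering_connection_id = aws_vpc_peering_connection.ecs_to_fsx.id
  depends_on                = [aws_vpc_peering_connection_accepter.fsx]
}
```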

ECS Fargate task not applying role

I have an ECS Fargate task running that has a role attached to it. This role has the AmazonS3FullAccess policy (and an AssumeRole trust relationship with the ECS service).
However, when trying to put an object into a bucket, I get Access Denied errors. I have tried launching an EC2 instance with the same role attached, and it can put to the bucket without issue.
To me it seems like the role is not being attached to the task. Is there an important step I'm missing? I can't SSH into the instance as it's Fargate.
UPDATE:
I extracted the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables that are set and used them on my local machine. I get the Access Denied errors there too, which implies (to me) that none of the policies I have set for that role are being applied to the task.
Any help is appreciated!
WORKAROUND:
A simple workaround is to create an IAM User with programmatic access and set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables in your task definition.
This works, but does not explain the underlying issue.
I've just had a similar issue, and I think it's probably due to your program being unable to access the role's credentials, which ECS exposes through a task credentials endpoint rather than the EC2 instance metadata service.
Specifically, there's an environment variable called AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, and its value is what the AWS SDKs need to use the task role. The ECS container agent sets it when your task starts, and it is exposed to the container's main process (process ID 1). If your program isn't running as that process or one of its child processes, it might not see the env var, which would explain the Access Denied error.
Depending on how your program is running, there'll be different ways to share the env var.
I had the issue inside SSH login shells (by the way, you can SSH into Fargate tasks by running sshd), so in my Docker entrypoint script I inserted:
# To share the env var with login shells
echo "export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" >> /root/.profile
In other cases it might work to add to your Docker entrypoint script:
# To export the env var for use by child processes
export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
References:
IAM Roles for Tasks - docs explaining the env var relating to the role
AWS Forum post - where someone explains these workarounds in greater detail
Amazon ECS container credentials: loaded from Amazon ECS if the environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is set.
You define the IAM role to use in your task definitions, or you can use a taskRoleArn override when running a task manually with the RunTask API operation. The Amazon ECS agent receives a payload message for starting the task with additional fields that contain the role credentials. The Amazon ECS agent sets a unique task credential ID as an identification token and updates its internal credential cache so that the identification token for the task points to the role credentials that are received in the payload. The Amazon ECS agent populates the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable in the Env object (available with the docker inspect container_id command) for all containers that belong to this task with the following relative URI: /credential_provider_version/credentials?id=task_credential_id.
Terraform code:
resource "aws_iam_role" "AmazonS3ServiceForECSTask" {
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": [
          "ecs-tasks.amazonaws.com"
        ]
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}
data "aws_iam_policy_document" "bucket_policy" {
  statement {
    principals {
      type        = "AWS"
      identifiers = [aws_iam_role.AmazonS3ServiceForECSTask.arn]
    }
    actions = [
      "s3:ListBucket",
      "s3:GetBucketLocation",
    ]
    resources = [
      "arn:aws:s3:::${var.bucket_name}",
    ]
  }
  statement {
    principals {
      type        = "AWS"
      identifiers = [aws_iam_role.AmazonS3ServiceForECSTask.arn]
    }
    actions = [
      "s3:GetObject",
      "s3:PutObject",
      "s3:PutObjectAcl",
      "s3:ListMultipartUploadParts",
      "s3:AbortMultipartUpload",
    ]
    resources = [
      "arn:aws:s3:::${var.bucket_name}/*",
    ]
  }
}
resource "aws_ecs_task_definition" "my_app_ecs_task_definition" {
  task_role_arn            = aws_iam_role.AmazonS3ServiceForECSTask.arn
  execution_role_arn       = aws_iam_role.ECS-TaskExecution.arn
  family                   = var.family
  network_mode             = var.network_mode[var.launch_type]
  requires_compatibilities = var.requires_compatibilities
  cpu                      = var.task_cpu[terraform.workspace]
  memory                   = var.task_memory[terraform.workspace]
  container_definitions    = module.ecs-container-definition.json
}
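The aws_iam_policy_document data source above only renders JSON, and the snippet as shown does not attach it anywhere. A minimal sketch of that step, assuming the bucket named by var.bucket_name already exists in the same account:

```hcl
# Attach the rendered policy to the bucket so the task role is granted access.
resource "aws_s3_bucket_policy" "bucket_policy" {
  bucket = var.bucket_name
  policy = data.aws_iam_policy_document.bucket_policy.json
}
```

Alternatively, the same actions could be granted through an identity policy on the role (aws_iam_role_policy), in which case the principals blocks would have to be removed, since identity policies do not name a principal.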

connecting to aws elastic search with nodejs aws sdk

What is the best approach to using AWS Elasticsearch with Node.js? I am running my Docker containers on ECS EC2 instances and am using an IAM role to access other AWS resources, like S3 buckets and DynamoDB, from Node.js.
Can we use the same procedure for accessing the AWS Elasticsearch endpoint too?
I added an inline policy to the existing role with the Elasticsearch endpoint ARN, but the Node.js SDK is not able to connect to ES. When the AWS access key ID and secret key are added as environment variables in the task definition, it starts working. But I don't want to use that method, as it will conflict with the other AWS resources (it looks like the dev team configured the program to look for credentials in the environment).
It is certainly not the best method, but you can also use an IP-based restriction. We currently use this and it works fine. Just assign an Elastic IP to your EC2 instance (if you haven't already) and set that IP address in the access policy like this:
"Condition": {
  "IpAddress": {
    "aws:SourceIp": [
      "XXX.XXX.XXX.XXX"
    ]
  }
}
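In Terraform, the Elastic IP and the domain access policy could be tied together roughly as follows. aws_instance.ecs_host and aws_elasticsearch_domain.search are hypothetical resources assumed to exist elsewhere, and the sketch assumes the domain has a public endpoint, since the approach relies on requests arriving from the Elastic IP.

```hcl
# Fixed public IP for the container host, so the domain policy can allow it.
resource "aws_eip" "ecs_host" {
  instance = aws_instance.ecs_host.id # hypothetical instance
}

resource "aws_elasticsearch_domain_policy" "ip_based" {
  domain_name = aws_elasticsearch_domain.search.domain_name # hypothetical domain
  access_policies = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "*" }
      Action    = "es:ESHttp*"
      Resource  = "${aws_elasticsearch_domain.search.arn}/*"
      Condition = {
        IpAddress = { "aws:SourceIp" = [aws_eip.ecs_host.public_ip] }
      }
    }]
  })
}
```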
For anybody else stumbling across this, here are a few things I learnt while stuck on something similar:
The EC2 instance's role ARN can be added to the access policy of your Elasticsearch domain, along with the permissions you want the role to have. For example, for an EC2 instance running with role "aws-ec2" that needs permission to make HTTP GET requests to ES, you could have the following in your ES domain access policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<ACCOUNT_ID>:role/aws-ec2"
        ]
      },
      "Action": "es:ESHttpGet",
      "Resource": "arn:aws:es:<REGION>:<ACCOUNT_ID>:domain/<DOMAIN_NAME>/*"
    }
  ]
}
Any EC2 instance in your account running with role "aws-ec2" will then be allowed to make HTTP GET requests to Elasticsearch.
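The same role-based policy could be managed from Terraform. A sketch, assuming an existing role named aws-ec2 (as in the example above) and a hypothetical aws_elasticsearch_domain.search resource:

```hcl
# Look up the instance role whose requests should be allowed.
data "aws_iam_role" "ec2" {
  name = "aws-ec2"
}

resource "aws_elasticsearch_domain_policy" "role_based" {
  domain_name = aws_elasticsearch_domain.search.domain_name # hypothetical domain
  access_policies = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = [data.aws_iam_role.ec2.arn] }
      Action    = "es:ESHttpGet"
      Resource  = "${aws_elasticsearch_domain.search.arn}/*"
    }]
  })
}
```

A role principal only matches requests that are signed (SigV4) with that role's credentials; unsigned HTTP requests from the instance are treated as anonymous.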
Note that if you have trouble getting credentials, try the following:
const AWS = require('aws-sdk');
AWS.config.getCredentials(function(err) {
  if (err) {
    // credentials not loaded
    console.log(err.stack);
  } else {
    // credentials are loaded and can be accessed using
    // AWS.config.credentials.accessKeyId, AWS.config.credentials.secretAccessKey, etc.
  }
});
This will usually pull the credentials in like magic. I have a theory about how it works (tl;dr: I think it pulls them from the EC2 instance metadata by making a request to a fixed IP), but it's unproven, so I won't embarrass myself until I know more. Note that this should work even if you don't have credentials stored in your environment or in the shared credentials file.

AWS Docker deployment

I have a custom Docker image uploaded to ECR. I opened up the permissions to try to get past this issue (I will lock them down again once I get this working). I am attempting to deploy the Docker image to Elastic Beanstalk, and I have a Docker-enabled Elastic Beanstalk environment set up. According to the AWS docs, if I am pulling my image from within AWS, I don't need to pass in credentials. So I upload my Dockerrun.aws.json file and attempt to deploy it. It fails with the error:
Command failed on instance. Return code: 1 Output: Failed to authenticate with ECR for registry '434875166128' in 'us-east-1'. Hook /opt/elasticbeanstalk/hooks/appdeploy/pre/03build.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
The /var/log/eb-activity.log information has nothing useful in it.
Here's my Dockerrun.aws.json file:
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "{id000xxxx}.dkr.ecr.us-east-1.amazonaws.com/my-repo:1.0.0",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "4000"
    }
  ],
  "Logging": "/var/log/app-name"
}
I have also tried adding authentication with the dockercfg.json file in S3. That didn't work for me either.
Note that I am using a business account instead of a personal account, so there may be some unknown variances as well.
Thanks!
Update: My user has full permissions at the moment too, so there shouldn't be anything permission-wise getting in the way.
I was having the same problem.
Solution:
In AWS -> IAM -> Roles, pick the role your Beanstalk environment is using.
In my case it was aws-elasticbeanstalk-ec2-role.
Under Permissions for the role, attach the policy AmazonEC2ContainerRegistryReadOnly.
In ECR, there is no need to give any permissions to this role.
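If the role is managed from Terraform, the same fix can be expressed with the AWS-managed policy. A sketch, assuming the default role name:

```hcl
# Attach the AWS-managed ECR read-only policy to the Beanstalk instance role.
resource "aws_iam_role_policy_attachment" "eb_ecr_read_only" {
  role       = "aws-elasticbeanstalk-ec2-role"
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}
```

aws_iam_role_policy_attachment manages only this one attachment, whereas aws_iam_policy_attachment (used in the next answer) manages every attachment of its policy exclusively, so it removes attachments of the same policy made elsewhere.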
Assuming:
You are using Terraform to provision your infrastructure.
You have created a sample Elastic Beanstalk app at least once, so that the default role exists.
The default Elastic Beanstalk role is named aws-elasticbeanstalk-ec2-role.
Then you can use the following to attach an ECR read-only policy to the role:
data "aws_iam_role" "elastic_beanstalk_role" {
  name = "aws-elasticbeanstalk-ec2-role"
}
resource "aws_iam_policy" "ebs_ecr_policy" {
  name        = "aws-elasticbeanstalk-ec2-ecr-policy"
  description = "Enable elastic-beanstalk to be able to access ECR repository with images"
  policy      = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:DescribeImages",
        "ecr:BatchGetImage",
        "ecr:GetLifecyclePolicy",
        "ecr:GetLifecyclePolicyPreview",
        "ecr:ListTagsForResource",
        "ecr:DescribeImageScanFindings"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
resource "aws_iam_policy_attachment" "ebs_ecr-policy-attach" {
  name       = "ebs-ecr-policy-attachment"
  roles      = [data.aws_iam_role.elastic_beanstalk_role.name]
  policy_arn = aws_iam_policy.ebs_ecr_policy.arn
}
This way you can manage updates to the role and policy from your infrastructure code.
You can initialize the necessary service roles for Elastic Beanstalk (aws-elasticbeanstalk-ec2-role, aws-elasticbeanstalk-service-role, AWSServiceRoleForECS) by using the new Elastic Beanstalk console.
You have to do this only once per AWS account:
Go to the Elastic Beanstalk console.
Accept the "new design": at the top of the console, if you see a message "we're testing a new design", opt in to use the new version of the console. Warning: it seems you can't roll back to the old console.
Start the Create New Application wizard and use a default sample application for your platform.
Complete all the steps of the wizard up to the review page, and look at the Security panel: you will see the two roles "aws-elasticbeanstalk-ec2-role" and "aws-elasticbeanstalk-service-role". Then finish the wizard to create the sample app.
After a while, the application should be running.
In case of emergency, go to the IAM console, delete the roles aws-elasticbeanstalk-ec2-role and aws-elasticbeanstalk-service-role, and run the wizard again.
I fixed the "Command failed on instance. Return code: 1 Output: Failed to authenticate with ECR for registry" error and another strange error ("The AWS Access Key Id you provided does not exist in our records. (ElasticBeanstalk::ManifestDownloadError)") by using the NEW console. I still had these errors with the old one.
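As an alternative to the console wizard, the instance role and its instance profile could be created directly in Terraform. A sketch reusing the default names; the attached policies are the AWS-managed ones for Beanstalk web-tier and multicontainer Docker instances plus ECR pull access, and which of them you need is an assumption about your platform.

```hcl
resource "aws_iam_role" "eb_ec2" {
  name = "aws-elasticbeanstalk-ec2-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# AWS-managed policies for Beanstalk instances, plus ECR read-only access.
resource "aws_iam_role_policy_attachment" "eb_ec2" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AWSElasticBeanstalkWebTier",
    "arn:aws:iam::aws:policy/AWSElasticBeanstalkMulticontainerDocker",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])
  role       = aws_iam_role.eb_ec2.name
  policy_arn = each.value
}

# Beanstalk environments reference the instance profile, not the role.
resource "aws_iam_instance_profile" "eb_ec2" {
  name = "aws-elasticbeanstalk-ec2-role"
  role = aws_iam_role.eb_ec2.name
}
```

If the role already exists in the account, it would need to be brought under Terraform with terraform import first. The profile can then be referenced from an aws_elastic_beanstalk_environment through a setting block with namespace aws:autoscaling:launchconfiguration and name IamInstanceProfile.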

Error registering: NoCredentialProviders: no valid providers in chain ECS agent error

I'm trying to use EC2 Container Service, and I'm using Terraform to create it.
I have defined an ECS cluster, an Auto Scaling group and a launch configuration. Everything seems to work except one thing: the EC2 instances are created, but they do not register with the cluster, which just says no instances are available.
In the ECS agent log on a created instance, I found the log flooded with one error:
Error registering: NoCredentialProviders: no valid providers in chain
The EC2 instances are created with a proper role, ecs_role. This role has two policies; one of them is the following, as the docs require:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateCluster",
        "ecs:DeregisterContainerInstance",
        "ecs:DiscoverPollEndpoint",
        "ecs:Poll",
        "ecs:RegisterContainerInstance",
        "ecs:StartTelemetrySession",
        "ecs:Submit*",
        "ecs:StartTask"
      ],
      "Resource": "*"
    }
  ]
}
I'm using AMI ami-6ff4bd05 and the latest Terraform.
It was a problem with the trust relationship of the role, which has to include ec2.amazonaws.com. Unfortunately, the error message was not very helpful.
Example of a trust relationship:
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": ["ecs.amazonaws.com", "ec2.amazonaws.com"]
      },
      "Effect": "Allow"
    }
  ]
}
Make sure you select the correct ECS role in the launch configuration.
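In Terraform, a container instance role with that trust relationship might look like this sketch. For instance registration, the principal that matters is ec2.amazonaws.com, and the AWS-managed AmazonEC2ContainerServiceforEC2Role policy covers the ECS agent actions listed in the question. The profile name is hypothetical.

```hcl
resource "aws_iam_role" "ecs_instance" {
  name = "ecs_role"
  # EC2 must be allowed to assume the role so the agent on the instance gets credentials.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_instance" {
  role       = aws_iam_role.ecs_instance.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

# Launch configurations reference the instance profile, not the role itself.
resource "aws_iam_instance_profile" "ecs_instance" {
  name = "ecs_instance_profile"
  role = aws_iam_role.ecs_instance.name
}
```

The launch configuration's iam_instance_profile argument would then be set to aws_iam_instance_profile.ecs_instance.name.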
You might want to add AmazonEC2RoleforSSM (or AmazonSSMFullAccess) to your EC2 instance's role.
Apparently this error message also occurs when an invalid AWS profile is passed.
I spent 2 days trying everything without any luck. I have a standard setup, i.e. ECS cluster instances in a private subnet, an ELB in a public subnet, NAT and IGW properly set up with their respective security groups, the IAM role properly defined, a standard NACL config, etc. Despite everything, the EC2 instances wouldn't register with the ECS cluster. Finally I figured out that my custom VPC's DHCP options set was configured with 'domain-name-servers: xx.xx.xx.xx, xx.xx.xx.xx', the IP addresses of my org's internal DNS servers...
The solution is to use the following values for the DHCP options set:
Domain Name: us-west-2.compute.internal (assuming your VPC is in us-west-2),
Options: domain-name: us-west-2.compute.internal
domain-name-servers: AmazonProvidedDNS
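In Terraform, that DHCP options set could be declared as below. var.vpc_id is a hypothetical variable, and the region is assumed to be us-west-2 as in the answer (in us-east-1 the default domain name is ec2.internal instead).

```hcl
variable "vpc_id" {}

resource "aws_vpc_dhcp_options" "amazon_dns" {
  domain_name         = "us-west-2.compute.internal"
  domain_name_servers = ["AmazonProvidedDNS"]
}

# Replace the VPC's current DHCP options set with the one above.
resource "aws_vpc_dhcp_options_association" "amazon_dns" {
  vpc_id          = var.vpc_id
  dhcp_options_id = aws_vpc_dhcp_options.amazon_dns.id
}
```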
I got this error today and figured out the problem: I had not set the IAM instance profile in the launch template (it is under the Advanced section). You need to set it to ecsInstanceRole (this is the default name AWS gives it, so check whether you have changed it and use your name accordingly).
I had switched from a launch configuration to a launch template, and while setting up the launch template, I forgot to add the role!
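When the launch template is defined in Terraform, the equivalent of that console setting is the iam_instance_profile block. A sketch, assuming the default ecsInstanceRole instance profile exists, a hypothetical cluster named my-cluster, and the ECS-optimized Amazon Linux 2 AMI looked up from its public SSM parameter:

```hcl
# Latest ECS-optimized Amazon Linux 2 AMI, published by AWS in SSM Parameter Store.
data "aws_ssm_parameter" "ecs_ami" {
  name = "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"
}

resource "aws_launch_template" "ecs" {
  name_prefix   = "ecs-"
  image_id      = data.aws_ssm_parameter.ecs_ami.value
  instance_type = "t3.medium"

  # Without an instance profile the ECS agent has no credentials and cannot register.
  iam_instance_profile {
    name = "ecsInstanceRole"
  }

  # Tell the ECS agent which cluster to join (hypothetical cluster name).
  user_data = base64encode(<<-EOT
    #!/bin/bash
    echo ECS_CLUSTER=my-cluster >> /etc/ecs/ecs.config
  EOT
  )
}
```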
If you use a task definition, check that you have set the execution role and task role ARNs, and that those roles have the correct policies.
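A sketch of the execution role side in Terraform: the role is assumed by ecs-tasks.amazonaws.com, and the AWS-managed AmazonECSTaskExecutionRolePolicy lets ECS pull images from ECR and write logs to CloudWatch on the task's behalf. The role name is hypothetical; the task role is a separate role holding the permissions your application code uses.

```hcl
resource "aws_iam_role" "task_execution" {
  name = "ecs-task-execution" # hypothetical name
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "task_execution" {
  role       = aws_iam_role.task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
```

Its ARN then goes into execution_role_arn of aws_ecs_task_definition, with the application role in task_role_arn.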