I am trying to send data from Amazon Kinesis Data Firehose to Amazon Elasticsearch Service, but it is logging a 503 Service Unavailable error. However, I can reach the Elasticsearch endpoint (https://vpc-XXX.<region>.es.amazonaws.com) and run queries against it. I also went through How can I prevent HTTP 503 Service Unavailable errors in Amazon Elasticsearch Service? and can confirm my setup has enough resources.
Here's the error I get in my S3 backup bucket that holds the failed logs:
{
"attemptsMade": 8,
"arrivalTimestamp": 1599748282943,
"errorCode": "ES.ServiceException",
"errorMessage": "Error received from Elasticsearch cluster. <html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>",
"attemptEndingTimestamp": 1599748643460,
"rawData": "eyJ0aWNrZXJfc3ltYm9sIjoiQUxZIiwic2VjdG9yIjoiRU5FUkdZIiwiY2hhbmdlIjotNi4zNSwicHJpY2UiOjg4LjgzfQ==",
"subsequenceNumber": 0,
"esDocumentId": "49610662085822146490768158474738345331794592496281976834.0",
"esIndexName": "prometheus-2020-09",
"esTypeName": ""
},
Does anyone have any ideas on how to fix this and get the data indexed into Elasticsearch?
Turns out, my issue was with selecting the wrong security group.
I was using the same security group (I named it elasticsearch-${domain_name}) that is attached to the Elasticsearch domain (it allows TCP ingress/egress on port 443 to/from the firehose_es security group). I should have selected the firehose_es security group for the delivery stream instead.
As requested in the comment, here's the Terraform configuration for the firehose_es SG.
resource "aws_security_group" "firehose_es" {
name = "firehose_es"
description = "Firehose to send logs to Elasticsearch"
vpc_id = module.networking.aws_vpc_id
}
resource "aws_security_group_rule" "firehose_es_https_ingress" {
type = "ingress"
from_port = 443
to_port = 443
protocol = "tcp"
security_group_id = aws_security_group.firehose_es.id
cidr_blocks = ["10.0.0.0/8"]
}
resource "aws_security_group_rule" "firehose_es_https_egress" {
type = "egress"
from_port = 443
to_port = 443
protocol = "tcp"
security_group_id = aws_security_group.firehose_es.id
source_security_group_id = aws_security_group.elasticsearch.id
}
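For reference, the matching ingress rule on the Elasticsearch domain's security group (described above as allowing port 443 to/from firehose_es) would look something like the sketch below; only the firehose_es rules were requested, so treat this as an assumption based on my naming.
# Assumed counterpart rule on the Elasticsearch SG: allow HTTPS in from
# firehose_es so the Firehose ENIs can reach the domain endpoint.
resource "aws_security_group_rule" "elasticsearch_https_from_firehose" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.elasticsearch.id
  source_security_group_id = aws_security_group.firehose_es.id
}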
Another thing I fixed before asking this question (which may be why some of you are landing on it) was using the right role and attaching the right policy to it. Here are the relevant pieces (as Terraform config):
// https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html
data "aws_iam_policy_document" "firehose_es_policy_specific" {
statement {
actions = [
"s3:AbortMultipartUpload",
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:PutObject"
]
resources = [
aws_s3_bucket.firehose.arn,
"${aws_s3_bucket.firehose.arn}/*"
]
}
statement {
actions = [
"es:DescribeElasticsearchDomain",
"es:DescribeElasticsearchDomains",
"es:DescribeElasticsearchDomainConfig",
"es:ESHttpPost",
"es:ESHttpPut"
]
resources = [
var.elasticsearch_domain_arn,
"${var.elasticsearch_domain_arn}/*",
]
}
statement {
actions = [
"es:ESHttpGet"
]
resources = [
"${var.elasticsearch_domain_arn}/_all/_settings",
"${var.elasticsearch_domain_arn}/_cluster/stats",
"${var.elasticsearch_domain_arn}/${var.name_prefix}${var.name}_${var.app}*/_mapping/type-name",
"${var.elasticsearch_domain_arn}/_nodes",
"${var.elasticsearch_domain_arn}/_nodes/stats",
"${var.elasticsearch_domain_arn}/_nodes/*/stats",
"${var.elasticsearch_domain_arn}/_stats",
"${var.elasticsearch_domain_arn}/${var.name_prefix}${var.name}_${var.app}*/_stats"
]
}
statement {
actions = [
"ec2:DescribeVpcs",
"ec2:DescribeVpcAttribute",
"ec2:DescribeSubnets",
"ec2:DescribeSecurityGroups",
"ec2:DescribeNetworkInterfaces",
"ec2:CreateNetworkInterface",
"ec2:CreateNetworkInterfacePermission",
"ec2:DeleteNetworkInterface",
]
resources = [
"*"
]
}
}
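The delivery stream below also references an aws_iam_role.firehose_es role that isn't shown above. A minimal sketch of that role, assuming the standard Firehose trust policy (firehose.amazonaws.com) and attaching the policy document above:
data "aws_iam_policy_document" "firehose_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["firehose.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "firehose_es" {
  name               = "firehose_es"
  assume_role_policy = data.aws_iam_policy_document.firehose_assume_role.json
}

# Attach the permissions policy defined above to the role.
resource "aws_iam_role_policy" "firehose_es" {
  name   = "firehose_es"
  role   = aws_iam_role.firehose_es.id
  policy = data.aws_iam_policy_document.firehose_es_policy_specific.json
}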
resource "aws_kinesis_firehose_delivery_stream" "ecs" {
name = "${var.name_prefix}${var.name}_${var.app}"
destination = "elasticsearch"
s3_configuration {
role_arn = aws_iam_role.firehose_es.arn
bucket_arn = aws_s3_bucket.firehose.arn
buffer_interval = 60
compression_format = "GZIP"
}
elasticsearch_configuration {
domain_arn = var.elasticsearch_domain_arn
role_arn = aws_iam_role.firehose_es.arn
# If Firehose cannot deliver to Elasticsearch, logs are sent to S3
s3_backup_mode = "FailedDocumentsOnly"
buffering_interval = 60
buffering_size = 5
index_name = "${var.name_prefix}${var.name}_${var.app}"
index_rotation_period = "OneMonth"
vpc_config {
subnet_ids = var.elasticsearch_subnet_ids
security_group_ids = [var.firehose_security_group_id]
role_arn = aws_iam_role.firehose_es.arn
}
}
}
I was able to figure out my mistake after reading through the Controlling Access with Amazon Kinesis Data Firehose article again.
Related
I'm fairly new to AWS. I am trying to deploy a docker container to ECS but it fails with the following error:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.us-east-1.amazonaws.com/": dial tcp 52.46.146.144:443: i/o timeout
This was working perfectly fine until I tried to add a load balancer, at which point this error began occurring. I must have changed something, but I'm not sure what.
The ECS instance is in a public subnet
The security group has in/out access on all ports/ips (0.0.0.0/0)
The VPC has an internet gateway
Clearly something is wrong with my config, but I'm not sure what. Google and other Stack Overflow questions haven't helped so far.
Terraform ECS file:
resource "aws_ecs_cluster" "solmines-ecs-cluster" {
name = "solmines-ecs-cluster"
}
resource "aws_ecs_service" "solmines-ecs-service" {
name = "solmines"
cluster = aws_ecs_cluster.solmines-ecs-cluster.id
task_definition = aws_ecs_task_definition.solmines-ecs-task-definition.arn
launch_type = "FARGATE"
desired_count = 1
network_configuration {
security_groups = [aws_security_group.solmines-ecs.id]
subnets = ["${aws_subnet.solmines-public-subnet1.id}", "${aws_subnet.solmines-public-subnet2.id}"]
assign_public_ip = true
}
load_balancer {
target_group_arn = aws_lb_target_group.solmines-lb-tg.arn
container_name = "solmines-api"
container_port = 80
}
depends_on = [aws_lb_listener.solmines-lb-listener]
}
resource "aws_ecs_task_definition" "solmines-ecs-task-definition" {
family = "solmines-ecs-task-definition"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
memory = "1024"
cpu = "512"
execution_role_arn = "${aws_iam_role.solmines-ecs-role.arn}"
container_definitions = <<EOF
[
{
"name": "solmines-api",
"image": "${aws_ecr_repository.solmines-ecr-repository.repository_url}:latest",
"memory": 1024,
"cpu": 512,
"essential": true,
"portMappings": [
{
"containerPort": 80,
"hostPort": 80
}
]
}
]
EOF
}
We have deployed a DMS replication task to replicate our entire Postgres database to Redshift. The tables are getting created with the correct schemas, but the data isn't coming through to Redshift; it's getting held up in the S3 bucket DMS uses as an intermediate step. This is all deployed via Terraform.
We've configured the IAM roles as described in the replication instance Terraform docs, with all three of the dms-access-for-endpoint, dms-cloudwatch-logs-role, and dms-vpc-role IAM roles created. The IAM roles are deployed from a different stack than the one DMS is deployed from, because the roles are also used by another, successfully deployed, DMS instance running a different task.
data "aws_iam_policy_document" "dms_assume_role_document" {
statement {
actions = ["sts:AssumeRole"]
principals {
identifiers = [
"s3.amazonaws.com",
"iam.amazonaws.com",
"redshift.amazonaws.com",
"dms.amazonaws.com",
"redshift-serverless.amazonaws.com"
]
type = "Service"
}
}
}
# Database Migration Service requires the below IAM Roles to be created before
# replication instances can be created. See the DMS Documentation for
# additional information: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Security.html#CHAP_Security.APIRole
# * dms-vpc-role
# * dms-cloudwatch-logs-role
# * dms-access-for-endpoint
resource "aws_iam_role" "dms_access_for_endpoint" {
name = "dms-access-for-endpoint"
assume_role_policy = data.aws_iam_policy_document.dms_assume_role_document.json
managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonDMSRedshiftS3Role"]
force_detach_policies = true
}
resource "aws_iam_role" "dms_cloudwatch_logs_role" {
name = "dms-cloudwatch-logs-role"
description = "Allow DMS to manage CloudWatch logs."
assume_role_policy = data.aws_iam_policy_document.dms_assume_role_document.json
managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonDMSCloudWatchLogsRole"]
force_detach_policies = true
}
resource "aws_iam_role" "dms_vpc_role" {
name = "dms-vpc-role"
description = "DMS IAM role for VPC permissions"
assume_role_policy = data.aws_iam_policy_document.dms_assume_role_document.json
managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole"]
force_detach_policies = true
}
However, on runtime, we're seeing the following logs in CloudWatch:
2022-09-01T16:51:38 [SOURCE_UNLOAD ]E: Not retriable error: <AccessDenied> Access Denied [1001705] (anw_retry_strategy.cpp:118)
2022-09-01T16:51:38 [SOURCE_UNLOAD ]E: Failed to list bucket 'dms-sandbox-redshift-intermediate-storage': error code <AccessDenied>: Access Denied [1001713] (s3_dir_actions.cpp:105)
2022-09-01T16:51:38 [SOURCE_UNLOAD ]E: Failed to list bucket 'dms-sandbox-redshift-intermediate-storage' [1001713] (s3_dir_actions.cpp:209)
We also enabled S3 server access logs on the bucket itself to see whether this would give us more information. This is what we're seeing (anonymised):
<id> dms-sandbox-redshift-intermediate-storage [01/Sep/2022:15:43:32 +0000] 10.128.69.80 arn:aws:sts::<account>:assumed-role/dms-access-for-endpoint/dms-session-for-replication-engine <code> REST.GET.BUCKET - "GET /dms-sandbox-redshift-intermediate-storage?delimiter=%2F&max-keys=1000 HTTP/1.1" 403 AccessDenied 243 - 30 - "-" "aws-sdk-cpp/1.8.80/S3/Linux/4.14.276-211.499.amzn2.x86_64 x86_64 GCC/4.9.3" - <code> SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader s3.eu-west-2.amazonaws.com TLSv1.2 -
The above suggests that the dms-session-for-replication-engine session (assumed via the dms-access-for-endpoint role) is what is receiving the AccessDenied responses, but we're unable to pinpoint what this is or how to fix it.
We attempted to add a bucket policy to the S3 bucket itself but this did not work (this also includes the S3 server access logs bucket):
resource "aws_s3_bucket" "dms_redshift_intermediate" {
# Prefixed with `dms-` as that's what the AmazonDMSRedshiftS3Role policy filters on
bucket = "dms-sandbox-redshift-intermediate-storage"
}
resource "aws_s3_bucket_logging" "log_bucket" {
bucket = aws_s3_bucket.dms_redshift_intermediate.id
target_bucket = aws_s3_bucket.log_bucket.id
target_prefix = "log/"
}
resource "aws_s3_bucket" "log_bucket" {
bucket = "${aws_s3_bucket.dms_redshift_intermediate.id}-logs"
}
resource "aws_s3_bucket_acl" "log_bucket" {
bucket = aws_s3_bucket.log_bucket.id
acl = "log-delivery-write"
}
resource "aws_s3_bucket_policy" "dms_redshift_intermediate_policy" {
bucket = aws_s3_bucket.dms_redshift_intermediate.id
policy = data.aws_iam_policy_document.dms_redshift_intermediate_policy_document.json
}
data "aws_iam_policy_document" "dms_redshift_intermediate_policy_document" {
statement {
actions = [
"s3:*"
]
principals {
identifiers = [
"dms.amazonaws.com",
"redshift.amazonaws.com"
]
type = "Service"
}
resources = [
aws_s3_bucket.dms_redshift_intermediate.arn,
"${aws_s3_bucket.dms_redshift_intermediate.arn}/*"
]
}
}
How do we fix the <AccessDenied> issues that we're seeing in CloudWatch and enable data loading to Redshift? DMS is able to PUT items in the S3 bucket, as we're seeing encrypted CSVs appearing in there (the server access logs also confirm this), but DMS is unable to then GET the files back out of it for Redshift. The AccessDenied responses also suggest that it's an IAM role issue rather than a security group issue, but our IAM roles are configured as per the docs, so we're confused as to what could be causing this.
What we thought was an IAM issue was actually a security group issue. The COPY command for Redshift was struggling to access S3. By adding a 443 egress rule for HTTPS to the Redshift security group, we were able to pull data through again:
resource "aws_security_group_rule" "https_443_egress" {
type = "egress"
description = "Allow HTTP egress from DMS SG"
protocol = "tcp"
to_port = 443
from_port = 443
security_group_id = aws_security_group.redshift.id
cidr_blocks = ["0.0.0.0/0"]
}
So if you're experiencing the same issue as the question, check whether Redshift has access to S3 via HTTPS.
You are right that this is an IAM role issue. Make sure the role in question has the following statements added to its policy document:
{
"Effect": "Allow",
"Action": [
"s3:*"
],
"Resource":"arn:aws:s3:::<yourbucketnamehere>/*"
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource":"arn:aws:s3:::<yourbucketnamehere>"
},
{
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::*"
}
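Since the question's stack is managed with Terraform, roughly the same statements could be expressed as a policy document and attached to the role from the question; the dms_s3_access names below are made up, while the bucket and role resources are the ones already defined above.
data "aws_iam_policy_document" "dms_s3_access" {
  statement {
    actions   = ["s3:*"]
    resources = ["${aws_s3_bucket.dms_redshift_intermediate.arn}/*"]
  }
  statement {
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.dms_redshift_intermediate.arn]
  }
  statement {
    actions   = ["s3:ListAllMyBuckets", "s3:GetBucketLocation"]
    resources = ["arn:aws:s3:::*"]
  }
}

# Attach the statements to the endpoint role used by DMS.
resource "aws_iam_role_policy" "dms_s3_access" {
  name   = "dms-s3-access"
  role   = aws_iam_role.dms_access_for_endpoint.id
  policy = data.aws_iam_policy_document.dms_s3_access.json
}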
TL;DR: Does my EC2 instance need an IAM role to be added to my ECS cluster? If so, how do I set that?
I have an EC2 instance created using an autoscaling group. (ASG definition here.) I also have an ECS cluster, whose name is set on the spawned instances via user_data. I've confirmed that /etc/ecs/ecs.config on the running instance looks correct:
ECS_CLUSTER=my_cluster
However, the instance never appears in the cluster, so the service task doesn't run. There are tons of questions on SO about this, and I've been through them all. The instances are in a public subnet and have access to the internet. The error in ecs-agent.log is:
Error getting ECS instance credentials from default chain: NoCredentialProviders: no valid providers in chain.
So I am guessing that the problem is that the instance has no IAM role associated with it. But I confess that I am a bit confused about all the various "roles" and "services" involved. Does this look like a problem?
If that's it, where do I set this? I'm using Cloud Posse modules. The docs say I shouldn't set a service_role_arn on a service task if I'm using "awsvpc" as the networking mode, but I am not sure whether I should be using a different mode for this setup (multiple containers running as tasks on a single EC2 instance). Also, there are several other roles I could configure here. The ECS service task looks like this:
module "ecs_alb_service_task" {
source = "cloudposse/ecs-alb-service-task/aws"
# Cloud Posse recommends pinning every module to a specific version
version = "0.62.0"
container_definition_json = jsonencode([for k, def in module.flask_container_def : def.json_map_object])
name = "myapp-web"
security_group_ids = [module.sg.id]
ecs_cluster_arn = aws_ecs_cluster.default.arn
task_exec_role_arn = [aws_iam_role.ec2_task_execution_role.arn]
launch_type = "EC2"
alb_security_group = module.sg.name
vpc_id = module.vpc.vpc_id
subnet_ids = module.subnets.public_subnet_ids
network_mode = "awsvpc"
desired_count = 1
task_memory = (512 * 3)
task_cpu = 1024
deployment_controller_type = "ECS"
enable_all_egress_rule = false
health_check_grace_period_seconds = 10
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
ecs_load_balancers = [{
container_name = "web"
container_port = 80
elb_name = null
target_group_arn = module.alb.default_target_group_arn
}]
}
And here's the policy for the ec2_task_execution_role:
data "aws_iam_policy_document" "ec2_task_execution_role" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["ecs-tasks.amazonaws.com"]
}
}
}
Update: Here is the rest of the declaration of the task execution role:
resource "aws_iam_role" "ec2_task_execution_role" {
name = "${var.project_name}_ec2_task_execution_role"
assume_role_policy = data.aws_iam_policy_document.ec2_task_execution_role.json
tags = {
Name = "${var.project_name}_ec2_task_execution_role"
Project = var.project_name
}
}
resource "aws_iam_role_policy_attachment" "ec2_task_execution_role" {
role = aws_iam_role.ec2_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
# Create a policy for the EC2 role to use Session Manager
resource "aws_iam_role_policy" "ec2_role_policy" {
name = "${var.project_name}_ec2_role_policy"
role = aws_iam_role.ec2_task_execution_role.id
policy = jsonencode({
"Version" : "2012-10-17",
"Statement" : [
{
"Effect" : "Allow",
"Action" : [
"ssm:DescribeParameters",
"ssm:GetParametersByPath",
"ssm:GetParameters",
"ssm:GetParameter"
],
"Resource" : "*"
}
]
})
}
Update 2: The EC2 instances are created by the Auto-Scaling Group, see here for my code. The ECS cluster is just this:
# Create the ECS cluster
resource "aws_ecs_cluster" "default" {
name = "${var.project_name}_cluster"
tags = {
Name = "${var.project_name}_cluster"
Project = var.project_name
}
}
I was expecting there to be something like instance_role in the ec2-autoscaling-group module, but there isn't.
You need to set the EC2 instance profile (IAM instance role) via the iam_instance_profile_name setting in the module "autoscale_group".
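For example, a minimal sketch (the role and profile names below are made up, and the module source is assumed to be the Cloud Posse ec2-autoscale-group module referenced in the question):
data "aws_iam_policy_document" "ecs_instance_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# Instance role for the container instances (distinct from the task execution role).
resource "aws_iam_role" "ecs_instance_role" {
  name               = "${var.project_name}_ecs_instance_role"
  assume_role_policy = data.aws_iam_policy_document.ecs_instance_assume.json
}

resource "aws_iam_role_policy_attachment" "ecs_instance_role" {
  role       = aws_iam_role.ecs_instance_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

resource "aws_iam_instance_profile" "ecs_instance_profile" {
  name = "${var.project_name}_ecs_instance_profile"
  role = aws_iam_role.ecs_instance_role.name
}

module "autoscale_group" {
  source = "cloudposse/ec2-autoscale-group/aws" # assumed module source
  # ... your existing autoscaling group settings ...

  iam_instance_profile_name = aws_iam_instance_profile.ecs_instance_profile.name
}
With the AmazonEC2ContainerServiceforEC2Role policy attached, the ECS agent can obtain credentials from the instance profile and register the instance with the cluster, which should resolve the NoCredentialProviders error.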
Friends,
I am completely new to Terraform, but I am trying to learn here. At the moment I am reading the book Terraform: Up and Running, but I need to spin up an EKS cluster to deploy one of my learning projects. For this, I am following this [tutorial][1] from HashiCorp.
My main questions are the following: Do I really need all of this (see the Terraform code for AWS below) to deploy a cluster on AWS? How could I reduce the code below to the minimum necessary to spin up a cluster with a master and one worker that can communicate with each other?
On Google Cloud I could spin up a cluster with just these few lines of code:
provider "google" {
credentials = file(var.credentials)
project = var.project
region = var.region
}
resource "google_container_cluster" "primary" {
name = var.cluster_name
network = var.network
location = var.region
initial_node_count = var.initial_node_count
}
resource "google_container_node_pool" "primary_preemtible_nodes" {
name = var.node_name
location = var.region
cluster = google_container_cluster.primary.name
node_count = var.node_count
node_config {
preemptible = var.preemptible
machine_type = var.machine_type
}
}
Can I do something similar to spin up an EKS cluster? The code below is working, but I feel like I am biting off more than I can chew.
provider "aws" {
region = "${var.AWS_REGION}"
secret_key = "${var.AWS_SECRET_KEY}"
access_key = "${var.AWS_ACCESS_KEY}"
}
# ----- Base VPC Networking -----
data "aws_availability_zones" "available_zones" {}
# Creates a virtual private network which will isolate
# the resources to be created.
resource "aws_vpc" "blur-vpc" {
# Specifies the range of IP addresses for the VPC.
cidr_block = "10.0.0.0/16"
tags = "${
map(
"Name", "terraform-eks-node",
"kubernetes.io/cluster/${var.cluster-name}", "shared"
)
}"
}
resource "aws_subnet" "subnet" {
count = 2
availability_zone = "${data.aws_availability_zones.available_zones.names[count.index]}"
cidr_block = "10.0.${count.index}.0/24"
vpc_id = "${aws_vpc.blur-vpc.id}"
tags = "${
map(
"Name", "blur-subnet",
"kubernetes.io/cluster/${var.cluster-name}", "shared",
)
}"
}
# The component that allows communication between
# the VPC and the internet.
resource "aws_internet_gateway" "gateway" {
# Attaches the gateway to the VPC.
vpc_id = "${aws_vpc.blur-vpc.id}"
tags = {
Name = "eks-gateway"
}
}
# Determines where network traffic from the gateway
# will be directed.
resource "aws_route_table" "route-table" {
vpc_id = "${aws_vpc.blur-vpc.id}"
route {
cidr_block = "0.0.0.0/0"
gateway_id = "${aws_internet_gateway.gateway.id}"
}
}
resource "aws_route_table_association" "table_association" {
count = 2
subnet_id = "${aws_subnet.subnet.*.id[count.index]}"
route_table_id = "${aws_route_table.route-table.id}"
}
# -- Resources required for the master setup --
# The block below (IAM role + policy) allows the EKS service to
# manage or retrieve data from other AWS services.
# Similar to an IAM user, but not uniquely associated with one person.
# A role can be assumed by anyone who needs it.
resource "aws_iam_role" "blur-iam-role" {
name = "eks-cluster"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
POLICY
}
# Attaches the policy "AmazonEKSClusterPolicy" to the role created above.
resource "aws_iam_role_policy_attachment" "blur-iam-role-AmazonEKSClusterPolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = "${aws_iam_role.blur-iam-role.name}"
}
# Master security group
# A security group acts as a virtual firewall to control inbound and outbound traffic.
# This security group will control networking access to the K8S master.
resource "aws_security_group" "blur-cluster" {
name = "eks-blur-cluster"
description = "Allows the communucation with the worker nodes"
vpc_id = "${aws_vpc.blur-vpc.id}"
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "blur-cluster"
}
}
# The actual master node
resource "aws_eks_cluster" "blur-cluster" {
name = "${var.cluster-name}"
# Attaches the IAM role created above.
role_arn = "${aws_iam_role.blur-iam-role.arn}"
vpc_config {
# Attaches the security group created for the master.
# Attaches also the subnets.
security_group_ids = ["${aws_security_group.blur-cluster.id}"]
subnet_ids = "${aws_subnet.subnet.*.id}"
}
depends_on = [
"aws_iam_role_policy_attachment.blur-iam-role-AmazonEKSClusterPolicy",
# "aws_iam_role_policy_attachment.blur-iam-role-AmazonEKSServicePolicy"
]
}
# -- Resources required for the worker nodes setup --
# IAM role for the workers. Allows worker nodes to manage or retrieve data
# from other services and its required for the workers to join the cluster.
resource "aws_iam_role" "iam-role-worker"{
name = "eks-worker"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
POLICY
}
# allows Amazon EKS worker nodes to connect to Amazon EKS Clusters.
resource "aws_iam_role_policy_attachment" "iam-role-worker-AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = "${aws_iam_role.iam-role-worker.name}"
}
# This permission is required to modify the IP address configuration of worker nodes
resource "aws_iam_role_policy_attachment" "iam-role-worker-AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = "${aws_iam_role.iam-role-worker.name}"
}
# Allows listing repositories and pulling images.
resource "aws_iam_role_policy_attachment" "iam-role-worker-AmazonEC2ContainerRegistryReadOnly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = "${aws_iam_role.iam-role-worker.name}"
}
# An instance profile represents an EC2 instance (who am I?)
# and assumes a role (what can I do?).
resource "aws_iam_instance_profile" "worker-node" {
name = "worker-node"
role = "${aws_iam_role.iam-role-worker.name}"
}
# Security group for the worker nodes
resource "aws_security_group" "security-group-worker" {
name = "worker-node"
description = "Security group for worker nodes"
vpc_id = "${aws_vpc.blur-vpc.id}"
egress {
cidr_blocks = [ "0.0.0.0/0" ]
from_port = 0
to_port = 0
protocol = "-1"
}
tags = "${
map(
"Name", "blur-cluster",
"kubernetes.io/cluster/${var.cluster-name}", "owned"
)
}"
}
resource "aws_security_group_rule" "ingress-self" {
description = "Allow communication among nodes"
from_port = 0
to_port = 65535
protocol = "-1"
security_group_id = "${aws_security_group.security-group-worker.id}"
source_security_group_id = "${aws_security_group.security-group-worker.id}"
type = "ingress"
}
resource "aws_security_group_rule" "ingress-cluster-https" {
description = "Allow worker to receive communication from the cluster control plane"
from_port = 443
to_port = 443
protocol = "tcp"
security_group_id = "${aws_security_group.security-group-worker.id}"
source_security_group_id = "${aws_security_group.blur-cluster.id}"
type = "ingress"
}
resource "aws_security_group_rule" "ingress-cluster-others" {
description = "Allow worker to receive communication from the cluster control plane"
from_port = 1025
to_port = 65535
protocol = "tcp"
security_group_id = "${aws_security_group.security-group-worker.id}"
source_security_group_id = "${aws_security_group.blur-cluster.id}"
type = "ingress"
}
# Worker Access to Master
resource "aws_security_group_rule" "cluster-node-ingress-http" {
description = "Allows pods to communicate with the cluster API server"
from_port = 443
to_port = "443"
protocol = "tcp"
security_group_id = "${aws_security_group.blur-cluster.id}"
source_security_group_id = "${aws_security_group.security-group-worker.id}"
type = "ingress"
}
# --- Worker autoscaling group ---
# This data will be used to filter and select an AMI which is compatible with the specific k8s version being deployed
data "aws_ami" "eks-worker" {
filter {
name = "name"
values = ["amazon-eks-node-${aws_eks_cluster.blur-cluster.version}-v*"]
}
most_recent = true
owners = ["602401143452"]
}
data "aws_region" "current" {}
locals {
node-user-data =<<USERDATA
#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh --apiserver-endpoint '${aws_eks_cluster.blur-cluster.endpoint}'
USERDATA
}
# To spin up an auto scaling group, an "aws_launch_configuration" is needed.
# This launch configuration requires an "image_id" as well as a "security_group".
resource "aws_launch_configuration" "launch_config" {
associate_public_ip_address = true
iam_instance_profile = "${aws_iam_instance_profile.worker-node.name}"
image_id = "${data.aws_ami.eks-worker.id}"
instance_type = "t2.micro"
name_prefix = "terraform-eks"
security_groups = ["${aws_security_group.security-group-worker.id}"]
user_data_base64 = "${base64encode(local.node-user-data)}"
lifecycle {
create_before_destroy = true
}
}
# Actual autoscaling group
resource "aws_autoscaling_group" "autoscaling" {
desired_capacity = 2
launch_configuration = "${aws_launch_configuration.launch_config.id}"
max_size = 2
min_size = 1
name = "terraform-eks"
vpc_zone_identifier = "${aws_subnet.subnet.*.id}"
tag {
key = "Name"
value = "terraform-eks"
propagate_at_launch = true
}
# "kubernetes.io/cluster/*" tag allows EKS and K8S to discover and manage compute resources.
tag {
key = "kubernetes.io/cluster/${var.cluster-name}"
value = "owned"
propagate_at_launch = true
}
}
[1]: https://registry.terraform.io/providers/hashicorp/aws/2.33.0/docs/guides/eks-getting-started#preparation
Yes, you should create most of them because, as you can see in the Terraform AWS documentation, a VPC configuration is required to deploy an EKS cluster. But you don't have to set up a security group rule for the workers to access the master. Also, try using the aws_eks_node_group resource to create the worker node group; it will save you from creating the launch configuration and autoscaling group separately.
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group
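For example, a minimal managed node group for the cluster above might look like this (the resource names are reused from your config; the instance type and sizes are just an illustration):
resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.blur-cluster.name
  node_group_name = "terraform-eks-workers"
  node_role_arn   = aws_iam_role.iam-role-worker.arn
  subnet_ids      = aws_subnet.subnet.*.id
  instance_types  = ["t2.micro"]

  scaling_config {
    desired_size = 2
    max_size     = 2
    min_size     = 1
  }

  # The worker policies must be attached before EKS can manage the nodes.
  depends_on = [
    aws_iam_role_policy_attachment.iam-role-worker-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.iam-role-worker-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.iam-role-worker-AmazonEC2ContainerRegistryReadOnly,
  ]
}
With a managed node group, EKS provisions and bootstraps the workers for you, so the AMI lookup, user data, launch configuration, autoscaling group, and instance profile from your config are no longer needed.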
I have a service running on ECS deployed with Fargate. I am using ecs-cli compose to launch this service. Here is the command I currently use:
ecs-cli compose service up --cluster my_cluster --launch-type FARGATE
I also have an ecs-params.yml to configure this service. Here is the content:
version: 1
task_definition:
task_execution_role: ecsTaskExecutionRole
task_role_arn: arn:aws:iam::XXXXXX:role/MyExecutionRole
ecs_network_mode: awsvpc
task_size:
mem_limit: 2GB
cpu_limit: 1024
run_params:
network_configuration:
awsvpc_configuration:
subnets:
- "subnet-XXXXXXXXXXXXXXXXX"
- "subnet-XXXXXXXXXXXXXXXXX"
security_groups:
- "sg-XXXXXXXXXXXXXX"
assign_public_ip: ENABLED
Once the service is created, I have to log into the AWS console and attach an auto-scaling policy through the AWS GUI. Is there an easier way to attach an auto-scaling policy, either through the CLI or in my YAML configuration?
While you can use the AWS CLI itself (see application-autoscaling in the docs),
I think it is much better for the entire operation to be performed in one deployment, and for that, you have tools such as Terraform.
You can use the terraform-ecs module written by arminc on GitHub, or you can do it yourself! Here's a quick (and really dirty) example for the entire cluster, but you can also just grab the autoscaling part and use that if you don't want to have the entire deployment in one place:
provider "aws" {
region = "us-east-1" # insert your own region
profile = "insert aw cli profile, should be located in ~/.aws/credentials file"
# you can also use your aws credentials instead
# access_key = "insert_access_key"
# secret_key = "insert_secret_key"
}
resource "aws_ecs_cluster" "cluster" {
name = "my-cluster"
}
resource "aws_ecs_service" "service" {
name = "my-service"
cluster = "${aws_ecs_cluster.cluster.id}"
task_definition = "${aws_ecs_task_definition.task_definition.family}:${aws_ecs_task_definition.task_definition.revision}"
network_configuration {
# These can also be created with Terraform and applied dynamically instead of hard-coded
# look it up in the Docs
security_groups = ["SG_IDS"]
subnets = ["SUBNET_IDS"] # can also be created with Terraform
assign_public_ip = true
}
}
resource "aws_ecs_task_definition" "task_definition" {
family = "my-service"
execution_role_arn = "ecsTaskExecutionRole"
task_role_arn = "INSERT_ARN"
network_mode = "awsvpc"
container_definitions = <<DEFINITION
[
{
"name": "my_service"
"cpu": 1024,
"environment": [{
"name": "exaple_ENV_VAR",
"value": "EXAMPLE_VALUE"
}],
"essential": true,
"image": "INSERT IMAGE URL",
"memory": 2048,
"networkMode": "awsvpc"
}
]
DEFINITION
}
#
# Application AutoScaling resources
#
resource "aws_appautoscaling_target" "main" {
service_namespace = "ecs"
resource_id = "service/${var.cluster_name}/${aws_ecs_service.service.name}"
scalable_dimension = "ecs:service:DesiredCount"
# Insert Min and Max capacity here
min_capacity = "1"
max_capacity = "4"
depends_on = [
"aws_ecs_service.main",
]
}
resource "aws_appautoscaling_policy" "up" {
name = "scaling_policy-${aws_ecs_service.service.name}-up"
service_namespace = "ecs"
resource_id = "service/${aws_ecs_cluster.cluster.name}/${aws_ecs_service.service.name}"
scalable_dimension = "ecs:service:DesiredCount"
step_scaling_policy_configuration {
adjustment_type = "ChangeInCapacity"
cooldown = "60" # In seconds
metric_aggregation_type = "Average"
step_adjustment {
metric_interval_lower_bound = 0
scaling_adjustment = 1 # you can also use negative numbers for scaling down
}
}
depends_on = [
"aws_appautoscaling_target.main",
]
}
resource "aws_appautoscaling_policy" "down" {
name = "scaling_policy-${aws_ecs_service.service.name}-down"
service_namespace = "ecs"
resource_id = "service/${aws_ecs_cluster.cluster.name}/${aws_ecs_service.service.name}"
scalable_dimension = "ecs:service:DesiredCount"
step_scaling_policy_configuration {
adjustment_type = "ChangeInCapacity"
cooldown = "60" # In seconds
metric_aggregation_type = "Average"
step_adjustment {
metric_interval_upper_bound = 0
scaling_adjustment = -1 # scale down example
}
}
depends_on = [
"aws_appautoscaling_target.main",
]
}