aws_eks_node_group Max pods disabled

I am trying to create an EKS cluster with a max-pods limit of 110.
I am creating the node group using aws_eks_node_group:
resource "aws_eks_node_group" "eks-node-group" {
cluster_name = var.cluster-name
node_group_name = var.node-group-name
node_role_arn = var.eks-nodes-role.arn
subnet_ids = var.subnet-ids
version = var.cluster-version
release_version = nonsensitive(data.aws_ssm_parameter.eks_ami_release_version.value)
capacity_type = "SPOT"
lifecycle {
create_before_destroy = true
}
scaling_config {
desired_size = var.scale-config.desired-size
max_size = var.scale-config.max-size
min_size = var.scale-config.min-size
}
instance_types = var.scale-config.instance-types
update_config {
max_unavailable = var.update-config.max-unavailable
}
depends_on = [var.depends-on]
launch_template {
id = aws_launch_template.node-group-launch-template.id
version = aws_launch_template.node-group-launch-template.latest_version
}
}
resource "aws_launch_template" "node-group-launch-template" {
name_prefix = "eks-node-group"
image_id = var.template-image-id
block_device_mappings {
device_name = "/dev/sda1"
ebs {
volume_size = var.ebs_size
}
}
ebs_optimized = true
user_data = base64encode(data.template_file.test.rendered)
# user_data = filebase64("${path.module}/example.sh")
}
data "template_file" "test" {
template = <<EOF
/etc/eks/bootstrap.sh ${var.cluster-name} --use-max-pods false --kubelet-extra-args '--max-pods=110'
EOF
}
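One detail that may matter here (this is a reading of the EKS launch template docs, not something from the config above): because the launch template sets image_id, EKS treats the AMI as custom and does not prepend its own bootstrap user data, so the rendered template has to be a complete script. A minimal sketch of that, with the shebang added as the assumed missing piece:

data "template_file" "test" {
  # Assumption: with a custom image_id, EKS will not merge its own user data,
  # so this script must be self-contained and call bootstrap.sh itself.
  template = <<EOF
#!/bin/bash
set -ex
/etc/eks/bootstrap.sh ${var.cluster-name} --use-max-pods false --kubelet-extra-args '--max-pods=110'
EOF
}

The bootstrap flags themselves are the same ones already used above.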
The launch template is created just to provide the bootstrap arguments. I have tried supplying the same arguments when creating the aws_eks_cluster as well, via the EKS user-data submodule:
module "eks__user_data" {
source = "terraform-aws-modules/eks/aws//modules/_user_data"
version = "18.30.3"
cluster_name = aws_eks_cluster.metashape-eks.name
bootstrap_extra_args = "--use-max-pods false --kubelet-extra-args '--max-pods=110'"
}
but I have been unable to achieve the desired effect so far.
I am trying to follow
https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html. The CNI driver is enabled at version 1.12, and all other configuration seems correct too.
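For reference, the linked guide comes down to enabling prefix delegation on the VPC CNI before raising max-pods. A minimal sketch of doing that through the managed add-on (the aws_eks_addon resource and its configuration_values are an assumption about how the add-on is managed here; they are not in the config above, and configuration_values needs a reasonably recent AWS provider):

resource "aws_eks_addon" "vpc_cni" {
  cluster_name = var.cluster-name
  addon_name   = "vpc-cni"

  # Enable prefix delegation so nodes can actually hand out 110 pod IPs.
  configuration_values = jsonencode({
    env = {
      ENABLE_PREFIX_DELEGATION = "true"
      WARM_PREFIX_TARGET       = "1"
    }
  })
}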

Related

aws_launch_template makes batch job stuck in RUNNABLE

I added a launch template to my IaC, and when I run the batch job it gets stuck in the RUNNABLE state.
This is my launch template code:
resource "aws_launch_template" "conversion_launch_template" {
name = "conversion-launch-template"
block_device_mappings {
device_name = "/dev/sda1"
ebs {
volume_size = 80
}
}
iam_instance_profile {
name = "conversion-pipeline-batch-iam-instance-profile"
}
}
and in "aws_batch_compute_environment" resource I reference the launch template inside the compute ressource block :
resource "aws_batch_compute_environment" "conversion_pipeline" {
compute_environment_name = "conversion-pipeline-batch-compute-environment"
compute_resources {
instance_role = aws_iam_instance_profile.conversion_pipeline_batch.arn
instance_type = var.conversion_pipeline_instance_type
max_vcpus = var.conversion_pipeline_max_vcpus
min_vcpus = 0
security_group_ids = [
aws_security_group.conversion_pipeline_batch.id
]
subnets = var.subnets
type = "EC2"
launch_template {
launch_template_id = aws_launch_template.conversion_launch_template.id
}
}
service_role = aws_iam_role.conversion_pipeline_batch_service_role.arn
type = "MANAGED"
tags = {
environment = var.env
}
}
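One thing that may be worth checking, offered only as a guess since the rest of the setup is not shown: the launch_template block in the compute environment also accepts a version, and without it Batch uses the template's default version as it was when the compute environment was created. A sketch of pinning it explicitly:

    launch_template {
      launch_template_id = aws_launch_template.conversion_launch_template.id
      # Without an explicit version, Batch keeps using the template's default
      # version from when the compute environment was created.
      version = aws_launch_template.conversion_launch_template.latest_version
    }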

Unable to create an EKS Cluster with an existing security group using Terraform

I'm having issues when trying to create an EKS cluster with a few security groups that I have already created. I don't want a new SG every time I create a new EKS cluster.
I have a problem with the part of this code below vpc_id: "cluster_create_security_group = false" produces an error, and cluster_security_group_id = "sg-123" is completely ignored.
My code is like this:
provider "aws" {
region = "us-east-2"
}
terraform {
backend "s3" {
bucket = "mys3bucket"
key = "eks/terraform.tfstate"
region = "us-east-2"
}
}
data "aws_eks_cluster" "cluster" {
name = module.eks.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks.cluster_id
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
variable "cluster_security_group_id" {
description = "Existing security group ID to be attached to the cluster. Required if `create_cluster_security_group` = `false`"
type = string
default = "sg-1234"
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 18.0"
cluster_name = "cluster-example"
cluster_version = "1.21" #This may vary depending on the purpose of the cluster
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
cluster_addons = {
coredns = {
resolve_conflicts = "OVERWRITE"
}
kube-proxy = {}
vpc-cni = {
resolve_conflicts = "OVERWRITE"
}
}
vpc_id = "vpc-12345"
subnet_ids = ["subnet-123", "subnet-456", "subnet-789"]
create_cluster_security_group=false ----------> ERROR: An argument named "cluster_create_security_group" is not expected here
cluster_security_group_id = "my-security-group-id"
# EKS Managed Node Group(s)
eks_managed_node_group_defaults = {
disk_size = 50
instance_types = ["t3.medium"]
}
eks_managed_node_groups = {
Test-Nodegroup = {
min_size = 2
max_size = 5
desired_size = 2
instance_types = ["t3.large"]
capacity_type = "SPOT"
}
}
tags = {
Environment = "dev"
Terraform = "true"
}
}
Where am I wrong? This is my whole Terraform file.
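For comparison, in v18 of the module the toggles are (if I read the module's inputs correctly) spelled create_cluster_security_group and create_node_security_group. A minimal sketch of passing existing groups (the node-group values here are placeholders, not from the original code):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  # ... cluster_name, vpc_id, subnet_ids, etc. as above ...

  create_cluster_security_group = false
  cluster_security_group_id     = var.cluster_security_group_id

  create_node_security_group = false
  node_security_group_id     = "sg-0123456789abcdef0" # placeholder
}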

You cannot specify an AMI Type other than CUSTOM, when specifying an image id in your launch template

I'm stuck in a loop here. I'm trying to create a launch template for my EKS nodes, and my launch template looked like this:
resource "aws_launch_template" "node" {
image_id = var.image_id
instance_type = var.instance_type
key_name = var.key_name
instance_initiated_shutdown_behavior = "terminate"
name = var.name
user_data = base64encode("node_userdata.tpl")
# vpc_security_group_ids = var.security_group_ids
block_device_mappings {
device_name = "/dev/sda1"
ebs {
volume_size = 20
}
}
iam_instance_profile {
name = aws_iam_instance_profile.node.name
}
monitoring {
enabled = true
}
}
Here's my node resource block as well:
resource "aws_eks_node_group" "nodes_eks" {
cluster_name = aws_eks_cluster.eks.name
node_group_name = "eks-node-group"
node_role_arn = aws_iam_role.eks_nodes.arn
subnet_ids = module.vpc.private_subnets
# remote_access {
# ec2_ssh_key = aws_key_pair.bastion_auth.id
# }
scaling_config {
desired_size = 3
max_size = 6
min_size = 3
}
ami_type = "AL2_x86_64"
capacity_type = "ON_DEMAND"
force_update_version = false
instance_types = [var.instance_type]
labels = {
role = "nodes-pool-1"
}
launch_template {
id = aws_launch_template.node.id
version = "$Default"
}
# version = var.k8s_version
depends_on = [
aws_iam_role_policy_attachment.amazon_eks_worker_node_policy,
aws_iam_role_policy_attachment.amazon_eks_cni_policy,
aws_iam_role_policy_attachment.amazon_ec2_container_registry_read_only,
]
}
The image ID for my launch template is this Amazon Linux 2 image, "ami-098e42ae54c764c35". When I tried to run that, it gave me this error:
You cannot specify an AMI Type other than CUSTOM, when specifying an image id in your launch template
So I changed it from var.image_id (the Amazon Linux 2 image) to "CUSTOM", and now it's returning this error:
InvalidAMIID.Malformed: The image ID 'CUSTOM' is not valid. The expected format is ami-xxxxxxxx or ami-xxxxxxxxxxxxxxxxx.
I don't know what the solution is, because when I passed in the AMI via a variable it said the value had to be "CUSTOM", so I made it that, and now it's saying it has to be in the typical AMI ID format.
You cannot have both ami_type = "AL2_x86_64" and an image_id in your launch template. The message is a bit misleading, but if you look in [1], you will see where CUSTOM has to be used:
If the node group was deployed using a launch template with a custom AMI, then this is CUSTOM.
So you have to change the ami_type line in the node group to:
ami_type = "CUSTOM"
Also, the Terraform docs [2] have something to say about fetching the version of the launch template. The final form of your launch_template block should be:
launch_template {
  id      = aws_launch_template.node.id
  version = aws_launch_template.node.latest_version
}
[1] https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html#AmazonEKS-Type-Nodegroup-amiType
[2] https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group#version
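Putting both changes together, the node group from the question would end up roughly like this (only the ami_type and launch_template lines differ from the original):

resource "aws_eks_node_group" "nodes_eks" {
  # ... other arguments unchanged from the question ...

  ami_type = "CUSTOM"

  launch_template {
    id      = aws_launch_template.node.id
    version = aws_launch_template.node.latest_version
  }
}

As an aside, base64encode("node_userdata.tpl") in the launch template encodes the literal string rather than the file contents; something like base64encode(templatefile("${path.module}/node_userdata.tpl", {...})) is probably what was intended, and with a custom AMI that user data also needs to call /etc/eks/bootstrap.sh itself.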

Terraform launch template

I'm creating an AWS EKS cluster via Terraform, but when I create the cluster node group with a launch template I get 2 launch templates: one with the name and settings that I specify, and a second with a random name but with the settings that I specify. The only difference between the 2 launch templates is the IAM instance profile, which only the 2nd template (the one created automatically) gets.
If I try to specify an IAM instance profile in my template, it gives an error that I cannot use it here.
Am I doing something wrong, or is it normal that it creates 2 launch templates?
# eks node launch template
resource "aws_launch_template" "this" {
  name          = "${var.eks_cluster_name}-node-launch-template"
  instance_type = var.instance_types[0]
  image_id      = var.node_ami

  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = 80
      volume_type = "gp3"
      throughput  = "125"
      encrypted   = false
      iops        = 3000
    }
  }

  lifecycle {
    create_before_destroy = true
  }

  network_interfaces {
    security_groups = [data.aws_eks_cluster.this.vpc_config[0].cluster_security_group_id]
  }

  user_data = base64encode(templatefile("${path.module}/userdata.tpl", merge(local.userdata_vars, local.cluster_data)))

  tags = {
    "eks:cluster-name"   = var.eks_cluster_name
    "eks:nodegroup-name" = var.node_group_name
  }

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name                 = "${var.eks_cluster_name}-node"
      "eks:cluster-name"   = var.eks_cluster_name
      "eks:nodegroup-name" = var.node_group_name
    }
  }

  tag_specifications {
    resource_type = "volume"
    tags = {
      "eks:cluster-name"   = var.eks_cluster_name
      "eks:nodegroup-name" = var.node_group_name
    }
  }
}
# eks nodes
resource "aws_eks_node_group" "this" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = var.node_group_name
  node_role_arn   = aws_iam_role.eksNodesGroup.arn
  subnet_ids      = data.aws_subnet_ids.private.ids

  scaling_config {
    desired_size = 1
    max_size     = 10
    min_size     = 1
  }

  update_config {
    max_unavailable = 1
  }

  launch_template {
    version = aws_launch_template.this.latest_version
    id      = aws_launch_template.this.id
  }

  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      scaling_config[0].desired_size
    ]
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
  # Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
  depends_on = [
    aws_iam_role_policy_attachment.AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.AmazonEC2ContainerRegistryReadOnly,
    aws_launch_template.this
  ]
}
I was expecting Terraform to create only one launch template.
Change the name attribute to name_prefix.
Use:
name_prefix = "${var.eks_cluster_name}-node-launch-template"
Instead of:
name = "${var.eks_cluster_name}-node-launch-template"
The best way to get a unique name for a launch template from a prefix (in your case ${var.eks_cluster_name}) is the name_prefix attribute.
Read more here
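A minimal sketch of the change in context; Terraform then appends a unique suffix to the prefix, which also plays nicely with the create_before_destroy lifecycle already set on the template:

resource "aws_launch_template" "this" {
  # Terraform generates a unique name from this prefix on each replacement,
  # so the new template never collides with the old one.
  name_prefix = "${var.eks_cluster_name}-node-launch-template"

  # ... the rest of the template stays as it was ...

  lifecycle {
    create_before_destroy = true
  }
}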

Terraform error - ECS using spot instances to host containers

Sorry for the long post, but I hope it provides good background.
I don't know whether this is a bug or my code is wrong. I want to create an ECS cluster with EC2 spot instances with the help of a launch template and an ASG. My code is as follows.
For the ECS service, cluster, and task definition:
resource "aws_ecs_cluster" "main" {
name = "test-ecs-cluster"
}
resource "aws_ecs_service" "ec2_service" {
for_each = data.aws_subnet_ids.all_subnets.ids
name = "myservice_${replace(timestamp(), ":", "-")}"
task_definition = aws_ecs_task_definition.task_definition.arn
cluster = aws_ecs_cluster.main.id
desired_count = 1
launch_type = "EC2"
health_check_grace_period_seconds = 10
load_balancer {
container_name = "test-container"
container_port = 80
target_group_arn = aws_lb_target_group.alb_ec2_ecs_tg.id
}
network_configuration {
security_groups = [aws_security_group.ecs_ec2.id]
subnets = [each.value]
assign_public_ip = "false"
}
ordered_placement_strategy {
type = "binpack"
field = "cpu"
}
}
resource "aws_ecs_task_definition" "task_definition" {
container_definitions = data.template_file.task_definition_template.rendered
family = "test-ec2-task-family"
execution_role_arn = aws_iam_role.ecs_task_exec_role_ec2_ecs.arn
task_role_arn = aws_iam_role.ecs_task_exec_role_ec2_ecs.arn
network_mode = "awsvpc"
memory = 1024
cpu = 1024
requires_compatibilities = ["EC2"]
lifecycle {
create_before_destroy = true
}
}
data "template_file" "task_definition_template" {
template = file("${path.module}/templates/task_definition.json.tpl")
vars = {
container_port = var.container_port
region = var.region
log_group = var.cloudwatch_log_group
}
}
Launch template:
resource "aws_launch_template" "template_for_spot" {
name = "test-spor-ecs-launch-template"
disable_api_termination = false
instance_type = "t3.small"
image_id = data.aws_ami.amazon_linux_2_ecs_optimized.id
key_name = "FrankfurtRegion"
user_data = data.template_file.user_data.rendered
vpc_security_group_ids = [aws_security_group.ecs_ec2.id]
monitoring {
enabled = var.enable_spot == "true" ? false : true
}
block_device_mappings {
device_name = "/dev/sda1"
ebs {
volume_size = 30
}
}
iam_instance_profile {
arn = aws_iam_instance_profile.ecs_instance_profile.arn
}
lifecycle {
create_before_destroy = true
}
}
data "template_file" "user_data" {
template = file("${path.module}/user_data.tpl")
vars = {
cluster_name = aws_ecs_cluster.main.name
}
}
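The user_data.tpl itself is not shown in the post; for an ECS-optimized AMI it usually just registers the instance with the cluster, along these lines (an assumed, minimal version of the file):

#!/bin/bash
# Join the instance to the ECS cluster passed in via the template vars.
echo "ECS_CLUSTER=${cluster_name}" >> /etc/ecs/ecs.config

Note that aws_launch_template expects user_data to be base64-encoded, so the rendered template normally goes through base64encode(), as in the earlier examples.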
ASG with scaling policy:
resource "aws_autoscaling_group" "ecs_spot_asg" {
name = "test-asg-for-ecs"
max_size = 4
min_size = 2
desired_capacity = 2
termination_policies = [
"OldestInstance"]
vpc_zone_identifier = data.aws_subnet_ids.all_subnets.ids
health_check_type = "ELB"
health_check_grace_period = 300
mixed_instances_policy {
instances_distribution {
on_demand_percentage_above_base_capacity = 0
spot_instance_pools = 2
spot_max_price = "0.03"
}
launch_template {
launch_template_specification {
launch_template_id = aws_launch_template.template_for_spot.id
version = "$Latest"
}
override {
instance_type = "t3.large"
}
override {
instance_type = "t3.medium"
}
override {
instance_type = "t3a.large"
}
override {
instance_type = "t3a.medium"
}
}
}
lifecycle {
create_before_destroy = true
}
}
resource "aws_autoscaling_policy" "ecs_cluster_scale_policy" {
autoscaling_group_name = aws_autoscaling_group.ecs_spot_asg.name
name = "test-ecs-cluster-scaling-policy"
policy_type = "TargetTrackingScaling"
adjustment_type = "ChangeInCapacity"
target_tracking_configuration {
target_value = 70
customized_metric_specification {
metric_name = "ECS-cluster-metric"
namespace = "AWS/ECS"
statistic = "Average"
metric_dimension {
name = aws_ecs_cluster.main.name
value = aws_ecs_cluster.main.name
}
}
}
}
EDIT:
I'm getting:
Error: InvalidParameterException: Creation of service was not idempotent. "test-ec2-service-qaz"
on ecs.tf line 5, in resource "aws_ecs_service" "ec2_service":
5: resource "aws_ecs_service" "ec2_service" {
EDIT 2:
I changed the ecs_service name to name = "myservice_${replace(timestamp(), ":", "-")}" and am still getting the same error.
I read in other issues that this happens because of a lifecycle block with create_before_destroy in the ecs_service, but that is not declared in my code. Maybe it is related to something else; I can't say what.
Thanks to #Marko E and #karnauskas on GitHub: with name = "myservice_${each.value}" I was able to deploy three ECS services. With a correction to the subnet handling, I was able to deploy all the "stuff" as required. Subnets:
data "aws_subnet_ids" "all_subnets" {
vpc_id = data.aws_vpc.default.id
}
data "aws_subnet" "subnets" {
for_each = data.aws_subnet_ids.all_subnets.ids
id = each.value
}
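For completeness, a sketch of the service block with just the working name change applied (everything else as in the original service block above):

resource "aws_ecs_service" "ec2_service" {
  for_each = data.aws_subnet_ids.all_subnets.ids

  # One service per subnet, named after the subnet ID instead of a timestamp,
  # which keeps the name stable between plans.
  name = "myservice_${each.value}"

  # ... remaining arguments as in the original service block ...
}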