How to set AWS EKS nodes to use gp3

I'm trying to set my EKS nodes to use gp3 volumes. They currently use the default gp2, but I would like to change that to gp3. I'm using Terraform to build the infrastructure with the aws_eks_cluster resource (I'm not using the "eks" module). Here is a simple snippet:
resource "aws_eks_cluster" "cluster" {
name = var.name
role_arn = aws_iam_role.cluster.arn
version = var.k8s_version
}
resource "aws_eks_node_group" "cluster" {
capacity_type = var.node_capacity_type
cluster_name = aws_eks_cluster.cluster.name
disk_size = random_id.node_group.keepers.node_disk
instance_types = split(",", random_id.node_group.keepers.node_type)
node_group_name = "${var.name}-${local.availability_zones[count.index]}-${random_id.node_group.hex}"
node_role_arn = random_id.node_group.keepers.role_arn
subnet_ids = [var.private ? aws_subnet.private[count.index].id : aws_subnet.public[count.index].id]
version = var.k8s_version
}
I tried to set up the kubernetes_storage_class resource, but that only changes the volumes used by the pods (PV/PVC). I would like to change the node volumes to gp3.
I couldn't find how to do this in the documentation or on GitHub. Has anyone been able to do it?
Thanks.

You can try to set up your own launch template and then reference it in the launch_template argument of aws_eks_node_group.
A launch template allows you to configure the disk type. AWS provides a guide on how to write a launch template correctly.
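A minimal sketch of that approach could look like the following (the /dev/xvda device name and the 20 GB size are assumptions for an Amazon Linux 2 node AMI, and the resource names and placeholder references are illustrative; adjust them to your setup). Note that disk_size must be omitted from the node group when a launch template is attached:
# Launch template that switches the node root volume to gp3.
resource "aws_launch_template" "eks_nodes" {
  name = "${var.name}-gp3"

  block_device_mappings {
    device_name = "/dev/xvda" # root device of the Amazon Linux 2 EKS AMI (assumption)

    ebs {
      volume_size = 20 # adjust to your needs
      volume_type = "gp3"
    }
  }
}

resource "aws_eks_node_group" "cluster" {
  cluster_name    = aws_eks_cluster.cluster.name
  node_group_name = "${var.name}-gp3"
  node_role_arn   = aws_iam_role.node.arn    # placeholder role
  subnet_ids      = aws_subnet.private[*].id # placeholder subnets
  instance_types  = ["t3.medium"]            # placeholder type
  version         = var.k8s_version
  # disk_size is intentionally omitted: it conflicts with a launch template.

  launch_template {
    id      = aws_launch_template.eks_nodes.id
    version = aws_launch_template.eks_nodes.latest_version
  }

  scaling_config {
    desired_size = 2
    max_size     = 2
    min_size     = 2
  }
}
When the launch template version changes, the managed node group rolls out replacement nodes, which then come up with gp3 root volumes.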

Related

Terraform launch template creating two volumes for AWS EKS Cluster Autoscaling Group

I have an EKS cluster with a node group that is configured with a launch template. All of the resources are created with Terraform.
launch_template.tf:
resource "aws_launch_template" "launch-template" {
name = var.name
update_default_version = var.update_default_version
instance_type = var.instance_type
key_name = var.key_name
block_device_mappings {
device_name = var.block_device_name
ebs {
volume_size = var.volume_size
}
}
ebs_optimized = var.ebs_optimized
monitoring {
enabled = var.monitoring_enabled
}
dynamic "tag_specifications" {
for_each = toset(var.resources_to_tag)
content {
resource_type = tag_specifications.key
tags = var.tags
}
}
}
eks_nodegroup.tf:
resource "aws_eks_node_group" "eks-nodegroup" {
  cluster_name    = var.cluster_name
  node_group_name = var.node_group_name
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids
  labels          = var.labels
  tags            = var.tags

  scaling_config {
    desired_size = var.desired_size
    max_size     = var.max_size
    min_size     = var.min_size
  }

  launch_template {
    id      = var.launch_template_id
    version = var.launch_template_version
  }
}
These resources are bound to each other. But at the end of the day, this setup creates:
2 launch templates,
1 autoscaling group,
2 volumes for each instance in the autoscaling group.
I understood from this question that the second launch template is created because I'm using the aws_launch_template resource together with aws_eks_node_group. But I didn't understand where the second volume for each instance is coming from. One of the volumes matches my configuration: 40 GB capacity, device /dev/sda1, 120 IOPS. But the second one has 20 GB capacity, device /dev/xvda and 100 IOPS. I don't have any configuration like this in my Terraform code.
I couldn't find the source of the second volume. Any guidance will be highly appreciated, thank you very much.
Your second volume is being created from the default volume of the aws_eks_node_group: its disk_size parameter defaults to 20 GB.
Note that the disk_size parameter is not configurable when using a launch template; configuring it will cause an error.
I suspect you may be using a Bottlerocket AMI, which comes with two volumes. One is the OS volume and the second is the data volume. You likely want to configure the data volume size, which is exposed at /dev/xvdb by default.
See https://github.com/bottlerocket-os/bottlerocket#default-volumes
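If that is the case, a minimal sketch of sizing the Bottlerocket data volume through the launch template might look like this (the /dev/xvdb device name comes from the Bottlerocket defaults linked above; the 40 GB size is just an illustrative value):
resource "aws_launch_template" "launch-template" {
  name = var.name

  # Bottlerocket data volume: container images, volumes and persistent storage.
  block_device_mappings {
    device_name = "/dev/xvdb"

    ebs {
      volume_size = 40
    }
  }
}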

Why EKS can't issue certificate to kubelet after nodepool creation?

When I create an EKS cluster with a single node pool using Terraform, I face a kubelet certificate problem, i.e. CSRs are stuck in the Pending state like this:
NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-8qmz5 4m57s kubernetes.io/kubelet-serving kubernetes-admin <none> Pending
csr-mq9rx 5m kubernetes.io/kubelet-serving kubernetes-admin <none> Pending
As we can see, the REQUESTOR here is kubernetes-admin, and I'm really not sure why.
My Terraform code for the cluster itself:
resource "aws_eks_cluster" "eks" {
name = var.eks_cluster_name
role_arn = var.eks_role_arn
version = var.k8s_version
vpc_config {
endpoint_private_access = "true"
endpoint_public_access = "true"
subnet_ids = var.eks_public_network_ids
security_group_ids = var.eks_security_group_ids
}
kubernetes_network_config {
ip_family = "ipv4"
service_ipv4_cidr = "10.100.0.0/16"
}
}
Terraform code for nodegroup:
resource "aws_eks_node_group" "aks-NG" {
depends_on = [aws_ec2_tag.eks-subnet-cluster-tag, aws_key_pair.eks-deployer]
cluster_name = aws_eks_cluster.eks.name
node_group_name = "aks-dev-NG"
ami_type = "AL2_x86_64"
node_role_arn = var.eks_role_arn
subnet_ids = var.eks_public_network_ids
capacity_type = "ON_DEMAND"
instance_types = var.eks_nodepool_instance_types
disk_size = "50"
scaling_config {
desired_size = 2
max_size = 2
min_size = 2
}
tags = {
Name = "${var.eks_cluster_name}-node"
"kubernetes.io/cluster/${var.eks_cluster_name}" = "owned"
}
remote_access {
ec2_ssh_key = "eks-deployer-key"
}
}
Per my understanding it's a very basic configuration.
Now, when I create the cluster and node group via the AWS management console with exactly the SAME parameters, i.e. the cluster IAM role and node group IAM role are the same as for Terraform, everything is fine:
NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-86qtg 6m20s kubernetes.io/kubelet-serving system:node:ip-172-31-201-140.ec2.internal <none> Approved,Issued
csr-np42b 6m43s kubernetes.io/kubelet-serving system:node:ip-172-31-200-199.ec2.internal <none> Approved,Issued
But here the certificate requestor is the node itself (per my understanding). So I would like to know: what is the problem here? Why is the requestor different in this case, what's the difference between creating these resources from the AWS management console and using Terraform, and how do I manage this issue? Please help.
UPD.
I found that this problem appears when I create the cluster using Terraform via an assumed role created for Terraform.
When I create the cluster using Terraform with regular IAM user credentials and the same permission set, everything is fine.
It doesn't give any answer regarding the root cause, but still, it's something to consider.
Right now it seems like a weird EKS bug.
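For reference, "via an assumed role" here means an AWS provider configuration roughly like the following sketch (the region and role ARN are placeholders):
provider "aws" {
  region = "us-east-1" # placeholder

  assume_role {
    # Hypothetical role created for Terraform runs
    role_arn = "arn:aws:iam::111122223333:role/terraform"
  }
}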

How to create a temporary instance for a custom AMI creation in AWS with terraform?

I'm trying to create a custom AMI for my AWS deployment with Terraform. It's working quite well, and it's also possible to run a bash script. The problem is that it's not possible to create the instance temporarily and then terminate the EC2 instance and all the depending resources with Terraform.
First I'm building an "aws_instance", then I provide a bash script in the /tmp folder and have it run via an SSH connection in the Terraform script. It looks like the following:
First the aws_instance is created based on a standard AWS Amazon Machine Image (AMI). This is later used to create an image from it.
resource "aws_instance" "custom_ami_image" {
tags = { Name = "custom_ami_image" }
ami = var.ami_id //base custom ami id
subnet_id = var.subnet_id
vpc_security_group_ids = [var.security_group_id]
iam_instance_profile = "ec2-instance-profile"
instance_type = "t2.micro"
ebs_block_device {
//...further configurations
}
Now a bash script is provided. The source is the location of the bash script on the local Linux box you are executing Terraform from; the destination is on the new AWS instance. In the file I install further things like Python 3, Oracle drivers and so on...
provisioner "file" {
source = "../bash_file"
destination = "/tmp/bash_file"
}
Then I change the permissions on the bash script and execute it as the ssh-user:
provisioner "remote-exec" {
inline = [
"chmod +x /tmp/bash_file",
"sudo /tmp/bash_file",
]
}
Now you can log in as the ssh-user with the previously created key.
  connection {
    type        = "ssh"
    user        = "ssh-user"
    password    = ""
    private_key = file("${var.key_name}.pem")
    host        = self.private_ip
  }
}
With aws_ami_from_instance the AMI can be created from the just-created EC2 instance. It is then available for further deployments, and it's also possible to share it with other AWS accounts.
resource "aws_ami_from_instance" "custom_ami_image {
name = "acustom_ami_image"
source_instance_id = aws_instance.custom_ami_image.id
}
It's working fine, but what bothers me is the resulting EC2 instance: it keeps running, and it's not possible to terminate it with Terraform. Does anyone have an idea how I can handle this? Sure, the running costs are manageable, but I don't like creating data garbage...
I think the best way to create AMI images is to use Packer, also from HashiCorp like Terraform.
What is Packer?
"Provision Infrastructure with Packer": Packer is HashiCorp's open-source tool for creating machine images from source configuration. You can configure Packer images with an operating system and software for your specific use case.
Packer creates a temporary instance with a temporary key pair, security group and IAM role. Custom inline commands are possible in the "shell" provisioner. Afterwards you can use this AMI in your Terraform code.
A sample script could look like this:
packer {
  required_plugins {
    amazon = {
      version = ">= 0.0.2"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

source "amazon-ebs" "linux" {
  # AMI Settings
  ami_name                    = "ami-oracle-python3"
  instance_type               = "t2.micro"
  source_ami                  = "ami-xxxxxxxx"
  ssh_username                = "ec2-user"
  associate_public_ip_address = false
  ami_virtualization_type     = "hvm"
  subnet_id                   = "subnet-xxxxxx"

  launch_block_device_mappings {
    device_name           = "/dev/xvda"
    volume_size           = 8
    volume_type           = "gp2"
    delete_on_termination = true
    encrypted             = false
  }

  # Profile Settings
  profile = "xxxxxx"
  region  = "eu-central-1"
}

build {
  sources = [
    "source.amazon-ebs.linux"
  ]

  provisioner "shell" {
    inline = [
      "export no_proxy=localhost"
    ]
  }
}
You can find the Packer documentation here.
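To consume the resulting image from Terraform afterwards, a small sketch could look like this (it assumes the ami_name "ami-oracle-python3" from the Packer template above and that the image lives in your own account; the resource names are placeholders):
# Look up the AMI that Packer built, by name, in the current account.
data "aws_ami" "oracle_python3" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["ami-oracle-python3"]
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.oracle_python3.id
  instance_type = "t2.micro"
}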

How do I launch a Beanstalk environment with HealthChecks as "EC2 and ELB" and health_check_grace_time as 1500 using terraform?

I have started learning about Terraform recently and wanted to create an environment using the settings stated above. When I run the code below I get two resources deployed: one is the Beanstalk environment and the other is an Auto Scaling group (ASG). The ASG has the desired settings but is not linked with the Beanstalk environment, hence I am trying to connect the two.
(I copy the Beanstalk ID from the Tags section, then head over to ASG under EC2, search for it and look at the Health Check section.)
resource "aws_autoscaling_group" "example" {
launch_configuration = aws_launch_configuration.as_conf.id
min_size = 2
max_size = 10
availability_zones = [ "us-east-1a" ]
health_check_type = "ELB"
health_check_grace_period = 1500
tag {
key = "Name"
value = "terraform-asg-example"
propagate_at_launch = true
}
}
provider "aws" {
region = "us-east-1"
}
resource "aws_elastic_beanstalk_application" "application" {
name = "Test-app"
}
resource "aws_elastic_beanstalk_environment" "environment" {
name = "Test-app"
application = aws_elastic_beanstalk_application.application.name
solution_stack_name = "64bit Windows Server Core 2019 v2.5.6 running IIS 10.0"
setting {
namespace = "aws:autoscaling:launchconfiguration"
name = "IamInstanceProfile"
value = "aws-elasticbeanstalk-ec2-role"
}
setting {
namespace = "aws:autscaling"
}
}
resource "aws_launch_configuration" "as_conf" {
name = "web_config_shivanshu"
image_id = "ami-2757f631"
instance_type = "t2.micro"
lifecycle {
create_before_destroy = true
}
}
You do not create an ASG or launch config/template outside of the Elastic Beanstalk environment and join them together, since some config options are simply not available that way. For example, gp3 SSD is available as part of a launch template, but not available as part of Elastic Beanstalk yet.
What you want to do is remove the resources
resource "aws_launch_configuration" "as_conf"
resource "aws_autoscaling_group" "example"
and then make much more use of the setting {} block within resource "aws_elastic_beanstalk_environment" "environment".
Here is a list of all the settings you can describe in the setting block: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/command-options-general.html
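As an illustration of the pattern (not a complete configuration), capacity settings can be expressed directly on the environment. The namespaces and option names below come from that general options page; the values are placeholders:
resource "aws_elastic_beanstalk_environment" "environment" {
  name                = "Test-app"
  application         = aws_elastic_beanstalk_application.application.name
  solution_stack_name = "64bit Windows Server Core 2019 v2.5.6 running IIS 10.0"

  setting {
    namespace = "aws:autoscaling:launchconfiguration"
    name      = "IamInstanceProfile"
    value     = "aws-elasticbeanstalk-ec2-role"
  }

  # ASG capacity, managed through Beanstalk instead of a separate aws_autoscaling_group.
  setting {
    namespace = "aws:autoscaling:asg"
    name      = "MinSize"
    value     = "2"
  }

  setting {
    namespace = "aws:autoscaling:asg"
    name      = "MaxSize"
    value     = "10"
  }
}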
So I figured out how we can change the Auto Scaling group (ASG) of the Beanstalk environment we have created using Terraform. First of all, create the Beanstalk environment according to your settings; we use the setting block in the Beanstalk resource and namespaces to configure it according to our needs.
Step-1
Create a Beanstalk environment using Terraform:
resource "aws_elastic_beanstalk_environment" "test" {
  ...
}
Step-2
After you have created the Beanstalk environment, create an autoscaling resource skeleton. The ASG associated with the Beanstalk environment will be handled by Terraform under this resource block. Import it using the ID of the ASG, which you can get from terraform plan/show:
terraform import aws_autoscaling_group.<Name that you give> asg-id
Step-3
After you have done that, change the Beanstalk environment according to your needs.
Then make sure you have added these tags, because sometimes I have noticed that the mapping of this ASG to the Beanstalk environment is lost:
tag {
  key                 = "elasticbeanstalk:environment-id"
  propagate_at_launch = true
  value               = aws_elastic_beanstalk_environment.<Name of your beanstalk>.id
}

tag {
  key                 = "elasticbeanstalk:environment-name"
  propagate_at_launch = true
  value               = aws_elastic_beanstalk_environment.<Name of your beanstalk>.name
}

How to remove an AWS Instance volume using Terraform

I deploy a CentOS 7 instance using an AMI that automatically creates a volume on AWS, so when I remove the platform using the following Terraform commands:
terraform plan -destroy -var-file terraform.tfvars -out terraform.tfplan
terraform apply terraform.tfplan
the volume isn't removed, because it was created automatically by the AMI and not by Terraform. Is it possible to remove it with Terraform?
My AWS instance is created with the following Terraform code:
resource "aws_instance" "DCOS-master1" {
ami = "${var.aws_centos_ami}"
availability_zone = "eu-west-1b"
instance_type = "t2.medium"
key_name = "${var.aws_key_name}"
security_groups = ["${aws_security_group.bastion.id}"]
associate_public_ip_address = true
private_ip = "10.0.0.11"
source_dest_check = false
subnet_id = "${aws_subnet.eu-west-1b-public.id}"
tags {
Name = "master1"
}
}
I add the following code to get information about the EBS volume and read its ID:
data "aws_ebs_volume" "ebs_volume" {
most_recent = true
filter {
name = "attachment.instance-id"
values = ["${aws_instance.DCOS-master1.id}"]
}
}
output "ebs_volume_id" {
value = "${data.aws_ebs_volume.ebs_volume.id}"
}
Then, having the EBS volume ID, I import it into the Terraform state using:
terraform import aws_ebs_volume.data volume-ID
Finally, when I run terraform destroy, all the instances and volumes are destroyed.
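For the import above to work in recent Terraform versions, the configuration needs a matching managed resource block for the volume. A minimal sketch could look like this (the resource name "data" mirrors the import command; the availability zone and size are placeholders that must match the real volume, otherwise Terraform will plan changes):
resource "aws_ebs_volume" "data" {
  # Placeholder values: set these to the actual attributes of the imported volume.
  availability_zone = "eu-west-1b"
  size              = 8
}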
If the EBS volume is protected, you need to manually remove the termination protection in the console first; then you can destroy it.