Deploying AWS Load Balancer Controller on EKS with Terraform

I'm trying to deploy the aws-load-balancer-controller on Kubernetes.
I have the following Terraform code:
resource "kubernetes_deployment" "ingress" {
metadata {
name = "alb-ingress-controller"
namespace = "kube-system"
labels = {
"app.kubernetes.io/name" = "alb-ingress-controller"
"app.kubernetes.io/version" = "v2.2.3"
"app.kubernetes.io/managed-by" = "terraform"
}
}
spec {
replicas = 1
selector {
match_labels = {
"app.kubernetes.io/name" = "alb-ingress-controller"
}
}
strategy {
type = "Recreate"
}
template {
metadata {
labels = {
"app.kubernetes.io/name" = "alb-ingress-controller"
"app.kubernetes.io/version" = "v2.2.3"
}
}
spec {
dns_policy = "ClusterFirst"
restart_policy = "Always"
service_account_name = kubernetes_service_account.ingress.metadata[0].name
termination_grace_period_seconds = 60
container {
name = "alb-ingress-controller"
image = "docker.io/amazon/aws-alb-ingress-controller:v2.2.3"
image_pull_policy = "Always"
args = [
"--ingress-class=alb",
"--cluster-name=${local.k8s[var.env].esk_cluster_name}",
"--aws-vpc-id=${local.k8s[var.env].cluster_vpc}",
"--aws-region=${local.k8s[var.env].region}"
]
volume_mount {
mount_path = "/var/run/secrets/kubernetes.io/serviceaccount"
name = kubernetes_service_account.ingress.default_secret_name
read_only = true
}
}
volume {
name = kubernetes_service_account.ingress.default_secret_name
secret {
secret_name = kubernetes_service_account.ingress.default_secret_name
}
}
}
}
}
depends_on = [kubernetes_cluster_role_binding.ingress]
}
resource "kubernetes_ingress" "app" {
metadata {
name = "owncloud-lb"
namespace = "fargate-node"
annotations = {
"kubernetes.io/ingress.class" = "alb"
"alb.ingress.kubernetes.io/scheme" = "internet-facing"
"alb.ingress.kubernetes.io/target-type" = "ip"
}
labels = {
"app" = "owncloud"
}
}
spec {
backend {
service_name = "owncloud-service"
service_port = 80
}
rule {
http {
path {
path = "/"
backend {
service_name = "owncloud-service"
service_port = 80
}
}
}
}
}
depends_on = [kubernetes_service.app]
}
This works with version 1.9 as required. As soon as I upgrade to version 2.2.3, the pod fails to update, and on the pod I get the following error:
{"level":"error","ts":1629207071.4385357,"logger":"setup","msg":"unable to create controller","controller":"TargetGroupBinding","error":"no matches for kind \"TargetGroupBinding\" in version \"elbv2.k8s.aws/v1beta1\""}
I have read the upgrade documentation and amended the IAM policy as it states, but the docs also mention:
updating the TargetGroupBinding CRDs
and that is where I am not sure how to do it using Terraform.
If I try to deploy on a new cluster (i.e. not an upgrade from 1.9), I get the same error.

With your Terraform code, you apply a Deployment and an Ingress resource, but you must also add the CustomResourceDefinition for the TargetGroupBinding custom resource.
This is described under "Add Controller to Cluster" in the Load Balancer Controller installation documentation, with examples provided for both Helm and Kubernetes YAML.
Terraform has beta support for applying CRDs via the kubernetes_manifest resource, including an example of deploying a CustomResourceDefinition.
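A minimal sketch of that approach, assuming provider hashicorp/kubernetes >= 2.x (on older 2.x releases kubernetes_manifest must be enabled via the provider's experiments block) and assuming you have downloaded the TargetGroupBinding CRD from the controller's GitHub release to a local file containing a single YAML document (the path below is hypothetical):
resource "kubernetes_manifest" "targetgroupbinding_crd" {
  # Hypothetical local copy of the TargetGroupBinding CRD manifest.
  manifest = yamldecode(file("${path.module}/crds/targetgroupbinding.yaml"))
}
You would then add kubernetes_manifest.targetgroupbinding_crd to the depends_on of the controller Deployment so the CRD exists before the controller starts.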

Related

Unable to create an EKS Cluster with an existing security group using Terraform

I'm having issues when trying to create an EKS cluster with a few security groups that I have already created. I don't want a new SG every time I create a new EKS cluster.
I have a problem with the part of this code under vpc_id: "cluster_create_security_group = false" produces an error, and cluster_security_group_id = "sg-123" is completely ignored.
My code is like this:
provider "aws" {
region = "us-east-2"
}
terraform {
backend "s3" {
bucket = "mys3bucket"
key = "eks/terraform.tfstate"
region = "us-east-2"
}
}
data "aws_eks_cluster" "cluster" {
name = module.eks.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks.cluster_id
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
variable "cluster_security_group_id" {
description = "Existing security group ID to be attached to the cluster. Required if `create_cluster_security_group` = `false`"
type = string
default = "sg-1234"
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 18.0"
cluster_name = "cluster-example"
cluster_version = "1.21" #This may vary depending on the purpose of the cluster
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
cluster_addons = {
coredns = {
resolve_conflicts = "OVERWRITE"
}
kube-proxy = {}
vpc-cni = {
resolve_conflicts = "OVERWRITE"
}
}
vpc_id = "vpc-12345"
subnet_ids = ["subnet-123", "subnet-456", "subnet-789"]
create_cluster_security_group=false ----------> ERROR: An argument named "cluster_create_security_group" is not expected here
cluster_security_group_id = "my-security-group-id"
# EKS Managed Node Group(s)
eks_managed_node_group_defaults = {
disk_size = 50
instance_types = ["t3.medium"]
}
eks_managed_node_groups = {
Test-Nodegroup = {
min_size = 2
max_size = 5
desired_size = 2
instance_types = ["t3.large"]
capacity_type = "SPOT"
}
}
tags = {
Environment = "dev"
Terraform = "true"
}
}
Where am I wrong? This is my whole Terraform file.

Using a public ECR image in local Kubernetes cluster in Terraform

I've set up a very simple local Kubernetes cluster for development purposes, and for it I aim to pull a Docker image for my pods from ECR.
Here's the code:
terraform {
required_providers {
kubernetes = {
source = "hashicorp/kubernetes"
version = ">= 2.0.0"
}
}
}
provider "kubernetes" {
config_path = "~/.kube/config"
}
resource "kubernetes_deployment" "test" {
metadata {
name = "test-deployment"
namespace = kubernetes_namespace.test.metadata.0.name
}
spec {
replicas = 2
selector {
match_labels = {
app = "MyTestApp"
}
}
template {
metadata {
labels = {
app = "MyTestApp"
}
}
spec {
container {
image = "public ECR URL" <--- this times out
name = "myTestPod"
port {
container_port = 4000
}
}
}
}
}
}
I've set that ECR repo to public and made sure that it's accessible. My challenge is that in a normal scenario you have to log in to ECR in order to retrieve the image, and I do not know how to achieve that in Terraform. So on 'terraform apply', it times out and fails.
I read the documentation on aws_ecr_repository, aws_ecr_authorization_token, the Terraform EKS module and local-exec, but none of them seem to have a solution for this.
Achieving this in a GitLab pipeline is fairly easy, but how can one achieve it in Terraform? How can I pull an image from a public ECR repo for my local Kubernetes cluster?
After a while, I figured out the cleanest way to achieve this.
First, retrieve your ECR authorization token data:
data "aws_ecr_authorization_token" "token" {
}
Second, create a secret for your Kubernetes cluster:
resource "kubernetes_secret" "docker" {
metadata {
name = "docker-cfg"
namespace = kubernetes_namespace.test.metadata.0.name
}
data = {
".dockerconfigjson" = jsonencode({
auths = {
"${data.aws_ecr_authorization_token.token.proxy_endpoint}" = {
auth = "${data.aws_ecr_authorization_token.token.authorization_token}"
}
}
})
}
type = "kubernetes.io/dockerconfigjson"
}
Bear in mind that the example in the docs base64-encodes the username and password; the exported attribute authorization_token is already base64-encoded in that same user:password form, so it can be used directly.
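For illustration only (my addition, not part of the original answer), the same data source also exports user_name and password separately, so you could build that auth value yourself:
locals {
  # Equivalent to data.aws_ecr_authorization_token.token.authorization_token,
  # which is already base64("<user>:<password>").
  ecr_auth = base64encode("${data.aws_ecr_authorization_token.token.user_name}:${data.aws_ecr_authorization_token.token.password}")
}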
Third, once the secret is created, you can then have your pods use it as the image_pull_secrets:
resource "kubernetes_deployment" "test" {
metadata {
name = "MyTestApp"
namespace = kubernetes_namespace.test.metadata.0.name
}
spec {
replicas = 2
selector {
match_labels = {
app = "MyTestApp"
}
}
template {
metadata {
labels = {
app = "MyTestApp"
}
}
spec {
image_pull_secrets {
name = "docker-cfg"
}
container {
image = "test-image-URL"
name = "test-image-name"
image_pull_policy = "Always"
port {
container_port = 4000
}
}
}
}
}
depends_on = [
kubernetes_secret.docker,
]
}
Gotcha: the token expires after 12 hours, so you should either write a bash script that updates your secret in the corresponding namespace, or write a Terraform provisioner that is triggered every time the token expires.
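One small sketch of a helper for that (an assumption on my part, not from the original answer): the data source also exports expires_at, which you can surface as an output so whatever re-applies the secret (cron, CI, etc.) knows when the token lapses:
output "ecr_token_expires_at" {
  # The data source is re-read on every plan, so a fresh apply refreshes
  # kubernetes_secret.docker with a new token.
  value = data.aws_ecr_authorization_token.token.expires_at
}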
I hope this was helpful.

Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80

I'm trying to deploy a cluster with self-managed node groups. No matter what config options I use, I always end up with the following error:
Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused
  with module.eks-ssp.kubernetes_config_map.aws_auth[0]
  on .terraform/modules/eks-ssp/aws-auth-configmap.tf line 19, in resource "kubernetes_config_map" "aws_auth":
  resource "kubernetes_config_map" "aws_auth" {
The .tf file looks like this:
module "eks-ssp" {
source = "github.com/aws-samples/aws-eks-accelerator-for-terraform"
# EKS CLUSTER
tenant = "DevOpsLabs2"
environment = "dev-test"
zone = ""
terraform_version = "Terraform v1.1.4"
# EKS Cluster VPC and Subnet mandatory config
vpc_id = "xxx"
private_subnet_ids = ["xxx","xxx", "xxx", "xxx"]
# EKS CONTROL PLANE VARIABLES
create_eks = true
kubernetes_version = "1.19"
# EKS SELF MANAGED NODE GROUPS
self_managed_node_groups = {
self_mg = {
node_group_name = "DevOpsLabs2"
subnet_ids = ["xxx","xxx", "xxx", "xxx"]
create_launch_template = true
launch_template_os = "bottlerocket" # amazonlinux2eks or bottlerocket or windows
custom_ami_id = "xxx"
public_ip = true # Enable only for public subnets
pre_userdata = <<-EOT
yum install -y amazon-ssm-agent \
systemctl enable amazon-ssm-agent && systemctl start amazon-ssm-agent \
EOT
disk_size = 20
instance_type = "t2.small"
desired_size = 2
max_size = 10
min_size = 2
capacity_type = "" # Optional Use this only for SPOT capacity as capacity_type = "spot"
k8s_labels = {
Environment = "dev-test"
Zone = ""
WorkerType = "SELF_MANAGED_ON_DEMAND"
}
additional_tags = {
ExtraTag = "t2x-on-demand"
Name = "t2x-on-demand"
subnet_type = "public"
}
create_worker_security_group = false # Creates a dedicated sec group for this Node Group
},
}
}
module "eks-ssp-kubernetes-addons" {
source = "github.com/aws-samples/aws-eks-accelerator-for-terraform//modules/kubernetes-addons"
eks_cluster_id = module.eks-ssp.eks_cluster_id
# EKS Addons
enable_amazon_eks_vpc_cni = true
enable_amazon_eks_coredns = true
enable_amazon_eks_kube_proxy = true
enable_amazon_eks_aws_ebs_csi_driver = true
#K8s Add-ons
enable_aws_load_balancer_controller = true
enable_metrics_server = true
enable_cluster_autoscaler = true
enable_aws_for_fluentbit = true
enable_argocd = true
enable_ingress_nginx = true
depends_on = [module.eks-ssp.self_managed_node_groups]
}
Providers:
terraform {
backend "remote" {}
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 3.66.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = ">= 2.6.1"
}
helm = {
source = "hashicorp/helm"
version = ">= 2.4.1"
}
}
}
Based on the example provided in the GitHub repo [1], my guess is that the provider configuration blocks are missing for this to work as expected. Looking at the code provided in the question, it seems that the following needs to be added:
data "aws_region" "current" {}
data "aws_eks_cluster" "cluster" {
name = module.eks-ssp.eks_cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks-ssp.eks_cluster_id
}
provider "aws" {
region = data.aws_region.current.id
alias = "default" # this should match the named profile you used if at all
}
provider "kubernetes" {
experiments {
manifest_resource = true
}
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
If helm is also required, I think the following block [2] needs to be added as well:
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
}
}
Provider argument reference for kubernetes and helm is in [3] and [4] respectively.
[1] https://github.com/aws-samples/aws-eks-accelerator-for-terraform/blob/main/examples/eks-cluster-with-self-managed-node-groups/main.tf#L23-L47
[2] https://github.com/aws-samples/aws-eks-accelerator-for-terraform/blob/main/examples/eks-cluster-with-eks-addons/main.tf#L49-L55
[3] https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#argument-reference
[4] https://registry.terraform.io/providers/hashicorp/helm/latest/docs#argument-reference
The above answer from Marko E seems to fix this; I just ran into the same issue. After applying the above code, all in a separate providers.tf file, Terraform now makes it past the error. I will post later as to whether the deployment makes it fully through.
For reference, I was able to go from 65 resources created down to 42 resources created before I hit this error. This was using the exact best-practice / sample configuration recommended at the top of the README from AWS Consulting here: https://github.com/aws-samples/aws-eks-accelerator-for-terraform
In my case, I was trying to deploy to a Kubernetes cluster (GKE) using Terraform. I replaced the kubeconfig path with the kubeconfig file's absolute path.
From
provider "kubernetes" {
config_path = "~/.kube/config"
#config_context = "my-context"
}
To
provider "kubernetes" {
config_path = "/Users/<username>/.kube/config"
#config_context = "my-context"
}

Terraform AWS Kubernetes EKS resources with ALB Ingress Controller won't create load balancer

I have been trying to create an EKS cluster with self-managed nodes on AWS using Terraform, but I can't get my Kubernetes Ingress to create a load balancer. There are no errors, but no load balancer gets created; it just times out.
I did create a load balancer manually in my account first and verified that the load balancer role is present. The policy AWSElasticLoadBalancingServiceRolePolicy is accessed when my Terraform code runs.
I have relied heavily on this tutorial.
tfvars:
aws_region = "ap-southeast-1"
domain = "*.mydomain.com"
cluster_name = "my-tf-eks-cluster"
vpc_id = "vpc-0d7700e26db6b3e21"
app_subnet_ids = "subnet-03c1e8c57110c92e0, subnet-0413e8bf24cb32595, subnet-047dcce0b810f0fbd"
// gateway subnet IDs
Terraform code:
terraform {
}
provider "aws" {
region = var.aws_region
version = "~> 2.8"
}
data "aws_acm_certificate" "default" {
domain = var.domain
statuses = ["ISSUED"]
}
resource "kubernetes_service_account" "alb-ingress" {
metadata {
name = "alb-ingress-controller"
namespace = "kube-system"
labels = {
"app.kubernetes.io/name" = "alb-ingress-controller"
}
}
automount_service_account_token = true
}
resource "kubernetes_cluster_role" "alb-ingress" {
metadata {
name = "alb-ingress-controller"
labels = {
"app.kubernetes.io/name" = "alb-ingress-controller"
}
}
rule {
api_groups = ["", "extensions"]
resources = ["configmaps", "endpoints", "events", "ingresses", "ingresses/status", "services"]
verbs = ["create", "get", "list", "update", "watch", "patch"]
}
rule {
api_groups = ["", "extensions"]
resources = ["nodes", "pods", "secrets", "services", "namespaces"]
verbs = ["get", "list", "watch"]
}
}
resource "kubernetes_cluster_role_binding" "alb-ingress" {
metadata {
name = "alb-ingress-controller"
labels = {
"app.kubernetes.io/name" = "alb-ingress-controller"
}
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "alb-ingress-controller"
}
subject {
kind = "ServiceAccount"
name = "alb-ingress-controller"
namespace = "kube-system"
}
}
resource "kubernetes_deployment" "alb-ingress" {
metadata {
name = "alb-ingress-controller"
labels = {
"app.kubernetes.io/name" = "alb-ingress-controller"
}
namespace = "kube-system"
}
spec {
selector {
match_labels = {
"app.kubernetes.io/name" = "alb-ingress-controller"
}
}
template {
metadata {
labels = {
"app.kubernetes.io/name" = "alb-ingress-controller"
}
}
spec {
volume {
name = kubernetes_service_account.alb-ingress.default_secret_name
secret {
secret_name = kubernetes_service_account.alb-ingress.default_secret_name
}
}
container {
# This is where you change the version when Amazon comes out with a new version of the ingress controller
image = "docker.io/amazon/aws-alb-ingress-controller:v1.1.8"
name = "alb-ingress-controller"
args = [
"--ingress-class=alb",
"--cluster-name=${var.cluster_name}",
"--aws-vpc-id=${var.vpc_id}",
"--aws-region=${var.aws_region}"
]
volume_mount {
name = kubernetes_service_account.alb-ingress.default_secret_name
mount_path = "/var/run/secrets/kubernetes.io/serviceaccount"
read_only = true
}
}
service_account_name = "alb-ingress-controller"
}
}
}
}
resource "kubernetes_ingress" "main" {
metadata {
name = "main-ingress"
annotations = {
"alb.ingress.kubernetes.io/scheme" = "internet-facing"
"kubernetes.io/ingress.class" = "alb"
"alb.ingress.kubernetes.io/subnets" = "${var.app_subnet_ids}"
"alb.ingress.kubernetes.io/certificate-arn" = "${data.aws_acm_certificate.default.arn}"
"alb.ingress.kubernetes.io/listen-ports" = <<JSON
[
{"HTTP": 80},
{"HTTPS": 443}
]
JSON
"alb.ingress.kubernetes.io/actions.ssl-redirect" = <<JSON
{
"Type": "redirect",
"RedirectConfig": {
"Protocol": "HTTPS",
"Port": "443",
"StatusCode": "HTTP_301"
}
}
JSON
}
}
spec {
rule {
host = "app.xactpos.com"
http {
path {
backend {
service_name = "ssl-redirect"
service_port = "use-annotation"
}
path = "/*"
}
path {
backend {
service_name = "app-service1"
service_port = 80
}
path = "/service1"
}
path {
backend {
service_name = "app-service2"
service_port = 80
}
path = "/service2"
}
}
}
rule {
host = "api.xactpos.com"
http {
path {
backend {
service_name = "ssl-redirect"
service_port = "use-annotation"
}
path = "/*"
}
path {
backend {
service_name = "api-service1"
service_port = 80
}
path = "/service3"
}
path {
backend {
service_name = "api-service2"
service_port = 80
}
path = "/service4"
}
}
}
}
wait_for_load_balancer = true
}
I am by no means a K8s expert, but I went through the Terraform code, and the only option I can see that could possibly help you debug this is wait_for_load_balancer on the kubernetes_ingress resource. From the documentation:
Terraform will wait for the load balancer to have at least 1 endpoint before considering the resource created.
Maybe the output will be clearer in that case (if the creation fails for some reason), or you might find out why it's not creating an LB.
I had the Kubernetes Ingress pointing to the application subnets instead of the gateway (public) subnets. I think that was the problem.
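In other words, for an internet-facing ALB the alb.ingress.kubernetes.io/subnets annotation needs to receive the gateway (public) subnet IDs rather than the private application subnets. A minimal tfvars sketch, with placeholder IDs:
# Gateway (public) subnet IDs for the internet-facing ALB; the IDs below are placeholders.
app_subnet_ids = "subnet-0aaaaaaaaaaaaaaaa, subnet-0bbbbbbbbbbbbbbbb, subnet-0cccccccccccccccc"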

Terraform 0.12 Creating ingress rules with template

I wonder if something like this could be used. I have functional code working, but since the number of pods in Kubernetes will grow quickly, I want to convert it into templates. This is an example of the nginx ingress rule created for each WordPress site pod.
Right now, each pod has its own WordPress ingress entry:
resource "kubernetes_ingress" "ingress_nginx_siteA" {
metadata {
name = "ingress-nginx-siteA"
namespace = "default"
annotations = { "kubernetes.io/ingress.class" = "nginx", "nginx.ingress.kubernetes.io/configuration-snippet" = "modsecurity_rules '\n SecRuleEngine On\n SecRequestBodyAccess On\n SecAuditEngine RelevantOnly\n SecAuditLogParts ABCIJDEFHZ\n SecAuditLog /var/log/modsec_audit.log\n SecRuleRemoveById 932140\n';\n", "nginx.ingress.kubernetes.io/ssl-passthrough" = "true" }
}
spec {
tls {
hosts = ["siteA.test.com"]
secret_name = "wildcard-test-com"
}
rule {
host = "siteA.test.com"
http {
path {
path = "/"
backend {
service_name = "siteA"
service_port = "80"
}
}
}
}
}
}
Now I want to split this into a variables.tf that contains all the site variables, a template file rules.tpl, and a main.tf that orchestrates it all.
variables.tf:
variable "wordpress_site" {
type = map(object({
name = string
url = string
certificate = string
}))
default = {
siteA = {
name = "siteA"
url = "siteA.test.com"
certificate = "wildcard-test-com"
}
siteB = {
name = "siteB"
url = "siteB.test.com"
certificate = "wildcard-test-com"
}
}
}
rules.tpl:
%{ for name in wordpress_site.name ~}
resource "kubernetes_ingress" "ingress_nginx_${name}" {
metadata {
name = "ingress-nginx-${name}"
namespace = "default"
annotations = { "kubernetes.io/ingress.class" = "nginx", "nginx.ingress.kubernetes.io/configuration-snippet" = "modsecurity_rules '\n SecRuleEngine On\n SecRequestBodyAccess On\n SecAuditEngine RelevantOnly\n SecAuditLogParts ABCIJDEFHZ\n SecAuditLog /var/log/modsec_audit.log\n SecRuleRemoveById 932140\n';\n", "nginx.ingress.kubernetes.io/ssl-passthrough" = "true" }
}
spec {
tls {
hosts = ["${wordpress_site.url}"]
secret_name = "${wordpress_site.certificate}"
}
rule {
host = "${wordpress_site.url}"
http {
path {
path = "/"
backend {
service_name = "${name}"
service_port = "80"
}
}
}
}
}
}
%{ endfor ~}
And now, in main.tf, what is the best way to tie it all together? I see that new functionality like the templatefile function was added in TF 0.12, but I don't know whether I can use it like this:
main.tf:
templatefile(${path.module}/rules.tpl, ${module.var.wordpress_site})
Thanks all for your support!
The templatefile function is for generating strings from a template, not for generating Terraform configuration. Although it would be possible to render your given template to produce a string containing Terraform configuration, Terraform would just see the result as a normal string, not as more configuration to be evaluated.
Instead, what we need to get the desired result is resource for_each, which allows creating multiple instances from a single resource based on a map value.
resource "kubernetes_ingress" "nginx" {
for_each = var.wordpress_site
metadata {
name = "ingress-nginx-${each.value.name}"
namespace = "default"
annotations = {
"kubernetes.io/ingress.class" = "nginx"
"nginx.ingress.kubernetes.io/configuration-snippet" = <<-EOT
modsecurity_rules '
SecRuleEngine On
SecRequestBodyAccess On
SecAuditEngine RelevantOnly
SecAuditLogParts ABCIJDEFHZ
SecAuditLog /var/log/modsec_audit.log
SecRuleRemoveById 932140
';
EOT
"nginx.ingress.kubernetes.io/ssl-passthrough" = "true"
}
}
spec {
tls {
hosts = [each.value.url]
secret_name = each.value.certificate
}
rule {
host = each.value.url
http {
path {
path = "/"
backend {
service_name = each.value.name
service_port = "80"
}
}
}
}
}
}
When a resource has for_each set, Terraform will evaluate the given argument to obtain a map, and will then create one instance of the resource for each element in that map, with each one identified by its corresponding map key. In this case, assuming the default value of var.wordpress_site, you'll get two instances with the following addresses:
kubernetes_ingress.nginx["siteA"]
kubernetes_ingress.nginx["siteB"]
Inside the resource block, references starting with each.value refer to the values from the map, which in this case are the objects describing each site.
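As a small illustrative addition (not in the original answer), each instance can also be referenced elsewhere in the configuration by its map key, for example:
output "siteA_ingress_name" {
  # References the "siteA" instance created by for_each via its map key.
  value = kubernetes_ingress.nginx["siteA"].metadata[0].name
}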