Kubernetes service usage for NLB from different AWS account

I have a requirement to add a Kubernetes Service with an ExternalName pointing to an NLB (in a different AWS account).
I am using Terraform to implement this.
I am not sure what to put in the external name section for the NLB.
Can someone please help?
resource "kubernetes_service" "test_svc" {
metadata {
name = "test"
namespace = var.environment
labels = {
app = "test"
}
}
spec {
type = "ExternalName"
**external_name =**
}
}

Usage of external name is as follows:
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: prod
spec:
  type: ExternalName
  externalName: my.database.example.com
Try putting the NLB's DNS name (CNAME) as the external name.
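For example, a minimal sketch assuming you know the DNS name of the NLB in the other account (the hostname below is only a placeholder):
resource "kubernetes_service" "test_svc" {
  metadata {
    name      = "test"
    namespace = var.environment
    labels = {
      app = "test"
    }
  }
  spec {
    type = "ExternalName"
    # Placeholder: replace with the actual DNS name of the NLB in the other account
    external_name = "my-nlb-0123456789abcdef.elb.us-east-1.amazonaws.com"
  }
}
Kubernetes will then return a CNAME record for that hostname when the service name is resolved inside the cluster.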

Related

AWS EKS Fargate logging to AWS Cloudwatch: log groups are not creating

I used the official procedure from AWS and this one to enable logging.
Here are the YAML files I've applied:
---
kind: Namespace
apiVersion: v1
metadata:
  name: aws-observability
  labels:
    aws-observability: enabled
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  flb_log_cw: "true"
  output.conf: |
    [OUTPUT]
        Name cloudwatch_logs
        Match *
        region us-east-1
        log_group_name fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group true
        log_key log
  parsers.conf: |
    [PARSER]
        Name crio
        Format Regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>P|F) (?<log>.*)$
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
  filters.conf: |
    [FILTER]
        Name parser
        Match *
        Key_name log
        Parser crio
Inside the pod I can see that logging was enabled:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    CapacityProvisioned: 2vCPU 4GB
    Logging: LoggingEnabled
    kubectl.kubernetes.io/restartedAt: "2023-01-17T19:31:20+01:00"
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2023-01-17T18:31:28Z"
Logs exist inside the container:
kubectl logs dev-768647846c-hbmv7 -n dev-fargate
But in AWS CloudWatch the log groups are not created, even for Fluent Bit itself.
From the pod CLI I can create log groups in AWS CloudWatch, so the permissions are fine.
I also tried the cloudwatch plugin instead of cloudwatch_logs, but no luck.
I've solved my issue.
The tricky thing is: the IAM policy must be attached to the default pod execution role, which is created automatically with the namespace; it has no relation to the service account or a custom pod execution role.
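As a hedged Terraform sketch of that fix (the policy name and the reference to the existing pod execution role are assumptions; the key point is which role the attachment targets):
# Sketch only: attach the CloudWatch Logs permissions to the Fargate pod
# execution role used by the Fargate profile, not to an IRSA / service-account role.
data "aws_iam_policy_document" "fargate_logging" {
  statement {
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:DescribeLogStreams",
      "logs:PutLogEvents",
    ]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "fargate_logging" {
  name   = "eks-fargate-logging" # assumed name
  policy = data.aws_iam_policy_document.fargate_logging.json
}

resource "aws_iam_role_policy_attachment" "fargate_logging" {
  role       = aws_iam_role.fargate_pod_execution.name # hypothetical reference to the default pod execution role
  policy_arn = aws_iam_policy.fargate_logging.arn
}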

Deploy docker image into GCP GKE using Terraform

I am writing a Terraform file in GCP to run a stateless application on GKE; these are the steps I'm trying to get into Terraform:
1. Create a service account
2. Grant roles to the service account
3. Create the cluster
4. Configure the deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mllp-adapter-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mllp-adapter
  template:
    metadata:
      labels:
        app: mllp-adapter
    spec:
      containers:
        - name: mllp-adapter
          imagePullPolicy: Always
          image: gcr.io/cloud-healthcare-containers/mllp-adapter
          ports:
            - containerPort: 2575
              protocol: TCP
              name: "port"
          command:
            - "/usr/mllp_adapter/mllp_adapter"
            - "--port=2575"
            - "--hl7_v2_project_id=PROJECT_ID"
            - "--hl7_v2_location_id=LOCATION"
            - "--hl7_v2_dataset_id=DATASET_ID"
            - "--hl7_v2_store_id=HL7V2_STORE_ID"
            - "--api_addr_prefix=https://healthcare.googleapis.com:443/v1"
            - "--logtostderr"
            - "--receiver_ip=0.0.0.0"
5. Add an internal load balancer to make it accessible outside of the cluster:
apiVersion: v1
kind: Service
metadata:
  name: mllp-adapter-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  ports:
    - name: port
      port: 2575
      targetPort: 2575
      protocol: TCP
  selector:
    app: mllp-adapter
I've found this example for creating an auto-pilot-public cluster; however, I don't know where to specify the YAML file of my step 4.
Also, I've found this other blueprint that deploys a service to the created cluster using the kubernetes provider, which I hope solves my step 5.
I'm new at Terraform and GCP architecture in general. I got all of this working by following the documentation; however, I'm now trying to find a way to deploy this to a dev environment for testing purposes, but that's outside of my sandbox and it's supposed to be deployed using Terraform. I think I'm getting close to it.
Can someone enlighten me on what the next step is, or how to add those YAML configurations to the .tf examples I've found?
Am I doing this right? :(
You can use this script and extend it further to deploy the YAML files: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/tree/master/examples/simple_autopilot_public
The above TF script creates the GKE Autopilot cluster; for the YAML deployment you can use the Kubernetes provider and apply the files with that:
https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/deployment
Full example: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/tree/master/examples/simple_autopilot_public
main.tf
locals {
  cluster_type           = "simple-autopilot-public"
  network_name           = "simple-autopilot-public-network"
  subnet_name            = "simple-autopilot-public-subnet"
  master_auth_subnetwork = "simple-autopilot-public-master-subnet"
  pods_range_name        = "ip-range-pods-simple-autopilot-public"
  svc_range_name         = "ip-range-svc-simple-autopilot-public"
  subnet_names           = [for subnet_self_link in module.gcp-network.subnets_self_links : split("/", subnet_self_link)[length(split("/", subnet_self_link)) - 1]]
}

data "google_client_config" "default" {}

provider "kubernetes" {
  host                   = "https://${module.gke.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(module.gke.ca_certificate)
}

module "gke" {
  source                          = "../../modules/beta-autopilot-public-cluster/"
  project_id                      = var.project_id
  name                            = "${local.cluster_type}-cluster"
  regional                        = true
  region                          = var.region
  network                         = module.gcp-network.network_name
  subnetwork                      = local.subnet_names[index(module.gcp-network.subnets_names, local.subnet_name)]
  ip_range_pods                   = local.pods_range_name
  ip_range_services               = local.svc_range_name
  release_channel                 = "REGULAR"
  enable_vertical_pod_autoscaling = true
}
Another good example, which uses the YAML files as templates and applies them using Terraform: https://github.com/epiphone/gke-terraform-example/tree/master/terraform/dev
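If it helps, here is a rough sketch of step 4 translated to the Kubernetes provider instead of a separate YAML file (the var.* references are assumptions standing in for the placeholders above):
resource "kubernetes_deployment_v1" "mllp_adapter" {
  metadata {
    name = "mllp-adapter-deployment"
  }
  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "mllp-adapter"
      }
    }
    template {
      metadata {
        labels = {
          app = "mllp-adapter"
        }
      }
      spec {
        container {
          name              = "mllp-adapter"
          image             = "gcr.io/cloud-healthcare-containers/mllp-adapter"
          image_pull_policy = "Always"
          port {
            container_port = 2575
            protocol       = "TCP"
            name           = "port"
          }
          # The var.* values below are assumed inputs replacing PROJECT_ID, LOCATION, etc.
          command = [
            "/usr/mllp_adapter/mllp_adapter",
            "--port=2575",
            "--hl7_v2_project_id=${var.project_id}",
            "--hl7_v2_location_id=${var.region}",
            "--hl7_v2_dataset_id=${var.dataset_id}",
            "--hl7_v2_store_id=${var.hl7v2_store_id}",
            "--api_addr_prefix=https://healthcare.googleapis.com:443/v1",
            "--logtostderr",
            "--receiver_ip=0.0.0.0",
          ]
        }
      }
    }
  }
}
The Service from step 5 can be expressed the same way with a kubernetes_service_v1 resource, keeping the cloud.google.com/load-balancer-type: "Internal" annotation.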

AWS - x-real-ip is ip of nginx-ingress-controller

I currently have the following problem. I have a backend that is behind an nginx-ingress controller used as a load balancer in AWS. Usually I should get the user's real IP from either the x-forwarded-for or the x-real-ip header. However, in my case this IP always points to the IP of the ingress controller. I use Terraform to set up my architecture.
This is my current configuration:
resource "helm_release" "ingress_nginx" {
name = "ingress-nginx"
namespace = "ingress"
create_namespace = true
chart = "ingress-nginx"
version = "4.0.13"
repository = "https://kubernetes.github.io/ingress-nginx"
values = [
<<-EOF
controller:
service:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: nlb
admissionWebhooks:
enabled: false
EOF
]
}
data "kubernetes_service" "ingress_nginx" {
metadata {
name = "ingress-nginx-controller"
namespace = helm_release.ingress_nginx.metadata[0].namespace
}
}
data "aws_lb" "ingress_nginx" {
name = regex(
"(^[^-]+)",
data.kubernetes_service.ingress_nginx.status[0].load_balancer[0].ingress[0].hostname
)[0]
}
output "lb" {
value = data.aws_lb.ingress_nginx.name
}`
Does anybody know how to fix the header in this config?
To get the real IP address in the header, both the ingress controller and the NLB (Network Load Balancer) need to use proxy protocol. To do this, in Terraform you need to configure the ingress controller with the proxy protocol option:
resource "helm_release" "ingress_nginx" {
name = "ingress-nginx"
namespace = "ingress"
create_namespace = true
chart = "ingress-nginx"
version = "4.0.13"
repository = "https://kubernetes.github.io/ingress-nginx"
values = [
<<-EOF
controller:
service:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: nlb
config:
use-forwarded-headers: true
use-proxy-protocol: true
enable-real-ip: true
admissionWebhooks:
enabled: false
EOF
]
}
data "kubernetes_service" "ingress_nginx" {
metadata {
name = "ingress-nginx-controller"
namespace = helm_release.ingress_nginx.metadata[0].namespace
}
}
data "aws_lb" "ingress_nginx" {
name = regex(
"(^[^-]+)",
data.kubernetes_service.ingress_nginx.status[0].load_balancer[0].ingress[0].hostname
)[0]
}
output "lb" {
value = data.aws_lb.ingress_nginx.name
}
Now, it seems that currently there is no way to get the NLB to use proxy protocol by annotating the service.
There are some issues on GitHub mentioning that the following annotations would solve it:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb-ip"
and
service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
However, these have not worked for me. So I have enabled proxy protocol manually through the AWS Console by doing the following:
AWS Console --> EC2 --> Target groups under Load Balancing --> select the target groups associated with your NLB --> Attributes --> enable proxy protocol v2
I guess that in Terraform the optimal solution would be to explicitly create a Network Load Balancer with the correct target group and associate it with the ingress controller service.
Doing this, I now get the correct IP addresses of the clients connecting to the ingress controller.
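As a hedged sketch of that last idea (the target group name, port, and VPC variable are assumptions), a self-managed target group in Terraform can enable proxy protocol v2 directly instead of toggling it in the console:
resource "aws_lb_target_group" "ingress_nginx" {
  name              = "ingress-nginx" # assumed name
  port              = 443             # assumed target port
  protocol          = "TCP"
  target_type       = "ip"
  vpc_id            = var.vpc_id      # assumed variable
  proxy_protocol_v2 = true            # same effect as the console toggle above
}
You would then register the ingress controller's endpoints with this target group, for example via the aws-load-balancer-controller's TargetGroupBinding, or manage the NLB and its listeners in Terraform as well.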

How to build an ingress controller with central cert and auth management

I want to expose a few webapps in EKS to the internet in a centrally managed, secure way.
In AWS, using an ALB is nice, as it for example allows you to terminate TLS and add authentication using Cognito (see here).
To provision an ALB and connect it to the application there is the aws-load-balancer-controller.
It works fine, but it requires configuring a new ALB for each and every app/ingress:
annotations:
  kubernetes.io/ingress.class: alb
  alb.ingress.kubernetes.io/scheme: internet-facing
  alb.ingress.kubernetes.io/tags: Environment=test,Project=cognito
  external-dns.alpha.kubernetes.io/hostname: sample.${COK_MY_DOMAIN}
  alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
  alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
  alb.ingress.kubernetes.io/auth-type: cognito
  alb.ingress.kubernetes.io/auth-scope: openid
  alb.ingress.kubernetes.io/auth-session-timeout: '3600'
  alb.ingress.kubernetes.io/auth-session-cookie: AWSELBAuthSessionCookie
  alb.ingress.kubernetes.io/auth-on-unauthenticated-request: authenticate
  alb.ingress.kubernetes.io/auth-idp-cognito: '{"UserPoolArn": "$(aws cognito-idp describe-user-pool --user-pool-id $COK_COGNITO_USER_POOL_ID --region $COK_AWS_REGION --query 'UserPool.Arn' --output text)","UserPoolClientId":"${COK_COGNITO_USER_POOL_CLIENT_ID}","UserPoolDomain":"${COK_COGNITO_DOMAIN}.auth.${COK_AWS_REGION}.amazoncognito.com"}'
  alb.ingress.kubernetes.io/certificate-arn: $COK_ACM_CERT_ARN
  alb.ingress.kubernetes.io/target-type: 'ip'
I would love to have one central, well-defined ALB so that the applications do not need to care about this anymore.
My idea was to have a regular nginx-ingress-controller and expose it via a central ALB.
Now the question is: how do I connect the ALB to the nginx controller?
One way would be manually configuring the ALB and building the target group by hand, which does not feel like a stable solution.
Another way would be using the aws-load-balancer-controller to connect to nginx. In that case, however, nginx seems not to be able to publish the correct load balancer address, and external-dns will enter the wrong DNS records. (Unfortunately there seems to be no --publish-ingress option in usual ingress controllers like nginx or traefik.)
Question:
Is there a way to make the nginx-ingress-controller provide the correct address?
Is there maybe an easier way than combining two ingress controllers?
I think I found a good solution.
I set up my environment using Terraform.
After I set up the ALB ingress controller, I can create a suitable ingress object, wait until the ALB is up, use Terraform to extract the address of the ALB, and use publish-status-address to tell nginx to publish exactly that address to all its ingresses:
resource "kubernetes_ingress_v1" "alb" {
wait_for_load_balancer = true
metadata {
name = "alb"
namespace = "kube-system"
annotations = {
"alb.ingress.kubernetes.io/scheme" = "internet-facing"
"alb.ingress.kubernetes.io/listen-ports" = "[{\"HTTP\": 80}, {\"HTTPS\":443}]"
"alb.ingress.kubernetes.io/ssl-redirect" = "443"
"alb.ingress.kubernetes.io/certificate-arn" = local.cert
"alb.ingress.kubernetes.io/target-type" = "ip"
}
}
spec {
ingress_class_name = "alb"
default_backend {
service {
name = "ing-nginx-ingress-nginx-controller"
port {
name = "http"
}
}
}
}
}
resource "helm_release" "ing-nginx" {
name = "ing-nginx"
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
namespace = "kube-system"
set {
name = "controller.service.type"
value = "ClusterIP"
}
set {
name = "controller.publishService.enabled"
value = "false"
}
set {
name = "controller.extraArgs.publish-status-address"
value = kubernetes_ingress_v1.alb.status.0.load_balancer.0.ingress.0.hostname
}
set {
name = "controller.config.use-forwarded-headers"
value = "true"
}
set {
name = "controller.ingressClassResource.default"
value = "true"
}
}
It is a bit weird, as it introduces something like a circular dependency, but the ingress simply waits until nginx is finally up and all is well.
This solution is not exactly the same as the --publish-ingress option, as it will not be able to adapt to any changes of the ALB address. Luckily I don't expect that address to change, so I'm fine with this solution.
You can achieve this with two ingress controllers. The ALB ingress controller will handle the publicly exposed endpoint and route traffic to the nginx ingress controller as its backend. Then you configure your nginx ingress controller for managing ingress of application traffic.

Fixing DataDog agent congestion issues in Amazon EKS cluster

A few months ago I integrated DataDog into my Kubernetes cluster by using a DaemonSet configuration. Since then I've been getting congestion alerts with the following message:
Please tune the hot-shots settings
https://github.com/brightcove/hot-shots#errors
By attempting to follow the docs with my limited Orchestration/DevOps knowledge, what I could gather is that I need to add the following to my DaemonSet config:
spec:
  ...
  securityContext:
    sysctls:
      - name: net.unix.max_dgram_qlen
        value: "1024"
      - name: net.core.wmem_max
        value: "4194304"
I attempted to add that configuration piece directly to one of the auto-deployed DataDog pods just to try it out (instead of adding it to the DaemonSet and risking bringing all agents down), but it hangs indefinitely and doesn't save the configuration.
That hot-shots documentation also mentions that the above sysctl configuration requires unsafe sysctls to be enabled in the nodes that contain the pods:
kubelet --allowed-unsafe-sysctls \
'net.unix.max_dgram_qlen, net.core.wmem_max'
The cluster I am working with is fully deployed with EKS by using the Dashboard in AWS (I have little knowledge of how it is configured). The above seems to be intended for a manually deployed and managed cluster.
Why is the configuration I am attempting to apply to a single DataDog agent pod not saving/applying? Is it because it is managed by the DaemonSet, or because it doesn't have the proper unsafe sysctls allowed? Something else?
If I do need to enable the suggested unsafe sysctls on all nodes of my cluster, how do I go about it, given that the cluster is fully deployed and managed by Amazon EKS?
So we managed to achieve this using a custom launch template with our managed node group and then passing in a custom bootstrap script. This does mean, however, that you need to supply the AMI ID yourself and you lose the alerts in the console when it is outdated. In Terraform this would look like:
resource "aws_eks_node_group" "group" {
...
launch_template {
id = aws_launch_template.nodes.id
version = aws_launch_template.nodes.latest_version
}
...
}
data "template_file" "bootstrap" {
template = file("${path.module}/files/bootstrap.tpl")
vars = {
cluster_name = aws_eks_cluster.cluster.name
cluster_auth_base64 = aws_eks_cluster.cluster.certificate_authority.0.data
endpoint = aws_eks_cluster.cluster.endpoint
}
}
data "aws_ami" "eks_node" {
owners = ["602401143452"]
most_recent = true
filter {
name = "name"
values = ["amazon-eks-node-1.21-v20211008"]
}
}
resource "aws_launch_template" "nodes" {
...
image_id = data.aws_ami.eks_node.id
user_data = base64encode(data.template_file.bootstrap.rendered)
...
}
Then the bootstrap.tpl file looks like this:
#!/bin/bash
set -o xtrace
systemctl stop kubelet
/etc/eks/bootstrap.sh '${cluster_name}' \
--b64-cluster-ca '${cluster_auth_base64}' \
--apiserver-endpoint '${endpoint}' \
--kubelet-extra-args '"--allowed-unsafe-sysctls=net.unix.max_dgram_qlen"'
The next step is to set up the PodSecurityPolicy, ClusterRole, and RoleBinding in your cluster so you can use the securityContext as described above; then pods in that namespace will be able to run without a SysctlForbidden message.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: sysctl
spec:
  allowPrivilegeEscalation: false
  allowedUnsafeSysctls:
    - net.unix.max_dgram_qlen
  defaultAllowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
    - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: allow-sysctl
rules:
  - apiGroups:
      - policy
    resourceNames:
      - sysctl
    resources:
      - podsecuritypolicies
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-sysctl
  namespace: app-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-sysctl
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts:app-namespace
If using the DataDog Helm chart, you can set the following values to update the securityContext of the agent. But you will have to update the chart's PSP manually to set allowedUnsafeSysctls:
datadog:
  securityContext:
    sysctls:
      - name: net.unix.max_dgram_qlen
        value: "512"
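For completeness, a hedged sketch of applying those same values through Terraform's helm provider (the release settings and set paths are assumptions derived from the values above; further chart values such as the API key are omitted):
resource "helm_release" "datadog" {
  name       = "datadog"
  repository = "https://helm.datadoghq.com"
  chart      = "datadog"

  set {
    name  = "datadog.securityContext.sysctls[0].name"
    value = "net.unix.max_dgram_qlen"
  }
  set {
    name  = "datadog.securityContext.sysctls[0].value"
    value = "512"
    type  = "string" # keep the sysctl value as a string rather than an int
  }
}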