Deploy Docker image into GCP GKE using Terraform - google-cloud-platform

I am writing a Terraform configuration in GCP to run a stateless application on GKE. These are the steps I'm trying to get into Terraform:
1. Create a service account
2. Grant roles to the service account
3. Create the cluster
4. Configure the deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mllp-adapter-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mllp-adapter
  template:
    metadata:
      labels:
        app: mllp-adapter
    spec:
      containers:
        - name: mllp-adapter
          imagePullPolicy: Always
          image: gcr.io/cloud-healthcare-containers/mllp-adapter
          ports:
            - containerPort: 2575
              protocol: TCP
              name: "port"
          command:
            - "/usr/mllp_adapter/mllp_adapter"
            - "--port=2575"
            - "--hl7_v2_project_id=PROJECT_ID"
            - "--hl7_v2_location_id=LOCATION"
            - "--hl7_v2_dataset_id=DATASET_ID"
            - "--hl7_v2_store_id=HL7V2_STORE_ID"
            - "--api_addr_prefix=https://healthcare.googleapis.com:443/v1"
            - "--logtostderr"
            - "--receiver_ip=0.0.0.0"
5. Add an internal load balancer to make it accessible from outside of the cluster:
apiVersion: v1
kind: Service
metadata:
  name: mllp-adapter-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  ports:
    - name: port
      port: 2575
      targetPort: 2575
      protocol: TCP
  selector:
    app: mllp-adapter
I've found this example for creating an Autopilot public cluster; however, I don't know where to specify the YAML file of my step 4.
I've also found this other blueprint that deploys a service to the created cluster using the kubernetes provider, which I hope solves my step 5.
I'm new to Terraform and GCP architecture in general. I got all of this working by following the documentation, but now I'm trying to find a way to deploy this to a dev environment for testing purposes; that's outside of my sandbox and it's supposed to be deployed using Terraform. I think I'm getting close to it.
Can someone enlighten me on what the next step is, or how to add those YAML configurations to the .tf examples I've found?
Am I doing this right? :(

You can use this script and extend it further to deploy the YAML files: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/tree/master/examples/simple_autopilot_public
The above Terraform script creates the GKE Autopilot cluster. For the YAML deployment you can use the Kubernetes provider and apply the manifests with it:
https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/deployment
Full example: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/tree/master/examples/simple_autopilot_public
main.tf
locals {
  cluster_type           = "simple-autopilot-public"
  network_name           = "simple-autopilot-public-network"
  subnet_name            = "simple-autopilot-public-subnet"
  master_auth_subnetwork = "simple-autopilot-public-master-subnet"
  pods_range_name        = "ip-range-pods-simple-autopilot-public"
  svc_range_name         = "ip-range-svc-simple-autopilot-public"
  subnet_names           = [for subnet_self_link in module.gcp-network.subnets_self_links : split("/", subnet_self_link)[length(split("/", subnet_self_link)) - 1]]
}

data "google_client_config" "default" {}

provider "kubernetes" {
  host                   = "https://${module.gke.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(module.gke.ca_certificate)
}

module "gke" {
  source                          = "../../modules/beta-autopilot-public-cluster/"
  project_id                      = var.project_id
  name                            = "${local.cluster_type}-cluster"
  regional                        = true
  region                          = var.region
  network                         = module.gcp-network.network_name
  subnetwork                      = local.subnet_names[index(module.gcp-network.subnets_names, local.subnet_name)]
  ip_range_pods                   = local.pods_range_name
  ip_range_services               = local.svc_range_name
  release_channel                 = "REGULAR"
  enable_vertical_pod_autoscaling = true
}
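With the cluster module and the kubernetes provider above in place, your step 4 (the Deployment) and step 5 (the internal LoadBalancer Service) can be written as kubernetes provider resources instead of separate YAML files. Here is a rough sketch based on the manifests in your question (the PROJECT_ID, LOCATION, DATASET_ID and HL7V2_STORE_ID placeholders still need real values, e.g. from variables):

resource "kubernetes_deployment" "mllp_adapter" {
  metadata {
    name = "mllp-adapter-deployment"
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "mllp-adapter"
      }
    }

    template {
      metadata {
        labels = {
          app = "mllp-adapter"
        }
      }

      spec {
        container {
          name              = "mllp-adapter"
          image             = "gcr.io/cloud-healthcare-containers/mllp-adapter"
          image_pull_policy = "Always"

          port {
            container_port = 2575
            protocol       = "TCP"
            name           = "port"
          }

          # Same flags as in your YAML; replace the placeholders with real values.
          command = [
            "/usr/mllp_adapter/mllp_adapter",
            "--port=2575",
            "--hl7_v2_project_id=PROJECT_ID",
            "--hl7_v2_location_id=LOCATION",
            "--hl7_v2_dataset_id=DATASET_ID",
            "--hl7_v2_store_id=HL7V2_STORE_ID",
            "--api_addr_prefix=https://healthcare.googleapis.com:443/v1",
            "--logtostderr",
            "--receiver_ip=0.0.0.0",
          ]
        }
      }
    }
  }
}

resource "kubernetes_service" "mllp_adapter" {
  metadata {
    name = "mllp-adapter-service"

    annotations = {
      "cloud.google.com/load-balancer-type" = "Internal"
    }
  }

  spec {
    type = "LoadBalancer"

    selector = {
      app = "mllp-adapter"
    }

    port {
      name        = "port"
      port        = 2575
      target_port = 2575
      protocol    = "TCP"
    }
  }
}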
Another good example that uses the YAML files as templates and applies them with Terraform: https://github.com/epiphone/gke-terraform-example/tree/master/terraform/dev
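For your steps 1 and 2 (service account and role grants) you can stay with plain Google provider resources. A minimal sketch, assuming a hypothetical account id and that roles/healthcare.hl7V2Ingest is the role your workload actually needs:

resource "google_service_account" "mllp_adapter" {
  project      = var.project_id
  account_id   = "mllp-adapter" # hypothetical account id
  display_name = "MLLP adapter service account"
}

# Grant the role(s) the adapter needs; adjust the role to your requirements.
resource "google_project_iam_member" "mllp_adapter_hl7v2" {
  project = var.project_id
  role    = "roles/healthcare.hl7V2Ingest"
  member  = "serviceAccount:${google_service_account.mllp_adapter.email}"
}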

Related

AWS EKS secrets store provider - Failed to fetch secret from all regions: name

I am using the AWS Secrets Store CSI provider to sync secrets from AWS Secrets Manager into Kubernetes/EKS.
The SecretProviderClass is:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: test-provider
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: mysecret
        objectType: secretsmanager
        jmesPath:
          - path: APP_ENV
            objectAlias: APP_ENV
          - path: APP_DEBUG
            objectAlias: APP_DEBUG
And the Pod mounting these secrets is:
apiVersion: v1
kind: Pod
metadata:
  name: secret-pod
spec:
  restartPolicy: Never
  serviceAccountName: my-account
  terminationGracePeriodSeconds: 2
  containers:
    - name: dotfile-test-container
      image: registry.k8s.io/busybox
      volumeMounts:
        - name: secret-volume
          readOnly: true
          mountPath: "/mnt/secret-volume"
  volumes:
    - name: secret-volume
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: test-provider
The secret exists in the Secret Provider:
{
  "APP_ENV": "staging",
  "APP_DEBUG": false
}
(this is an example, I am aware I do not need to store these particular variables as secrets)
But when I create the resources, the Pod fails to run with
Warning  FailedMount  96s (x10 over 5m47s)  kubelet  MountVolume.SetUp failed for volume "secret-volume" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod pace/secret-dotfiles-pod, err: rpc error: code = Unknown desc = Failed to fetch secret from all regions: mysecret
Turns out the error message is very misleading. The problem in my case was the type of the APP_DEBUG value. Changing it from a boolean to a string fixed the problem, and now the pod starts correctly.
{
  "APP_ENV": "staging",
  "APP_DEBUG": "false"
}
Seems like a bug in the provider to me.

Redis deployed in AWS - Connection timeout from localhost Spring Boot app

Small question regarding Redis deployed in AWS (not AWS ElastiCache) and an issue connecting to it.
Here is the setup of the Redis deployed in AWS: (pasting only the Kubernetes StatefulSet and Service)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      initContainers:
        - name: config
          image: redis:7.0.5-alpine
          command: [ "sh", "-c" ]
          args:
            - |
              cp /tmp/redis/redis.conf /etc/redis/redis.conf
              echo "finding master..."
              MASTER_FDQN=`hostname -f | sed -e 's/redis-[0-9]\./redis-0./'`
              if [ "$(redis-cli -h sentinel -p 5000 ping)" != "PONG" ]; then
                echo "master not found, defaulting to redis-0"
                if [ "$(hostname)" = "redis-0" ]; then
                  echo "this is redis-0, not updating config..."
                else
                  echo "updating redis.conf..."
                  echo "slaveof $MASTER_FDQN 6379" >> /etc/redis/redis.conf
                fi
              else
                echo "sentinel found, finding master"
                MASTER="$(redis-cli -h sentinel -p 5000 sentinel get-master-addr-by-name mymaster | grep -E '(^redis-\d{1,})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
                echo "master found : $MASTER, updating redis.conf"
                echo "slaveof $MASTER 6379" >> /etc/redis/redis.conf
              fi
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis/
            - name: config
              mountPath: /tmp/redis/
      containers:
        - name: redis
          image: redis:7.0.5-alpine
          command: ["redis-server"]
          args: ["/etc/redis/redis.conf"]
          ports:
            - containerPort: 6379
              name: redis
          volumeMounts:
            - name: data
              mountPath: /data
            - name: redis-config
              mountPath: /etc/redis/
      volumes:
        - name: redis-config
          emptyDir: {}
        - name: config
          configMap:
            name: redis-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: nfs-1
        resources:
          requests:
            storage: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
    - port: 6379
      targetPort: 6379
      name: redis
  selector:
    app: redis
  type: LoadBalancer
The pods are healthy; I can exec into them and perform operations fine. Here is the output of get all:
NAME READY STATUS RESTARTS AGE
pod/redis-0 1/1 Running 0 22h
pod/redis-1 1/1 Running 0 22h
pod/redis-2 1/1 Running 0 22h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/redis LoadBalancer 192.168.45.55 10.51.5.2 6379:30315/TCP 26h
NAME READY AGE
statefulset.apps/redis 3/3 22h
Here is the describe of the service:
Name: redis
Namespace: Namespace
Labels: <none>
Annotations: <none>
Selector: app=redis
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 192.168.22.33
IPs: 192.168.22.33
LoadBalancer Ingress: 10.51.5.2
Port: redis 6379/TCP
TargetPort: 6379/TCP
NodePort: redis 30315/TCP
Endpoints: 192.xxx:6379,192.xxx:6379,192.xxx:6379
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 68s metallb-controller Assigned IP ["10.51.5.2"]
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
I then try to connect to it, i.e. inserting some data with a very straightforward Spring Boot application. The application has no business logic, just trying to insert data.
Here are the relevant parts:
@Configuration
public class RedisConfiguration {

    @Bean
    public ReactiveRedisConnectionFactory reactiveRedisConnectionFactory() {
        return new LettuceConnectionFactory("10.51.5.2", 30315);
    }
}

@Repository
public class RedisRepository {

    private final ReactiveRedisOperations<String, String> reactiveRedisOperations;

    public RedisRepository(ReactiveRedisOperations<String, String> reactiveRedisOperations) {
        this.reactiveRedisOperations = reactiveRedisOperations;
    }

    public Mono<RedisPojo> save(RedisPojo redisPojo) {
        return reactiveRedisOperations.opsForValue().set(redisPojo.getInput(), redisPojo.getOutput()).map(__ -> redisPojo);
    }
}
Each time I am trying to write the data, I am getting this exception:
2022-12-02T20:20:08.015+08:00 ERROR 1184 --- [ctor-http-nio-3] a.w.r.e.AbstractErrorWebExceptionHandler : [8f16a752-1] 500 Server Error for HTTP POST "/save"
org.springframework.data.redis.RedisConnectionFailureException: Unable to connect to Redis
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ Handler com.redis.controller.RedisController#test(RedisRequest) [DispatcherHandler]
*__checkpoint ⇢ HTTP POST "/save" [ExceptionHandlingWebHandler]
Original Stack Trace:
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Caused by: io.lettuce.core.RedisConnectionException: Unable to connect to 10.51.5.2/<unresolved>:30315
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:78) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:56) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.AbstractRedisClient.getConnection(AbstractRedisClient.java:350) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisClient.connect(RedisClient.java:216) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.51.5.2:30315
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.85.Final.jar:4.1.85.Final]
This is particularly puzzling because I am quite sure the code of the Spring Boot app is working. When I change the IP in return new LettuceConnectionFactory("10.51.5.2", 30315); to:
- a regular Redis on my laptop ("localhost", 6379),
- a dockerized Redis on my laptop,
- a dockerized Redis on prem,
all of them work fine.
Therefore, I am quite puzzled about what I did wrong with the setup of this Redis in AWS.
What should I do in order to connect to it properly?
May I get some help please?
Thank you
By default, Redis binds itself to the IP addresses 127.0.0.1 and ::1 and does not accept connections against non-local interfaces. Chances are high that this is your main issue, and you may want to review your redis.conf file to bind Redis to the interface you need, or to the generic * -::*, as explained in the comments of the config file itself.
With that being said, Redis also does not accept connections on non-local interfaces if the default user has no password - a security layer named Protected mode. Thus you should either give your default user a password or disable protected mode in your redis.conf file.
Not sure if this applies to your case but, as a side note, I would suggest to always avoid exposing Redis to the Internet.
You are mixing 2 things.
To reach this service from pods in different namespaces you do not need an external load balancer; you can just use the redis.namespace-name:6379 DNS name and it will just work. Such a DNS name exists for every service you create (but it only works inside Kubernetes).
Kubernetes will make sure that your traffic is routed to the proper pods (assuming there is more than one).
If you want to expose Redis from outside of Kubernetes, then you need to make sure there is connectivity from the outside, and then you need a network load balancer that forwards traffic to your Kubernetes service (in your case the node port, so you need an NLB with the EKS worker nodes on port 30315 as targets).
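If you go the NLB route and your infrastructure is managed with Terraform, a minimal sketch could look like the following (the variable names and the worker-node auto scaling group are assumptions; adapt them to your setup):

# Internet-facing NLB listening on 6379 and forwarding to the redis NodePort (30315) on the worker nodes.
resource "aws_lb" "redis" {
  name               = "redis-nlb"
  load_balancer_type = "network"
  subnets            = var.public_subnet_ids
}

resource "aws_lb_target_group" "redis" {
  name        = "redis-nodeport"
  port        = 30315
  protocol    = "TCP"
  target_type = "instance"
  vpc_id      = var.vpc_id
}

resource "aws_lb_listener" "redis" {
  load_balancer_arn = aws_lb.redis.arn
  port              = 6379
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.redis.arn
  }
}

# Register every worker node via the nodes' auto scaling group
# (on older AWS provider versions the argument is alb_target_group_arn).
resource "aws_autoscaling_attachment" "redis" {
  autoscaling_group_name = var.worker_asg_name
  lb_target_group_arn    = aws_lb_target_group.redis.arn
}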
If your worker nodes have public IPs and their security groups allow connecting to them directly, you could try connecting to a worker node's IP just to test things out (without the LB).
And regardless of your setup, you can always create a proxy via kubectl:
kubectl port-forward -n redisNS svc/redis 6379:6379
and connect from the Spring Boot app to localhost:6379.
How do you want to connect from the app to Redis in the final setup?

Fixing DataDog agent congestion issues in Amazon EKS cluster

A few months ago I integrated DataDog into my Kubernetes cluster by using a DaemonSet configuration. Since then I've been getting congestion alerts with the following message:
Please tune the hot-shots settings
https://github.com/brightcove/hot-shots#errors
By attempting to follow the docs with my limited Orchestration/DevOps knowledge, what I could gather is that I need to add the following to my DaemonSet config:
spec
  .
  .
  securityContext:
    sysctls:
      - name: net.unix.max_dgram_qlen
        value: "1024"
      - name: net.core.wmem_max
        value: "4194304"
I attempted to add that configuration piece to one of the auto-deployed DataDog pods directly just to try it out, but it hangs indefinitely and doesn't save the configuration (instead of adding it to the DaemonSet and risking bringing all agents down).
That hot-shots documentation also mentions that the above sysctl configuration requires unsafe sysctls to be enabled in the nodes that contain the pods:
kubelet --allowed-unsafe-sysctls \
'net.unix.max_dgram_qlen, net.core.wmem_max'
The cluster I am working with is fully deployed with EKS by using the Dashboard in AWS (I have little knowledge of how it is configured). The above seems to be intended for a manually deployed and managed cluster.
Why is the configuration I am attempting to apply to a single DataDog agent pod not saving/applying? Is it because it is managed by a DaemonSet, or because it doesn't have the proper unsafe sysctls allowed? Something else?
If I do need to enable the suggested unsafe sysctls on all nodes of my cluster, how do I go about it, given that the cluster is fully deployed and managed by Amazon EKS?
So we managed to achieve this using a custom launch template with our managed node group and then passing in a custom bootstrap script. This does mean, however, that you need to supply the AMI ID yourself and you lose the alerts in the console when it is outdated. In Terraform this would look like:
resource "aws_eks_node_group" "group" {
...
launch_template {
id = aws_launch_template.nodes.id
version = aws_launch_template.nodes.latest_version
}
...
}
data "template_file" "bootstrap" {
template = file("${path.module}/files/bootstrap.tpl")
vars = {
cluster_name = aws_eks_cluster.cluster.name
cluster_auth_base64 = aws_eks_cluster.cluster.certificate_authority.0.data
endpoint = aws_eks_cluster.cluster.endpoint
}
}
data "aws_ami" "eks_node" {
owners = ["602401143452"]
most_recent = true
filter {
name = "name"
values = ["amazon-eks-node-1.21-v20211008"]
}
}
resource "aws_launch_template" "nodes" {
...
image_id = data.aws_ami.eks_node.id
user_data = base64encode(data.template_file.bootstrap.rendered)
...
}
Then the bootstrap.tpl file looks like this:
#!/bin/bash
set -o xtrace

systemctl stop kubelet
/etc/eks/bootstrap.sh '${cluster_name}' \
  --b64-cluster-ca '${cluster_auth_base64}' \
  --apiserver-endpoint '${endpoint}' \
  --kubelet-extra-args '"--allowed-unsafe-sysctls=net.unix.max_dgram_qlen"'
The next step is to set up the PodSecurityPolicy, ClusterRole and RoleBinding in your cluster so you can use the securityContext as you described above and then pods in that namespace will be able to run without a SysctlForbidden message.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: sysctl
spec:
  allowPrivilegeEscalation: false
  allowedUnsafeSysctls:
    - net.unix.max_dgram_qlen
  defaultAllowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
    - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: allow-sysctl
rules:
  - apiGroups:
      - policy
    resourceNames:
      - sysctl
    resources:
      - podsecuritypolicies
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-sysctl
  namespace: app-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-sysctl
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts:app-namespace
If using the DataDog Helm chart you can set the following values to update the securityContext of the agent. But you will have to update the chart PSP manually to set allowedUnsafeSysctls.
datadog:
  securityContext:
    sysctls:
      - name: net.unix.max_dgram_qlen
        value: "512"
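If the chart itself is managed with Terraform, the same values can be passed through the helm provider. A sketch, assuming the standard DataDog chart and repository:

resource "helm_release" "datadog" {
  name       = "datadog"
  repository = "https://helm.datadoghq.com"
  chart      = "datadog"

  # Same values as the YAML snippet above, rendered from HCL.
  values = [
    yamlencode({
      datadog = {
        securityContext = {
          sysctls = [
            {
              name  = "net.unix.max_dgram_qlen"
              value = "512"
            }
          ]
        }
      }
    })
  ]
}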

Kubernetes service usage for NLB from different AWS account

I have a requirement to add a Kubernetes Service with an ExternalName pointing to an NLB (in a different AWS account).
I am using Terraform to implement this.
I am not sure how to use the NLB info in the external name section.
Can someone please help?
resource "kubernetes_service" "test_svc" {
metadata {
name = "test"
namespace = var.environment
labels = {
app = "test"
}
}
spec {
type = "ExternalName"
**external_name =**
}
}
Usage of externalName is as follows:
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: prod
spec:
  type: ExternalName
  externalName: my.database.example.com
Try putting the NLB's DNS name (its CNAME) as the external name, for example:
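In the Terraform resource from the question that would look something like this (the NLB DNS name below is a made-up example; use the actual DNS name of the load balancer in the other account):

resource "kubernetes_service" "test_svc" {
  metadata {
    name      = "test"
    namespace = var.environment

    labels = {
      app = "test"
    }
  }

  spec {
    type = "ExternalName"
    # Hypothetical NLB DNS name - replace with the real one from the other AWS account.
    external_name = "my-nlb-1234567890abcdef.elb.us-east-1.amazonaws.com"
  }
}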

How to get cluster subdomain in kubernetes deployment config template

On Kubernetes 1.6.1 (OpenShift 3.6 CP) I'm trying to get the subdomain of my cluster using $(OPENSHIFT_MASTER_DEFAULT_SUBDOMAIN), but it's not being dereferenced at runtime. I'm not sure what I'm doing wrong; the docs show this is how environment parameters should be acquired.
https://v1-6.docs.kubernetes.io/docs/api-reference/v1.6/#container-v1-core
- apiVersion: v1
  kind: DeploymentConfig
  spec:
    template:
      metadata:
        labels:
          deploymentconfig: ${APP_NAME}
        name: ${APP_NAME}
      spec:
        containers:
          - name: myapp
            env:
              - name: CLOUD_CLUSTER_SUBDOMAIN
                value: $(OPENSHIFT_MASTER_DEFAULT_SUBDOMAIN)
You'll need to set that value as an environment variable; this is the usage:
oc set env <object-selection> KEY_1=VAL_1
For example, if your deployment config is named foo and your subdomain is foo.bar, you would use this command:
oc set env dc/foo OPENSHIFT_MASTER_DEFAULT_SUBDOMAIN=foo.bar