unable to create NAT Gateway for eks worker nodes - amazon-web-services

I deployed an EKS cluster with two worker nodes in the same subnet.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-xx-xx.xx-xx-xx.compute.internal Ready <none> 6h31m v1.22.9-eks-xxxx
ip-172-31-xx-xx.xx-xxx-x.compute.internal Ready <none> 6h31m v1.22.9-eks-xxxx
Everything worked fine. I then wanted to configure a NAT gateway for the subnet the nodes are in.
As soon as the NAT gateway was configured, all of the nodes suddenly went into the NotReady state.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-xx-xx.xx-xx-xx.compute.internal NotReady <none> 6h45m v1.22.9-eks-xxxx
ip-172-31-xx-xx.xx-xxx-x.compute.internal NotReady <none> 6h45m v1.22.9-eks-xxxx
kubectl get events also shows the nodes as NotReady, and I can no longer exec into pods:
when I try kubectl exec I get error: unable to upgrade connection: Unauthorized.
After removing my subnet from the route table's subnet associations (which I had added as part of creating the NAT gateway), everything worked again and the nodes went back to Ready.
Any idea how to create a NAT gateway for EKS worker nodes? Is there anything I am missing?
Thanks in advance.

I used eksctl to deploy the cluster using the following command:
eksctl create cluster \
--name test-cluster \
--version 1.22 \
--nodegroup-name test-kube-workers \
--node-type t3.medium \
--nodes 2 \
--nodes-min 1 \
--nodes-max 2 \
--node-private-networking \
--ssh-access
and everything was taken care of.
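For reference, the usual pattern is that the NAT gateway itself lives in a public subnet (one whose route table sends 0.0.0.0/0 to an internet gateway), and only the worker-node subnet's route table points its default route at the NAT gateway. A rough sketch with the AWS CLI, where all resource IDs are placeholders:
# 1. Reserve an Elastic IP and create the NAT gateway in a PUBLIC subnet
aws ec2 allocate-address --domain vpc
aws ec2 create-nat-gateway --subnet-id subnet-PUBLIC --allocation-id eipalloc-XXXX
# 2. In the PRIVATE (worker-node) subnet's route table, send internet-bound
#    traffic to the NAT gateway and associate that route table with the subnet
aws ec2 create-route --route-table-id rtb-PRIVATE \
    --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-XXXX
aws ec2 associate-route-table --route-table-id rtb-PRIVATE --subnet-id subnet-NODES
If instead the subnet that contains the NAT gateway has its own default route pointed at that NAT gateway, there is no longer any path to an internet gateway, the kubelets cannot reach the EKS control plane endpoint, and the nodes drop to NotReady, which is consistent with what is described above.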

Related

EKS cluster upgrade fail with Kubelet version of Fargate pods must be updated to match cluster version

I have an EKS cluster v1.23 with Fargate nodes. Cluster and Nodes are in v1.23.x
$ kubectl version --short
Server Version: v1.23.14-eks-ffeb93d
Fargate nodes are also in v1.23.14
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fargate-ip-x-x-x-x.region.compute.internal Ready <none> 7m30s v1.23.14-eks-a1bebd3
fargate-ip-x-x-x-xx.region.compute.internal Ready <none> 7m11s v1.23.14-eks-a1bebd3
When I tried to upgrade the cluster to 1.24 from the AWS console, it gave this error:
Kubelet version of Fargate pods must be updated to match cluster version 1.23 before updating cluster version; Please recycle all offending pod replicas
What are the other things I have to check?
From your question you only have two Fargate nodes, so most likely the only thing running on Fargate is coredns. Try kubectl scale deployment coredns --namespace kube-system --replicas 0, then upgrade. You can scale it back to 2 once the control-plane upgrade has completed. Also make sure you have selected the correct cluster in the console.
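A sketch of that sequence, assuming eksctl is used for the upgrade (the console works too) and <cluster-name> stands in for the real cluster name:
# Confirm which kubelet version each Fargate node still reports
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion
# Scale coredns to zero so no Fargate pod with an older kubelet remains
kubectl scale deployment coredns --namespace kube-system --replicas 0
# Upgrade the control plane, then restore coredns; the replacement Fargate
# pods come up on the new platform/kubelet version
eksctl upgrade cluster --name <cluster-name> --version 1.24 --approve
kubectl scale deployment coredns --namespace kube-system --replicas 2
Note that cluster DNS is unavailable while coredns is scaled to zero, so plan the window accordingly.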

EKS Connector Pods stuck in Init:CrashLoopBackOff

I have a single-node Kubernetes cluster set up on AWS. I am currently running a VPC with one public and one private subnet.
The master node is in the public subnet and the worker node is in the private subnet.
On the AWS console I can successfully register a cluster and download the connector manifest, which I then apply on my master node, but unfortunately the pods don't start. Below is what I observed.
kubectl get pods
NAME               READY      STATUS              RESTARTS            AGE
eks-connector-0   0/2  Init:CrashLoopBackOff   7 (4m36s ago)       19m
kubectl logs eks-connector-0
Defaulted container "connector-agent" out of: connector-agent, connector-proxy, connector-init (init)
Error from server (BadRequest): container "connector-agent" in pod "eks-connector-0" is waiting to start: PodInitializing
The pods are failing to start with the errors logged above.
I would suggest providing the output of kubectl get pod eks-connector-0 -o yaml and kubectl logs -p eks-connector-0.
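Since the pod is stuck in Init:CrashLoopBackOff, the crashing container is one of the init containers, so its logs are usually the most telling. A sketch of what to collect in addition (the container name connector-init comes from the "Defaulted container" message above):
# Logs of the init container, including the previous crashed attempt
kubectl logs eks-connector-0 -c connector-init --previous
# Pod events and per-container state (image pulls, exit codes, restarts)
kubectl describe pod eks-connector-0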

Unable to install helm chart on private Google Kubernetes Engine (timeout error)

I have created a private cluster in GKE with the following:
gcloud container clusters create private-cluster-0 \
--create-subnetwork name=my-subnet-0 \
--enable-master-authorized-networks \
--enable-ip-alias \
--enable-private-nodes \
--enable-private-endpoint \
--master-ipv4-cidr 172.16.0.32/28 \
--zone us-central1-a
Then I did
gcloud container clusters get-credentials --zone us-central1-a private-cluster-0
I was trying to install a helm chart from my local machine but I got the following error:
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://172.16.0.34/version?timeout=32s": dial tcp 172.16.0.34:443: i/o timeout
Can anyone please tell me how to resolve this error?
How do I deploy a helm chart from a local machine to a private cluster in GKE?
You created a private cluster and are trying to install a helm chart from your local machine.
This won't work because the 172.16.0.0/12 range is private and not routable from the internet, so your PC is looking for the cluster endpoint on your own LAN.
You can find information on accessing private GKE clusters on google docs.
There are also more general tutorials on installing helm on GKE from google and medium.
First you need basic connectivity to access your private cluster.
For example, SSH to a VM on a subnet that master-ipv4-cidr allows.
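One way to get that connectivity, sketched with placeholder names (the bastion VM helm-bastion is an assumption; my-subnet-0 is the subnet created for the cluster above, and with --enable-private-endpoint the bastion's address must also be allowed by the master authorized networks):
# Small VM in the same VPC/subnet as the cluster, to run gcloud/kubectl/helm from
gcloud compute instances create helm-bastion \
    --zone us-central1-a \
    --subnet my-subnet-0 \
    --machine-type e2-small
# SSH in via IAP so the bastion does not need an external IP
gcloud compute ssh helm-bastion --zone us-central1-a --tunnel-through-iap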
I had this basic connectivity but was still unable to install a helm chart as the install couldn't access services within the cluster.
I could only see this issue after adding verbosity to helm install and logging the output.
helm install -v10 my-chart >log.txt 2>&1
With the get-credentials command
gcloud container clusters get-credentials --zone us-central1-a private-cluster-0
Try adding the argument --internal-ip
This controls whether to use the internal IP address of the cluster endpoint. It made the difference for me.
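Combined with the zone and cluster name from the question, that looks like:
# Write a kubeconfig entry that points at the cluster's internal endpoint;
# run this from a machine that can actually reach that internal address
gcloud container clusters get-credentials private-cluster-0 \
    --zone us-central1-a \
    --internal-ip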

Unable to connect Redis Instance from GKE pod - Connection to Redis <IP>:6379 failed after 2 failures. Last Error: (110) Operation timed out

I have GKE cluster that I created with following command:
$ gcloud container clusters create stage1 \
--enable-ip-alias \
--release-channel stable \
--zone us-central1 \
--node-locations us-central1-a,us-central1-b
and I also created a redis instance with following command:
$ gcloud redis instances create redisbox --size=2 --region=us-central1 --redis-version=redis_5_0
I have retrieved the IP address of the redis instance with:
$ gcloud redis instances describe redisbox --region=us-central1
I have updated this IP in my PHP application, built my Docker image, and created the pod in the GKE cluster. When the pod is created, the container throws the following error:
Connection to Redis :6379 failed after 2 failures.Last Error : (110) Operation timed out
Note 1: This is a working application in a hosted environment and we are migrating it to Google Cloud
Note 2: The GKE cluster and the Redis instance are in the same region
Note 3: IP aliasing is enabled in the cluster
After reproducing this VPC-native GKE cluster and Redis instance with your gcloud commands, I could check that both the nodes and their pods can reach the redisbox host, for example with ncat in a debian:latest pod:
$ REDIS_IP=$(gcloud redis instances describe redisbox --format='get(host)' --region=us-central1)
$ gcloud container clusters get-credentials stage1 --region=us-central1
$ kubectl exec -ti my-debian-pod -- /bin/bash -c "ncat $REDIS_IP 6379 <<<PING"
+PONG
Therefore, I suggest that you try this lower-level reachability test first, in case the issue lies with the specific request that your PHP application is making.
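In case it is useful, here is one way such a throwaway debug pod can be created; my-debian-pod is just a placeholder name, and ncat has to be installed inside the pod before running the test above:
# One-off debian pod for in-cluster connectivity tests
kubectl run my-debian-pod --image=debian:latest --restart=Never -- sleep infinity
kubectl wait --for=condition=Ready pod/my-debian-pod --timeout=120s
kubectl exec -ti my-debian-pod -- /bin/bash -c "apt-get update && apt-get install -y ncat"
# Clean up afterwards
kubectl delete pod my-debian-pod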

Google Cloud Platform - receiving a 502 error for a backend not passing health checks

I have been following this guide to deploy Pega 7.4 on Google Cloud Compute Engine. Everything went smoothly, however the Load Balancer health check keeps reporting the service as unhealthy.
When visiting the external IP a 502 is returned, and while troubleshooting GCP told us to "Make sure that your backend is healthy and supports HTTP/2 protocol". In the guide, the backend service is created with this command:
gcloud compute backend-services create pega-app \
--health-checks=pega-health \
--port-name=pega-web \
--session-affinity=GENERATED_COOKIE \
--protocol=HTTP --global
The protocol is HTTP but is this the same as HTTP/2?
What else could be wrong besides checking that the firewall setup allows the health checker and load balancer to pass through (below)?
gcloud compute firewall-rules create pega-internal \
--description="Pega node to node communication requirements" \
--action=ALLOW \
--rules=tcp:9300-9399,tcp:5701-5800 \
--source-tags=pega-app \
--target-tags=pega-app
gcloud compute firewall-rules create pega-web-external \
--description="Pega external web ports" \
--action=ALLOW \
--rules=tcp:8080,tcp:8443 \
--source-ranges=130.211.0.0/22,35.191.0.0/16 \
--target-tags=pega-app
Edit:
So the instance group has a named port on 8080:
gcloud compute instance-groups managed set-named-ports pega-app \
--named-ports=pega-web:8080 \
--region=${REGION}
And the health check config:
gcloud compute health-checks create http pega-health \
--request-path=/prweb/PRRestService/monitor/pingservice/ping \
--port=8080
I have checked the VM instance logs on the pega-app instances and I am getting a 404 when trying to hit the ping service.
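For what it is worth, a quick way to reproduce what the health checker sees is to request the same path and port from one of the pega-app VMs (both taken from the health check above):
# Run on a pega-app instance: a 200 here is what the health check needs;
# a 404 means the web server is up but the Pega ping service is not served at this path yet
curl -i http://localhost:8080/prweb/PRRestService/monitor/pingservice/ping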
My problem was that I had not reserved a static IP address (with a DNS record pointing at it), as the guide does with gcloud compute addresses create pega-app --global. I skipped this step, so an ephemeral IP address was generated each time the instances booted.