Unable to validate Kubernetes cluster using Kops

I am new to Kubernetes. I am using Kops to deploy my Kubernetes application on AWS. I have already registered my domain on AWS and also created a hosted zone and attached it to my default VPC.
Creating my Kubernetes cluster through kops succeeds. However, when I try to validate my cluster using kops validate cluster, it fails with the following error:
unable to resolve Kubernetes cluster API URL dns: lookup api.ucla.dt-api-k8s.com on 149.142.35.46:53: no such host
I have tried debugging this error but failed. Can you please help me out? I am very frustrated now.

From what you describe, you created a Private Hosted Zone in Route 53. The validation is probably failing because Kops is trying to reach the cluster API from your machine, which is outside the VPC, but private hosted zones only answer queries coming from within the VPC. The hostname api.ucla.dt-api-k8s.com is where the Kubernetes API lives and is how you communicate with and issue commands to the cluster from your computer. A private hosted zone won't let you resolve this name from the outside world (your computer).
One way to resolve this is to make your hosted zone public. Kops will still create a VPC for you (unless configured otherwise), and you will be able to reach the API from your computer.
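For example, a cluster created against a public hosted zone would look roughly like this; the state bucket and availability zone below are placeholders, and this assumes dt-api-k8s.com is (or becomes) a public hosted zone:
kops create cluster --name=ucla.dt-api-k8s.com --state=s3://my-kops-state-store --zones=us-west-2a --dns public
kops update cluster ucla.dt-api-k8s.com --state=s3://my-kops-state-store --yes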

I encountered this last night using a kops-based cluster creation script that had worked previously. I thought maybe switching regions would help, but it didn't. This morning it is working again. This feels like an intermittent issue on the AWS side.
So the answer I'm suggesting is:
When this happens, you may need to give it a few hours to resolve itself. In my case, I rebuilt the cluster from scratch after waiting overnight. I don't know whether or not it was necessary to start from scratch -- I hope not.

This is all I had to run:
kops export kubecfg (cluster name) --admin
This imports the "new" kubeconfig needed to access the kops cluster.
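For example, for the cluster in the question above it would be roughly the following (the state bucket is a placeholder):
kops export kubecfg ucla.dt-api-k8s.com --state s3://my-kops-state-store --admin
kubectl get nodes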

I came across this problem on an Ubuntu box. What I did was add the DNS record from the hosted zone in Route 53 to /etc/hosts.
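For example, a minimal /etc/hosts entry for the cluster in the question might look like this (the IP is a placeholder; use the actual address the Route 53 record points to):
203.0.113.10   api.ucla.dt-api-k8s.com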

Here is how I resolved the issue:
It looks like there is a bug in kops: it still shows
Validation failed: unexpected error during validation: unable to resolve Kubernetes cluster API URL dns: lookup api
when you run kops validate cluster, even after waiting 10-15 minutes. Behind the scenes the Kubernetes cluster is actually up. You can verify this by SSHing into the master node of your Kubernetes cluster as follows:
Go to the EC2 console page where your k8s instances are running.
Copy the "Public IPv4 address" of your master k8s node.
From your command prompt, log in to the master node:
ssh ubuntu@<"Public IPv4 address" of your master k8s node>
Verify that you can see all nodes of the k8s cluster with the command below; it should list your master node and worker nodes:
kubectl get nodes

Related

AWS Loadbalancer is not accessible

I have a solution (AnzoGraph DB) deployed on my AWS Kubernetes cluster (EC2 instances), and it was working totally fine.
Suddenly the solution stopped and I could not access it via its DNS name anymore.
I tested the solution deployed on my cluster using the kubectl port-forward command and the pods and services are working fine, so I assume the problem is with the AWS load balancer.
To access the application we need to go through this path:
Request -> DNS -> AWS Load Balancer -> Services -> Pods.
The load balancer is (classic) internal, so it's only accessible to me or the company over VPN.
Every time I try to access the DNS name, I get no response.
Any idea how I can fix it, or where the exact issue is? How can I troubleshoot this and follow the traffic on AWS?
Thanks a lot for the help!
Sorry I missed your post earlier.
Let's start with a few questions...
You say you use k8s on AWS EC2; do you actually use EKS, or do you run a different k8s stack?
Also... you mentioned that your (DB) client/software accesses the LB by resolving its DNS name and then connects to AnzoGraph DB.
I want to make sure the client is actually resolving the LB via DNS every time. If you have a long-running service, AWS changes the IP address of the LB, and your software has cached the old IP, you would not be able to connect to the LB.
On the system where you run the software accessing AnzoGraph DB (I assume CentOS 7):
make sure you have dig installed (yum install bind-utils)
dig {{ your DNS name of your LB }}
Is that actually the IP address your software is accessing?
Has the IP address of the client changed? Make sure the LB security group allows access
(https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-security-groups.html)
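For example, you could check which security groups the (classic) LB uses and what they allow with the AWS CLI; the LB name and SG id below are placeholders for your own values:
aws elb describe-load-balancers --load-balancer-names my-internal-elb --query 'LoadBalancerDescriptions[0].SecurityGroups'
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 --query 'SecurityGroups[0].IpPermissions'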
I assume you access the AnzoGraph DB frontend pod via 443?
Since you write that you tested the pods and services with kubectl port-forward and they are working fine, we probably do not have to look at pod logs.
(If that were not the case, the LB would obviously block traffic as well.)
So I agree that the most likely issue is (bad) DNS caching, or the security group on the classic LB rejecting a different source IP.
Also, for completeness, please tell us more about your environment:
AnzoGraph DB image
EKS/k8s version
helm chart / AnzoGraph operator used.
Best - Frank

kOps 1.19 reports error "Unauthorized" when interfacing with AWS cluster

I'm following the kOps tutorial to set up a cluster on AWS. I am able to create a cluster with
kops create cluster
kops update cluster --yes
However, when validating whether my cluster is set up correctly with
kops validate cluster
I get stuck with error:
unexpected error during validation: error listing nodes: Unauthorized
The same error happens in many other kOps operations.
I checked my kOps/K8s version and it is 1.19:
> kops version
Version 1.19.1 (git-8589b4d157a9cb05c54e320c77b0724c4dd094b2)
> kubectl version
Client Version: version.Info{Major:"1", Minor:"20" ...
Server Version: version.Info{Major:"1", Minor:"19" ...
How can I fix this?
As of kOps 1.19 there are two reasons you will suddenly get this error:
If you delete a cluster and reprovision it, your old admin is not removed from the kubeconfig and kOps/kubectl tries to reuse it.
New certificates have a TTL of 18h by default, so you need to reprovision them about once a day.
Both issues above are fixed by running kops export kubecfg --admin.
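If an old admin entry from a deleted cluster is still stuck in your kubeconfig, you could also clear it before re-exporting. This is just a sketch; the entry names below are assumptions (run kubectl config view to see what yours are actually called), and the cluster name is a placeholder:
kubectl config unset users.mycluster.example.com
kubectl config unset contexts.mycluster.example.com
kubectl config unset clusters.mycluster.example.com
kops export kubecfg mycluster.example.com --admin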
Note that using the default TLS credentials is discouraged. Consider things like using an OIDC provider instead.
Kubernetes v1.19 removed basic auth support, incidentally making the default kOps credentials unable to authorize. To work around this, we will update our cluster to use a Network Load Balancer (NLB) instead of the default Classic Load Balancer (CLB). The NLB can be accessed with non-deprecated AuthZ mechanisms.
After creating your cluster, but before updating cloud resources (before running with --yes), edit its configuration to use a NLB:
kops edit cluster
Then update your load balancer class to Network:
spec:
  api:
    loadBalancer:
      class: Network
Now update cloud resources with
kops update cluster --yes
And you'll be able to pass AuthZ with kOps on your cluster.
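A rough way to confirm everything is healthy afterwards, assuming your kOps state store and cluster name are already set in your environment (KOPS_STATE_STORE / KOPS_CLUSTER_NAME):
kops export kubecfg --admin
kops validate cluster --wait 10m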
Note that there are several other advantages to using an NLB as well, check the AWS docs for a comparison.
If you have a pre-existing cluster you want to update to a NLB, there are more steps to follow to ensure clients don't start failing AuthZ, to delete old resources, etc. You'll find a better guide for that in the kOps v1.19 release notes.

Accessing ElastiCache from EKS Cluster

I have set up an EKS cluster and I am trying to connect an application pod to an ElastiCache endpoint. I put both in the same VPC and configured in/out security groups for them. Unfortunately, when I try to telnet from the pod to the cache endpoint, it says "xxx.yyy.zzz.amazonaws.com: Unknown host". Is it even possible to make such a connection?
Yes, if the security groups allow connectivity then you can connect from EKS pods to Elasticache. However, be aware that the DNS name may not resolve for some time (up to around 15 minutes) after you launch the Elasticache instance/cluster.
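For example, you could re-check resolution from inside the cluster with a throwaway pod; the endpoint below is just the placeholder from the question:
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup xxx.yyy.zzz.amazonaws.com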
I found an answer in an issue from the cortexproject (a monitoring tool based on the Grafana stack).
I solved it by using "addresses" instead of "host" with the address of my memcached, and it worked.
PS: the "addresses" option isn't documented in the official documentation.
It has to look like this:
memcached_client:
  addresses: memcached.host

kubeadm init phase upload-config failing

I am new to Kubernetes and want to set up a Kubernetes HA cluster after successfully completing examples with minikube and a single-master Kubernetes cluster. I am using AWS EC2 instances and an AWS application load balancer for this purpose. I don't want to use kOps or any other tool for installation; I want to get hands-on with kubeadm.
I followed the steps below:
Created a self-signed certificate ca.crt and ca.key to use for Kubernetes.
Installed this certificate as a root CA on my Ubuntu instance.
Copied this ca.crt and ca.key to /etc/kubernetes/pki.
Created a new certificate for the AWS load balancer and signed it with the above ca.crt. With this certificate I created the AWS application load balancer.
I also created a record set in AWS Route 53 for the domain name mapping and made sure the mapping works (i.e. master.k8sonaws.com properly resolves to the AWS load balancer).
Now I am running kubeadm init:
kubeadm init --pod-network-cidr=192.168.0.0/20 --service-cidr=192.168.16.0/20 --node-name=10.0.0.13 --control-plane-endpoint "master.k8sonaws.com:443" --upload-certs --v=8 --apiserver-bind-port=443 --apiserver-cert-extra-sans=master.k8sonaws.com,i-0836dd4dc6609a924
This command succeeds up to the upload-config phase. The health check endpoint returns success, but after that it fails in the upload-config phase with:
configmaps is forbidden: User "system:anonymous" cannot create resource "configmaps" in API group "" in the namespace "kube-system"
I am not able to understand why kubeadm is making this API call as the anonymous user. How can I resolve this issue?
The certificate in the kubeconfig file used to create the ConfigMap does not have the right groups. I would say don't generate the CA and certificates for Kubernetes yourself. Just use kubeadm init and let kubeadm handle CA and certificate generation. After your Kubernetes cluster is up and running, you can use the same CA to generate a certificate yourself and use that on the AWS load balancer.
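If you still need a load balancer certificate afterwards, a minimal sketch of signing one with the CA that kubeadm generated could look like this (file names and the CN are placeholders; the CA paths are the kubeadm defaults):
openssl genrsa -out lb.key 2048
openssl req -new -key lb.key -subj "/CN=master.k8sonaws.com" -out lb.csr
openssl x509 -req -in lb.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out lb.crt -days 365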
I found the solution to this problem after trying a lot of different things for two days. The problem is that the AWS load balancer does not pass the client certificate to the backend server when using an HTTPS listener, and AWS has not documented this fact (or I did not find the documentation if it exists).
The way to solve this is to use a plain HTTP listener on the same port, 443, so that SSL termination is carried out by the backend server. In my case this is not a security threat, since both my load balancer and backend servers are internal and not exposed to a public IP.

Getting error "The subdomain does not map to a valid identity zone"; understanding how to define such zones on K8s

I have a plain K8s cluster (1.15.0) running on a set of VMs on VMware. It is not in the cloud, so I don't have a domain registered there.
After installing the K8s cluster, I mapped a virtual domain in the hosts file of all the VMs to one single worker node, and then I started installing UAA using that domain and the public IP assigned to my single worker node (mydomain.com mypublicip).
When I installed UAA, all pods started up properly, but the deployment is not really healthy: the readiness probe fails because it cannot access the domain https://uaa.${DOMAIN}:8443/info.
Trying the same URL using my local worker node IP or the public IP, I get the following error:
The subdomain does not map to a valid identity zone.
So can someone please explain what I am missing?