How to recover a Kubernetes cluster with kops?

We were trying to upgrade the kops version of our Kubernetes cluster. We followed the steps below:
Download the latest kops version, 1.24 (the old version is 1.20).
Update the template for 1.24.
Set the environment variables:
export KUBECONFIG="<<Kubeconfig file>>"
export AWS_PROFILE="<< AWS PROFILE NAME >>"
export AWS_DEFAULT_REGION="<< AWS Region >>"
export KOPS_STATE_STORE="<< AWS S3 Bucket Name >>"
export NAME="<< KOPS Cluster Name >>"
kops get $NAME -o yaml > existing-cluster.yaml
kops toolbox template --template templates/tm-eck-mixed-instances.yaml --values values_files/values-us-east-1.yaml --snippets snippets --output cluster.yaml --name $NAME
kops replace -f cluster.yaml
kops update cluster --name $NAME
kops rolling-update cluster --name $NAME --instance-group=master-us-east-1a --yes --cloudonly
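A validation step such as the following (the --wait timeout is arbitrary) shows whether a rolled master has rejoined:
kops validate cluster --name $NAME --wait 10m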
Once the master was rolled, I noticed that it had not joined the cluster.
After a few rounds of troubleshooting, I found the error below in the kube-controller-manager log on the new master (it cannot reach the API server):
I0926 09:54:41.220817 1 flags.go:59] FLAG: --vmodule=""
I0926 09:54:41.223834 1 dynamic_serving_content.go:111] Loaded a new cert/key pair for "serving-cert::/srv/kubernetes/kube-controller-manager/server.crt::/srv/kubernetes/kube-controller-manager/server.key"
unable to load configmap based request-header-client-ca-file: Get "https://127.0.0.1/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 127.0.0.1:443: connect: connection refused
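A few checks on the affected master show whether the API server is actually up and listening on 443; the commands below assume containerd with crictl available and kops' default log locations, so treat the exact paths as assumptions:
sudo crictl ps -a | grep kube-apiserver        # is the apiserver container running?
sudo ss -tlnp | grep ':443'                    # is anything listening locally on 443?
sudo tail -n 50 /var/log/kube-apiserver.log    # apiserver log location on kops masters
sudo journalctl -u kubelet --no-pager | tail -n 50   # kubelet errors starting static pods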
I tried to resolve this issue but couldn't find a way, so I decided to roll back using a backup. These are the steps I followed:
kops replace -f cluster.yaml
kops update cluster --name $NAME
kops rolling-update cluster --name $NAME --instance-group=master-us-east-1a --yes --cloudonly
Still, I'm getting the same error on the master node.
Does anyone know how I can restore the cluster using kops?

After a few more rounds of troubleshooting, I found that whenever we deploy a new version using kops, it creates a new launch template version in AWS. I manually changed the launch template version used by the Auto Scaling group of every node group, after which the cluster rolled back to its previous state and started working properly. I then reran the upgrade after adding the missing configuration to the kops template file.
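For anyone pinning an Auto Scaling group back to an earlier launch template version by hand, the AWS CLI equivalent looks roughly like this; the group name, template name and Version=3 are placeholders:
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name master-us-east-1a.masters.example.com \
  --launch-template "LaunchTemplateName=master-us-east-1a.masters.example.com,Version=3"
New instances launched by the group then use that version; the already-broken master still has to be terminated so the group replaces it.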

Related

ecs-cli refers to old cluster after changing default profile; doesn't show EC2 instances

I've been using AWS's ECS CLI to spin clusters of EC2 instances up and down for various tasks. The problem I'm running into is that it seems to be referring to old information that I don't know how to change.
For example, I just created a cluster, my-second-cluster, successfully and can see it in the AWS console:
$ ecs-cli up --keypair "my-keypair" --capability-iam --size 4 --instance-type t2.micro --port 22 --cluster-config my-second-cluster --ecs-profile a-second-profile
INFO[0001] Using recommended Amazon Linux 2 AMI with ECS Agent 1.45.0 and Docker version 19.03.6-ce
INFO[0001] Created cluster cluster=my-second-cluster region=us-east-1
INFO[0002] Waiting for your cluster resources to be created...
INFO[0002] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
INFO[0063] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
INFO[0124] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
VPC created: vpc-123abc
Security Group created: sg-123abc
Subnet created: subnet-123abc
Subnet created: subnet-123def
Cluster creation succeeded.
...but ecs-cli ps returns an error referring to an old cluster:
$ ecs-cli ps
FATA[0000] Error executing 'ps': Cluster 'my-first-cluster' is not active. Ensure that it exists
Specifying the cluster explicitly (ecs-cli ps --cluster my-second-cluster --region us-east-1) returns nothing, even though I see the 4 EC2 instances when I log into the AWS console.
Supporting details:
Before creating this second cluster, I created a second profile and set it to the default. I also set the new cluster to be the default.
$ ecs-cli configure profile --access-key <MY_ACCESS_KEY> --secret-key <MY_SECRET_KEY> --profile-name a-second-profile
$ ecs-cli configure profile default --profile-name a-second-profile
$ ecs-cli configure --cluster my-second-cluster --region us-east-1
INFO[0000] Saved ECS CLI cluster configuration default.
It's unclear to me where these ECS profile and cluster configs are stored (I'd expect to see them as files in ~/.aws, but no), or how to manipulate them beyond the cli commands that don't give great feedback. Any ideas on what I'm missing?
The ECS CLI stores its credentials at ~/.ecs/credentials.
When you create the initial profile, its name is default and it is used by default. When you set a-second-profile as the default, the metadata is updated to use a-second-profile by default, but you still have a profile named default that points to the original credentials.
My guess is that to see the first cluster you now need to specify a profile name, since you changed the default. If you didn't give your initial profile a name, it will be default:
ecs-cli ps --ecs-profile default
If you deleted your cluster configuration you may need to add the cluster again and associate to the right profile:
ecs-cli configure --cluster cluster_name --default-launch-type launch_type --region region_name --config-name configuration_name
I hope that makes sense. Looking at how your commands update ~/.ecs/credentials should also be helpful.
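From memory, ~/.ecs/credentials is a small YAML file along these lines (key names approximate, values redacted), and the cluster configurations live next to it in ~/.ecs/config in a similar layout:
$ cat ~/.ecs/credentials
version: v1
default: a-second-profile
ecs_profiles:
  default:
    aws_access_key_id: <original access key>
    aws_secret_access_key: <original secret key>
  a-second-profile:
    aws_access_key_id: <new access key>
    aws_secret_access_key: <new secret key>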
Some resources:
ECS CLI Configurations

Create kubernetes EC2 cluster on aws using kops

I am trying to set up a Kubernetes cluster on AWS with EC2 instances; it is supposed to be pretty straightforward.
I initially started with kubeadm and ran into problems:
Setup Kubernetes (version 1.18) cluster on AWS EC2
There was a suggestion to use kops, so I started with kops; I am running into problems again.
I need to run kops create secret with the SSH public key.
The key has been generated and is available at /home/ubuntu/.ssh/id_rsa.pub,
however I get the following error:
ubuntu@ip-10-0-1-8:~$ kops create secret --name newk8.shivag.io sshpublickey ubuntu -i ~/.ssh/id_rsa.pub --state s3://shivag.kube-kops-state --v=3
I0418 21:55:10.880023 19723 factory.go:68] state store s3://shivag.kube-kops-state
I0418 21:55:10.880229 19723 s3context.go:325] unable to read /sys/devices/virtual/dmi/id/product_uuid, assuming not running on EC2: open /sys/devices/virtual/dmi/id/product_uuid: permission denied
I0418 21:55:10.880303 19723 s3context.go:170] defaulting region to "us-east-1"
I0418 21:55:11.281113 19723 s3context.go:210] found bucket in region "eu-west-1"
error reading SSH public key /home/ubuntu/.ssh/id_rsa.pub: open /home/ubuntu/.ssh/id_rsa.pub: permission denied
Any help will be appreciated
I had used snap to install kops, which installed the 1.17 beta version.
I removed it and installed 1.16.0, and everything went smoothly.
I have the full instructions to install a Kubernetes cluster.
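For reference, installing a pinned kops release from the GitHub release binary instead of snap looks roughly like this (whether the release tag needs a v prefix depends on the version, so treat the exact URL as an assumption):
curl -Lo kops https://github.com/kubernetes/kops/releases/download/1.16.0/kops-linux-amd64
chmod +x kops
sudo mv kops /usr/local/bin/kops
kops version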

Import manually created K8s cluster into KOps

For quite some time I have been visiting every web page mentioning "KOps import", but I did not find a way to import my manually created K8s cluster. By manually created I mean the infrastructure was deployed on AWS using Terraform and Kubernetes was installed by Terraform's provisioner running a shell script. Since managing the environment manually is a pain, I would like to move it under kops. To that end I have done the following so far:
Installed the AWS CLI, kubectl and kops on my local machine.
Created a kops user with the policies AmazonEC2FullAccess, AmazonRoute53FullAccess, AmazonS3FullAccess, IAMFullAccess and AmazonVPCFullAccess, and generated access and secret keys.
Configured credentials using aws configure.
Created S3 bucket to store state.
Set env variables like Region and Cluster name.
Finally, I ran the kops import command as below:
kops import cluster --region ${REGION} --name ${OLD_NAME}
But I encountered the error below:
Cluster.kops "jjm-prod-use1-kubernetes" not found
With verbose output:
$ kops import cluster --region ${REGION} --name ${OLD_NAME} -v 10
I0131 16:32:12.059651 25683 factory.go:68] state store s3://kops-state-store-jjm
I0131 16:32:13.133145 25683 s3context.go:194] found bucket in region "us-east-1"
I0131 16:32:13.133174 25683 s3fs.go:220] Reading file "s3://kops-state-store-jjm/jjm-prod-use1-kubernetes/config"
That made me decide to post this question. Is there any way a K8s cluster created by anything other than kube-up.sh can be brought under the control of kops? Please advise.
Note: There's no way I can re-create (destroy and create) the clusters as they are running in production.
EDIT: I know this can be achieved only if the cluster was set up using kube-up.sh. But is there any other way?
That is only possible with a cluster bootstrapped via the kube-up.sh script, as officially stated in the kops documentation. In fact, kube-up.sh has since been excluded from the list of supported Kubernetes installation tools for AWS. Although a cluster built by kube-up.sh provides a lot of customization settings that are specific to AWS, the script uses environment variables to define those settings. Therefore, I assume it is quite hard to achieve in your case.

kops - get wrong kubectl context

I used kops to create a Kubernetes cluster on AWS.
I want to validate the cluster using this command:
kops validate cluster
The output gives me: Using cluster from kubectl context: minikube
I think the problem is the wrong context, but why doesn't kops create a context for me?
This is my contexts:
kubectl config get-contexts
CURRENT   NAME       CLUSTER    AUTHINFO   NAMESPACE
*         minikube   minikube   minikube
There is no AWS Kubernetes cluster context.
How do I solve this?
Works like a charm:
kops export kubecfg --name=clustername.com
kops has set your kubectl context to k9s.finddeepak.com
kops helps you to create, destroy, upgrade and maintain production-grade, highly available Kubernetes clusters from the command line. AWS (Amazon Web Services) is currently officially supported, with GCE in beta support, and VMware vSphere in alpha, and other platforms planned.
Your current configuration uses the minikube config file from a previous installation, and that is fine. It's useful to have a few clusters in one config and switch between them.
The extended configuration will be saved into the ~/.kube/config file; you may try:
kops export kubeconfig ${CLUSTER_NAME}
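To move between the clusters sharing that kubeconfig, the standard kubectl context commands apply (the context name below is a placeholder):
kubectl config get-contexts
kubectl config use-context clustername.com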

How to setup Kubernetes Master HA on AWS

What I am trying to do:
I have set up a Kubernetes cluster using the documentation available on the Kubernetes website (http://kubernetes.io/v1.1/docs/getting-started-guides/aws.html). Using kube-up.sh, I was able to bring a Kubernetes cluster up with 1 master and 3 minions (as highlighted in the blue rectangle in the diagram below). From the documentation, as far as I know, we can add minions as and when required, so from my point of view the k8s master instance is a single point of failure when it comes to high availability.
Kubernetes Master HA on AWS
So I am trying to set up an HA k8s master layer with the three master nodes shown above in the diagram. To accomplish this I am following the Kubernetes high-availability cluster guide, http://kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
What I have done:
Set up a k8s cluster using kube-up.sh and the aws provider (master1 with minion1, minion2 and minion3).
Set up two fresh master instances (master2 and master3).
I then started configuring the etcd cluster on master1, master2 and master3 by following the link below:
http://kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
In short, I copied etcd.yaml from the Kubernetes website (http://kubernetes.io/v1.1/docs/admin/high-availability/etcd.yaml) and updated NODE_IP, NODE_NAME and the discovery token on all three nodes as shown below.
NODE_NAME   NODE_IP        DISCOVERY_TOKEN
Master1     172.20.3.150   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master2     172.20.3.200   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master3     172.20.3.250   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
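For reference, those three values end up as etcd flags roughly like the following for Master1 (flag names assume the etcd v2-era manifest from that guide):
etcd --name Master1 \
  --initial-advertise-peer-urls http://172.20.3.150:2380 \
  --listen-peer-urls http://172.20.3.150:2380 \
  --advertise-client-urls http://172.20.3.150:4001 \
  --listen-client-urls http://0.0.0.0:4001 \
  --discovery https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed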
Running etcdctl member list on all three nodes, I get:
$ docker exec <container-id> etcdctl member list
ce2a822cea30bfca: name=default peerURLs=http://localhost:2380,http://localhost:7001 clientURLs=http://127.0.0.1:4001
As per the documentation we need to keep etcd.yaml in /etc/kubernetes/manifests; this directory already contains etcd.manifest and etcd-event.manifest files. For testing I modified the etcd.manifest file with the etcd parameters.
After making the above changes I forcefully terminated the Docker container; the container kept exiting after a few seconds, and I got the error below when running kubectl get nodes:
error: couldn't read version from server: Get http://localhost:8080/api: dial tcp 127.0.0.1:8080: connection refused
So please suggest how I can set up a highly available k8s master on AWS.
To configure an HA master, you should follow the High Availability Kubernetes Cluster document, in particular making sure you have replicated storage across failure domains and a load balancer in front of your replicated apiservers.
Setting up HA controllers for Kubernetes is not trivial and I can't provide all the details here, but I'll outline what was successful for me.
Use kube-aws to set up a single-controller cluster: https://coreos.com/kubernetes/docs/latest/kubernetes-on-aws.html. This will create CloudFormation stack templates and cloud-config templates that you can use as a starting point.
Go to the AWS CloudFormation Management Console, click the "Template" tab and copy out the complete stack configuration. Alternatively, use kube-aws up --export to generate the CloudFormation stack file.
Use the userdata cloud-config templates generated by kube-aws and replace the variables with actual values. This guide will help you determine what those values should be: https://coreos.com/kubernetes/docs/latest/getting-started.html. In my case I ended up with four cloud-configs:
cloud-config-controller-0
cloud-config-controller-1
cloud-config-controller-2
cloud-config-worker
Validate your new cloud-configs here: https://coreos.com/validate/
Insert your cloud-configs into the CloudFormation stack config. First compress and encode your cloud config:
$ gzip -k cloud-config-controller-0
$ cat cloud-config-controller-0.gz | base64 > cloud-config-controller-0.enc
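As a quick sanity check (assuming GNU coreutils base64), decoding should reproduce the original file:
$ base64 -d cloud-config-controller-0.enc | gunzip | diff - cloud-config-controller-0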
Now copy the content of your encoded cloud-config into the CloudFormation config. Look for the UserData key for the appropriate InstanceController. (I added additional InstanceController objects for the additional controllers.)
Update the stack at the AWS CloudFormation Management Console using your newly created CloudFormation config.
You will also need to generate TLS assets: https://coreos.com/kubernetes/docs/latest/openssl.html. These assets will have to be compressed and encoded (same gzip and base64 as above), then inserted into your userdata cloud-configs.
When debugging on the server, journalctl is your friend:
$ journalctl -u oem-cloudinit # to debug problems with your cloud-config
$ journalctl -u etcd2
$ journalctl -u kubelet
Hope that helps.
There is also the kops project.
From the project README:
Operate HA Kubernetes the Kubernetes Way
also:
We like to think of it as kubectl for clusters
Download the latest release, e.g.:
cd ~/opt
wget https://github.com/kubernetes/kops/releases/download/v1.4.1/kops-linux-amd64
mv kops-linux-amd64 kops
chmod +x kops
ln -s ~/opt/kops ~/bin/kops
See kops usage, especially:
kops create cluster
kops update cluster
Assuming you already have an s3://my-kops bucket and a kops.example.com hosted zone.
Create configuration:
kops create cluster --state=s3://my-kops --cloud=aws \
--name=kops.example.com \
--dns-zone=kops.example.com \
--ssh-public-key=~/.ssh/my_rsa.pub \
--master-size=t2.medium \
--master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
--network-cidr=10.0.0.0/22 \
--node-count=3 \
--node-size=t2.micro \
--zones=eu-west-1a,eu-west-1b,eu-west-1c
Edit configuration:
kops edit cluster --state=s3://my-kops
Export terraform scripts:
kops update cluster --state=s3://my-kops --name=kops.example.com --target=terraform
Apply changes directly:
kops update cluster --state=s3://my-kops --name=kops.example.com --yes
List cluster:
kops get cluster --state s3://my-kops
Delete cluster:
kops delete cluster --state s3://my-kops --name=kops.example.com --yes