Kubernetes cluster still running even after deleting - amazon-web-services

I created a Kubernetes cluster using the ansible-playbook command below:
ansible-playbook kubectl.yaml --extra-vars "kubernetes_api_endpoint=<Path to aws load balancer server>"
Now I have deleted the cluster using the command:
kubectl config delete-cluster <Name of cluster>
But the EC2 nodes are still running. I tried to stop them manually, but they start again automatically (expected, because they are running in a cluster).
Is there any way by which I can detach the nodes from the cluster or delete the cluster in total?
kubectl config view shows the output below:
apiVersion: v1
clusters: []
contexts:
- context:
    cluster: ""
    user: ""
  name: default-context
current-context: default-context
kind: Config
preferences: {}
users:
- name: cc3.k8s.local
  user:
    token: cc3.k8s.local
This means there is no cluster.
I want to delete the cluster in total and start fresh.

The delete-cluster command does this:
delete-cluster Delete the specified cluster from the kubeconfig
It only removes that cluster entry from your ~/.kube/config file; it does not delete the actual cluster.
You will need to write a different script for that or go into the AWS console and simply delete the nodes.
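For example, you can see that only the local kubeconfig is touched:
kubectl config get-clusters                        # list cluster entries in ~/.kube/config
kubectl config delete-cluster <Name of cluster>    # removes only that entry from the file
kubectl config get-contexts                        # contexts may still reference it; the EC2 nodes are untouched either way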

I just ran into this same problem. You need to delete the autoscaling group that spawns the worker nodes, which for some reason isn't deleted when you delete the EKS cluster.
Open the AWS console (console.aws.amazon.com), navigate to the EC2 dashboard, then scroll down the left pane to "Auto Scaling Groups". Deleting the autoscaling group should stop the worker nodes from endlessly spawning. You may also want to click on "Launch Configurations" and delete the template as well.
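If you prefer the CLI over the console, roughly the same cleanup looks like this (the group and launch configuration names are placeholders you would look up first):
aws autoscaling describe-auto-scaling-groups --query 'AutoScalingGroups[].AutoScalingGroupName'
aws autoscaling delete-auto-scaling-group --auto-scaling-group-name <worker-asg-name> --force-delete
aws autoscaling delete-launch-configuration --launch-configuration-name <worker-launch-config>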
HTH!

As @Jason mentioned, delete-cluster is not an option if you want to delete the cluster completely.
It would be better if you added the playbook content that creates the cluster to your question, so we can see how it creates the cluster on AWS.
The best and easiest option, in my view: you can also create a simple playbook to delete the cluster by changing the relevant module's state to absent in the playbook.
Or if it uses EKS, you can configure the AWS command line and simply run e.g. aws eks delete-cluster --name devel.
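If there are managed node groups, deleting them before the control plane avoids orphaned workers; a rough sketch, assuming a cluster named devel and a node group named workers (adjust to your names):
aws eks list-nodegroups --cluster-name devel
aws eks delete-nodegroup --cluster-name devel --nodegroup-name workers
aws eks wait nodegroup-deleted --cluster-name devel --nodegroup-name workers
aws eks delete-cluster --name devel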
If it uses Kops, then you can run kops delete cluster --name <name> --yes
For more info, see the kops CLI documentation.
If you still need help, please add the Ansible playbook file to your question by editing it.

Related

Is the update-kubeconfig command a client-only command or does it affect the cluster

I get the following warning/message when I run some k8s related commands
Kubeconfig user entry is using deprecated API version client.authentication.k8s.io/v1alpha1. Run 'aws eks update-kubeconfig' to update
and then I know I should run the command like so:
aws eks update-kubeconfig --name cluster_name --dry-run
I think the change will be client-side only and will not cause any change on the server side (the actual cluster). I just wanted verification of this one way or the other. Many thanks.
Yes, update-kubeconfig does not make any changes to the cluster. It will only update your local .kube/config file with the cluster info. Note that with the --dry-run flag, no change will be made at all - the resulting configuration will just be printed to stdout.
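For example, with the cluster name from your command:
aws eks update-kubeconfig --name cluster_name --dry-run   # prints the resulting kubeconfig to stdout, writes nothing
aws eks update-kubeconfig --name cluster_name             # merges the cluster entry into ~/.kube/config (or $KUBECONFIG)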

How do I update my EKS cluster after pushing a change to a Pod's ECR image?

I have an EKS cluster running a service. After I've pushed a change to a Pod's ECR image, how can I get EKS to update the deployment with a new pod? So far I can only think of deleting the pod, forcing EKS to launch a new one. Is there a better way of achieving this? I would like Jenkins to force the pod recreation.
I recommend CI/CD here: after building your image, your CD pipeline deploys it, and Jenkins can take that role.
If the image tag hasn't changed, you can try kubectl rollout restart deployment deployment-name (you may need imagePullPolicy: Always for this).
If the image tag changes, you can use sed to replace it and run kubectl apply.
In my humble opinion, you should use tags like v1.<Jenkins build number> or v1.<merge request number> ...; don't use latest as the image tag.
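Putting the two options together, a minimal sketch of what a Jenkins job could run (the deployment, registry, and tag names are placeholders):
# Same tag: force new pods to re-pull (needs imagePullPolicy: Always on the container)
kubectl rollout restart deployment my-service
kubectl rollout status deployment my-service
# New tag (e.g. built by Jenkins as v1.42): update the manifest and re-apply
sed -i 's|my-registry/my-service:.*|my-registry/my-service:v1.42|' deployment.yaml
kubectl apply -f deployment.yaml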

Deleting EKS Cluster with eksctl not working properly, requires manual deletion of resources such as ManagedNodeGroups

I'm running a cluster on EKS, and following the tutorial to deploy one using the command eksctl create cluster --name prod --version 1.17 --region eu-west-1 --nodegroup-name standard-workers --node-type t3.medium --nodes 3 --nodes-min 1 --nodes-max 4 --ssh-access --ssh-public-key public-key.pub --managed.
Once I'm done with my tests (mainly installing and then uninstalling Helm charts) and I have a clean cluster with no jobs running, I then try to delete it with eksctl delete cluster --name prod, which produces these errors:
[ℹ] eksctl version 0.25.0
[ℹ] using region eu-west-1
[ℹ] deleting EKS cluster "test"
[ℹ] deleted 0 Fargate profile(s)
[✔] kubeconfig has been updated
[ℹ] cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
[ℹ] 2 sequential tasks: { delete nodegroup "standard-workers", delete cluster control plane "test" [async] }
[ℹ] will delete stack "eksctl-test-nodegroup-standard-workers"
[ℹ] waiting for stack "eksctl-test-nodegroup-standard-workers" to get deleted
[✖] unexpected status "DELETE_FAILED" while waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::CloudFormation::Stack/eksctl-test-nodegroup-standard-workers: DELETE_FAILED – "The following resource(s) failed to delete: [ManagedNodeGroup]. "
[✖] AWS::EKS::Nodegroup/ManagedNodeGroup: DELETE_FAILED – "Nodegroup standard-workers failed to stabilize: [{Code: Ec2SecurityGroupDeletionFailure,Message: DependencyViolation - resource has a dependent object,ResourceIds: [[REDACTED]]}]"
[ℹ] 1 error(s) occurred while deleting cluster with nodegroup(s)
[✖] waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers": ResourceNotReady: failed waiting for successful resource state
To fix this I had to manually delete the AWS VPCs and then the ManagedNodeGroups, and then delete everything again.
I tried again with the steps above (creating and deleting with the commands provided in the official getting started documentation), but I get the same errors upon deleting.
It seems extremely weird that I have to manually delete resources when doing something like this. Is there a fix for this problem, am I doing something wrong, or is this standard procedure?
All commands are run through the official eksctl CLI, and I'm following the official eksctl deployment guide.
If you try to delete the Security Group to which the Node Group's EC2 instances are attached, you will find the root cause.
Usually it will say there is a Network Interface still attached.
So the solution is to delete that lingering Network Interface manually. The Node Group can then be deleted without any error.
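The same check and cleanup can be done from the CLI; the security group ID below stands in for the one reported in the DependencyViolation message:
aws ec2 describe-network-interfaces --filters Name=group-id,Values=<sg-id-from-error> \
    --query 'NetworkInterfaces[].[NetworkInterfaceId,Status]' --output table
aws ec2 delete-network-interface --network-interface-id <eni-id-from-above>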
If you are using Managed Node Groups and public subnets, be sure that you update your subnet settings to map public IPs on launch before April 22. You can follow the progress of the updates to managed node groups on our GitHub roadmap.
If you want to learn more about networking configurations and IP assignment for EKS clusters, check the blog post on cluster networking for worker nodes.
Also you can try:
Go to EC2 > Network Interfaces
Sort by VPC, find the interfaces assigned to your VPC
The interface to delete should be the only one that is "available", it should also be the only one assigned to the problematic remote access SG. If more than one interface matches this description, delete them all.
Take a look: eks-managed-node-groups, eksctl-node-group.
Have you tried running the eksctl delete cluster command with the --wait flag?
Without that flag it will output a message that it is deleted but deletion activities are still going on in the background.
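For example:
eksctl delete cluster --name prod --wait   # blocks until CloudFormation finishes, so a DELETE_FAILED shows up here instead of later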

How to configure jenkins slave using Amazon ECS plugin?

I have created two ECS clusters in the same subnetwork, one for the Jenkins master and the other for the Jenkins slave (an empty cluster). I have installed the Amazon ECS plugin on the Jenkins master but I am not able to configure the Jenkins slave node. I created both clusters using the ecs-cli up command, and the following are my Amazon ECS plugin settings, matching my cluster. After running this job a task definition is created in ECS, but the service and task are not created in the cluster.
Name: ecs-jenkins-slave
Amazon ECS Credential: aws_credentials
ECS Region Name: cluster_region
ECS Cluster: cluster_cluster
ECS Agent Template
Label: ecs-jenkins-slave
Docker Image: jenkinsci/jnlp-slave
Subnet: cluster_subnet
security_group: cluster_sg
(rest of the fields are default)
I created a test job to verify the configuration. Under "Restrict where this project can be run" in my test job I get the message "Label ecs-jenkins-slave is serviced by no nodes and 1 cloud. Permissions or other restrictions provided by plugins may prevent this job from running on those nodes". When I run the job, it goes into the pending state with the message "(pending—'Jenkins' doesn't have label 'ecs-jenkins-slave')".

How to setup Kubernetes Master HA on AWS

What I am trying to do:
I have set up a Kubernetes cluster using the documentation available on the Kubernetes website (http://kubernetes.io/v1.1/docs/getting-started-guides/aws.html). Using kube-up.sh, I was able to bring up a Kubernetes cluster with 1 master and 3 minions (as highlighted in the blue rectangle in the diagram below). From the documentation, as far as I know, we can add minions as and when required, so from my point of view the k8s master instance is a single point of failure when it comes to high availability.
Kubernetes Master HA on AWS
So I am trying to set up an HA k8s master layer with the three master nodes as shown above in the diagram. To accomplish this I am following the Kubernetes high-availability cluster guide: http://kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
What I have done:
Set up a k8s cluster using kube-up.sh and the aws provider (master1, plus minion1, minion2, and minion3)
Set up two fresh master instances (master2 and master3)
I then started configuring an etcd cluster on master1, master2 and master3 by following the link below:
http://kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
In short, I copied etcd.yaml from the Kubernetes website (http://kubernetes.io/v1.1/docs/admin/high-availability/etcd.yaml) and updated NODE_IP, NODE_NAME and the discovery token on all three nodes as shown below.
NODE_NAME   NODE_IP        DISCOVERY_TOKEN
Master1     172.20.3.150   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master2     172.20.3.200   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master3     172.20.3.250   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
On running etcdctl member list on all three nodes, I get:
$ docker exec <container-id> etcdctl member list
ce2a822cea30bfca: name=default peerURLs=http://localhost:2380,http://localhost:7001 clientURLs=http://127.0.0.1:4001
As per the documentation we need to keep etcd.yaml in /etc/kubernetes/manifests; this directory already contains etcd.manifest and etcd-event.manifest files. For testing I modified the etcd.manifest file with the etcd parameters.
After making the above changes I forcefully terminated the docker container; the container kept exiting after a few seconds, and I got the error below when running kubectl get nodes:
error: couldn't read version from server: Get http://localhost:8080/api: dial tcp 127.0.0.1:8080: connection refused
So please kindly suggest how I can set up a highly available k8s master on AWS.
To configure an HA master, you should follow the High Availability Kubernetes Cluster document, in particular making sure you have replicated storage across failure domains and a load balancer in front of your replicated apiservers.
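As a sketch of the replicated-storage piece only: a static three-member etcd cluster (using the node IPs from your table; your guide uses discovery-token bootstrapping instead, and ports/flags may differ for etcd2) would be started roughly like this on master1:
etcd --name master1 \
  --initial-advertise-peer-urls http://172.20.3.150:2380 \
  --listen-peer-urls http://172.20.3.150:2380 \
  --listen-client-urls http://172.20.3.150:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://172.20.3.150:2379 \
  --initial-cluster master1=http://172.20.3.150:2380,master2=http://172.20.3.200:2380,master3=http://172.20.3.250:2380 \
  --initial-cluster-state new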
Setting up HA controllers for kubernetes is not trivial and I can't provide all the details here but I'll outline what was successful for me.
Use kube-aws to set up a single-controller cluster: https://coreos.com/kubernetes/docs/latest/kubernetes-on-aws.html. This will create CloudFormation stack templates and cloud-config templates that you can use as a starting point.
Go to the AWS CloudFormation Management Console, click the "Template" tab and copy out the complete stack configuration. Alternatively, use $ kube-aws up --export to generate the CloudFormation stack file.
Use the userdata cloud-config templates generated by kube-aws and replace the variables with actual values. This guide will help you determine what those values should be: https://coreos.com/kubernetes/docs/latest/getting-started.html. In my case I ended up with four cloud-configs:
cloud-config-controller-0
cloud-config-controller-1
cloud-config-controller-2
cloud-config-worker
Validate your new cloud-configs here: https://coreos.com/validate/
Insert your cloud-configs into the CloudFormation stack config. First compress and encode your cloud config:
$ gzip -k cloud-config-controller-0
$ cat cloud-config-controller-0.gz | base64 > cloud-config-controller-0.enc
Now copy the content of each encoded cloud-config into the CloudFormation config. Look for the UserData key for the appropriate InstanceController. (I added additional InstanceController objects for the additional controllers.)
Update the stack at the AWS CloudFormation Management Console using your newly created CloudFormation config.
You will also need to generate TLS assets: https://coreos.com/kubernetes/docs/latest/openssl.html. These assets have to be compressed and encoded (same gzip and base64 as above), then inserted into your userdata cloud-configs.
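The encoding is the same as for the cloud-configs; for instance, for a hypothetical apiserver.pem:
$ gzip -k apiserver.pem
$ cat apiserver.pem.gz | base64 > apiserver.pem.enc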
When debugging on the server, journalctl is your friend:
$ journalctl -u oem-cloudinit # to debug problems with your cloud-config
$ journalctl -u etcd2
$ journalctl -u kubelet
Hope that helps.
There is also the kops project.
From the project README:
Operate HA Kubernetes the Kubernetes Way
also:
We like to think of it as kubectl for clusters
Download the latest release, e.g.:
cd ~/opt
wget https://github.com/kubernetes/kops/releases/download/v1.4.1/kops-linux-amd64
mv kops-linux-amd64 kops
chmod +x kops
ln -s ~/opt/kops ~/bin/kops
See kops usage, especially:
kops create cluster
kops update cluster
Assuming you already have an s3://my-kops bucket and a kops.example.com hosted zone.
Create configuration:
kops create cluster --state=s3://my-kops --cloud=aws \
--name=kops.example.com \
--dns-zone=kops.example.com \
--ssh-public-key=~/.ssh/my_rsa.pub \
--master-size=t2.medium \
--master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
--network-cidr=10.0.0.0/22 \
--node-count=3 \
--node-size=t2.micro \
--zones=eu-west-1a,eu-west-1b,eu-west-1c
Edit configuration:
kops edit cluster --state=s3://my-kops
Export terraform scripts:
kops update cluster --state=s3://my-kops --name=kops.example.com --target=terraform
Apply changes directly:
kops update cluster --state=s3://my-kops --name=kops.example.com --yes
List cluster:
kops get cluster --state s3://my-kops
Delete cluster:
kops delete cluster --state s3://my-kops --name=kops.example.com --yes