Add Karpenter to eksctl config file, but upgrade does nothing

I have created an EKS cluster.
Then I followed the document (https://eksctl.io/usage/eksctl-karpenter/) to add Karpenter support:
metadata:
  name: eks-dev
  region: ap-southeast-2
  version: "1.22"
+ tags:
+   karpenter.sh/discovery: eks-dev
+iam:
+  withOIDC: true # required
+karpenter:
+  version: '0.9.0'
managedNodeGroups:
  - name: spot
But when I run the upgrade, nothing happens:
$ eksctl upgrade cluster -f eks-dev.yaml --approve
2022-06-07 21:08:25 [!] NOTE: cluster VPC (subnets, routing & NAT Gateway) configuration changes are not yet implemented
2022-06-07 21:08:25 [ℹ] no cluster version update required
2022-06-07 21:08:26 [ℹ] re-building cluster stack "eksctl-eks-dev-cluster"
2022-06-07 21:08:26 [✔] all resources in cluster stack "eksctl-eks-dev-cluster" are up-to-date
2022-06-07 21:08:26 [ℹ] checking security group configuration for all nodegroups
2022-06-07 21:08:26 [ℹ] all nodegroups have up-to-date cloudformation templates
$
The NOTE says VPC changes are ignored, but the Karpenter change is not related to the VPC.
So how can I fix this?

Karpenter support in eksctl only applies to new clusters; it has no effect on existing clusters. You can install Karpenter manually on an existing cluster by following this guide.

Related

AWS comparison between nodegroup and managed nodegroup

I use eksctl to create an EKS cluster on AWS.
After creating a YAML configuration file that defines the EKS cluster (following the docs), I run eksctl create cluster -f k8s-dev/k8s-dev.yaml to create the cluster, and the log shows these lines:
2021-12-15 16:23:55 [ℹ] will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s)
2021-12-15 16:23:55 [ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
What is the difference between a nodegroup and a managed nodegroup?
I have read the official AWS docs about managed node groups, but I still can't tell clearly when to choose a nodegroup over a managed nodegroup.
What would you use when you need to create an EKS cluster?
eksctl only provides the option to choose nodeGroups or managedNodeGroups (docs: https://eksctl.io/usage/container-runtime/#managed-nodes) but does not describe the difference. However, I think the following document will give you the information you need.
It describes the differing features of EKS managed node groups, self-managed nodes, and AWS Fargate:
https://docs.aws.amazon.com/eks/latest/userguide/eks-compute.html
Choose the one that matches your purpose; if I were you, I would choose managed node groups.

Unable to get nodegroup info using eksctl

I'm a total noob and have a runaway EKS cluster adding up $$ on AWS.
I'm having a tough time scaling down my cluster and I'm not sure what to do. I'm following the recommendations in "How to stop AWS EKS Worker Instances" (reference below).
If I run "eksctl get cluster", I get the following:
NAME                      REGION     EKSCTL CREATED
my-cluster                us-west-2  True
unique-outfit-1636757727  us-west-2  True
I then try the next command, "eksctl get nodegroup --cluster my-cluster", and get:
2021-11-15 15:31:14 [ℹ] eksctl version 0.73.0
2021-11-15 15:31:14 [ℹ] using region us-west-2
Error: No nodegroups found
I'm desperate to scale down the cluster, but I'm stuck at the command above.
Everything seems to be installed and running as intended, but the management part is failing. What am I doing wrong? Thanks in advance!
Reference --
eksctl get cluster
eksctl get nodegroup --cluster CLUSTERNAME
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes NEWSIZE
To scale the nodes all the way down to zero, use this (setting the max to 0 threw errors):
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes 0 --nodes-max 1 --nodes-min 0
You don't have a managed node group, so eksctl does not return any node group. The same applies to the aws eks CLI.
...scaling down my cluster...
You can log on to the console, go to EC2 -> Auto Scaling Groups, locate the group for your nodes' launch template, and scale it by updating the "Group details". Depending on how your cluster was created, you can look for the tag kubernetes.io/cluster/<your cluster name> to find the correct template.
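If you would rather script the lookup than click through the console, a small Python sketch like the one below could filter the JSON printed by aws autoscaling describe-auto-scaling-groups --output json and return the groups carrying the kubernetes.io/cluster/<your cluster name> tag. It assumes your nodes' Auto Scaling groups actually carry that tag (as noted above, that depends on how the cluster was created); the function name asgs_for_cluster is hypothetical.

```python
from typing import List


def asgs_for_cluster(describe_output: dict, cluster_name: str) -> List[str]:
    """Return the names of Auto Scaling groups tagged kubernetes.io/cluster/<cluster_name>.

    describe_output is the parsed JSON from
    `aws autoscaling describe-auto-scaling-groups --output json`.
    Hypothetical helper: assumes the node ASGs carry this tag key.
    """
    wanted_key = f"kubernetes.io/cluster/{cluster_name}"
    names = []
    for group in describe_output.get("AutoScalingGroups", []):
        # Each group has a list of {"Key": ..., "Value": ...} tag dicts.
        tag_keys = {tag.get("Key") for tag in group.get("Tags", [])}
        if wanted_key in tag_keys:
            names.append(group["AutoScalingGroupName"])
    return names
```

Load the CLI output with json.load and pass it in. For each name returned, a command such as aws autoscaling update-auto-scaling-group --auto-scaling-group-name <name> --min-size 0 --desired-capacity 0 should scale that group down to zero instances.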

ecs-cli refers to old cluster after changing default profile; doesn't show EC2 instances

I've been using AWS's ECS CLI to spin clusters of EC2 instances up and down for various tasks. The problem I'm running into is that it seems to be referring to old information that I don't know how to change.
For example, I just successfully created a cluster, my-second-cluster, and can see it in the AWS console:
$ ecs-cli up --keypair "my-keypair" --capability-iam --size 4 --instance-type t2.micro --port 22 --cluster-config my-second-cluster --ecs-profile a-second-profile
INFO[0001] Using recommended Amazon Linux 2 AMI with ECS Agent 1.45.0 and Docker version 19.03.6-ce
INFO[0001] Created cluster cluster=my-second-cluster region=us-east-1
INFO[0002] Waiting for your cluster resources to be created...
INFO[0002] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
INFO[0063] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
INFO[0124] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
VPC created: vpc-123abc
Security Group created: sg-123abc
Subnet created: subnet-123abc
Subnet created: subnet-123def
Cluster creation succeeded.
...but ecs-cli ps returns an error referring to an old cluster:
$ ecs-cli ps
FATA[0000] Error executing 'ps': Cluster 'my-first-cluster' is not active. Ensure that it exists
Specifying the cluster explicitly (ecs-cli ps --cluster my-second-cluster --region us-east-1) returns nothing, even though I see the 4 EC2 instances when I log into the AWS console.
Supporting details:
Before creating this second cluster, I created a second profile and set it to the default. I also set the new cluster to be the default.
$ ecs-cli configure profile --access-key <MY_ACCESS_KEY> --secret-key <MY_SECRET_KEY> --profile-name a-second-profile
$ ecs-cli configure profile default --profile-name a-second-profile
$ ecs-cli configure --cluster my-second-cluster --region us-east-1
INFO[0000] Saved ECS CLI cluster configuration default.
It's unclear to me where these ECS profile and cluster configs are stored (I'd expect to see them as files in ~/.aws, but no), or how to manipulate them beyond the cli commands that don't give great feedback. Any ideas on what I'm missing?
The ECS CLI stores its credentials at ~/.ecs/credentials (cluster configurations go in ~/.ecs/config).
When you create the initial profile, its name is default and it is used by default. When you set a-second-profile as the default, the CLI records that a-second-profile should be used by default, but you still have a profile named default that points to the original credentials.
My guess is that to see the first cluster, you now need to specify a profile name, since you changed the default. If you didn't give your initial profile a name, it will be default.
ecs-cli ps --ecs-profile default
If you deleted your cluster configuration, you may need to add the cluster again and associate it with the right profile:
ecs-cli configure --cluster cluster_name --default-launch-type launch_type --region region_name --config-name configuration_name
I hope that makes sense. Hopefully, looking at how your commands update ~/.ecs/credentials will be helpful.
Some resources:
ECS CLI Configurations

Deleting EKS Cluster with eksctl not working properly, requires manual deletion of resources such as ManagedNodeGroups

I'm running a cluster on EKS, deployed by following the tutorial with the command eksctl create cluster --name prod --version 1.17 --region eu-west-1 --nodegroup-name standard-workers --node-type t3.medium --nodes 3 --nodes-min 1 --nodes-max 4 --ssh-access --ssh-public-key public-key.pub --managed.
Once I'm done with my tests (mainly installing and then uninstalling Helm charts) and I have a clean cluster with no jobs running, I try to delete it with eksctl delete cluster --name prod, which causes these errors:
[ℹ] eksctl version 0.25.0
[ℹ] using region eu-west-1
[ℹ] deleting EKS cluster "test"
[ℹ] deleted 0 Fargate profile(s)
[✔] kubeconfig has been updated
[ℹ] cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
[ℹ] 2 sequential tasks: { delete nodegroup "standard-workers", delete cluster control plane "test" [async] }
[ℹ] will delete stack "eksctl-test-nodegroup-standard-workers"
[ℹ] waiting for stack "eksctl-test-nodegroup-standard-workers" to get deleted
[✖] unexpected status "DELETE_FAILED" while waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::CloudFormation::Stack/eksctl-test-nodegroup-standard-workers: DELETE_FAILED – "The following resource(s) failed to delete: [ManagedNodeGroup]. "
[✖] AWS::EKS::Nodegroup/ManagedNodeGroup: DELETE_FAILED – "Nodegroup standard-workers failed to stabilize: [{Code: Ec2SecurityGroupDeletionFailure,Message: DependencyViolation - resource has a dependent object,ResourceIds: [[REDACTED]]}]"
[ℹ] 1 error(s) occurred while deleting cluster with nodegroup(s)
[✖] waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers": ResourceNotReady: failed waiting for successful resource state
To fix them, I had to manually delete the VPCs and then the managed node groups, and then delete everything again.
I tried again with the steps above (creating and deleting with the commands provided in the official getting started documentation), but I get the same errors upon deleting.
It seems extremely weird that I have to manually delete resources for something like this. Is there a fix for this problem, am I doing something wrong, or is this standard procedure?
All commands are run through the official eksctl CLI, and I'm following the official eksctl deployment guide.
If you try to delete the security group that the node group's EC2 instances are attached to, you will find the root cause.
Usually it will say that a network interface is still attached.
So the solution is to delete that linked network interface manually. After that, the node group will be deleted without any error.
If you are using Managed Node Groups and public subnets, be sure that you update your subnet settings to map public IPs on launch before April 20 April 22. You can follow the progress of the updates to managed node groups on our GitHub roadmap.
If you want to learn more about networking configurations and IP assignment for EKS clusters, check the blog post on cluster networking for worker nodes.
Also you can try:
Go to EC2 > Network Interfaces
Sort by VPC, find the interfaces assigned to your VPC
The interface to delete should be the only one that is "available", it should also be the only one assigned to the problematic remote access SG. If more than one interface matches this description, delete them all.
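The same search can be sketched in Python against the JSON from aws ec2 describe-network-interfaces --output json: keep only interfaces in your VPC whose Status is "available", optionally limited to the remote access security group. The helper name detached_enis is hypothetical; review the IDs it returns before deleting anything.

```python
from typing import List, Optional


def detached_enis(describe_output: dict, vpc_id: str,
                  group_id: Optional[str] = None) -> List[str]:
    """Return IDs of unattached ("available") ENIs in vpc_id.

    describe_output is the parsed JSON from
    `aws ec2 describe-network-interfaces --output json`.
    If group_id is given, only ENIs that belong to that security group are kept.
    """
    result = []
    for eni in describe_output.get("NetworkInterfaces", []):
        # Skip interfaces in other VPCs and interfaces still attached ("in-use").
        if eni.get("VpcId") != vpc_id or eni.get("Status") != "available":
            continue
        group_ids = {g.get("GroupId") for g in eni.get("Groups", [])}
        if group_id is None or group_id in group_ids:
            result.append(eni["NetworkInterfaceId"])
    return result
```

Each returned ID can then be removed with aws ec2 delete-network-interface --network-interface-id <id>, after which the node group deletion should be able to proceed.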
Take a look: eks-managed-node-groups, eksctl-node-group.
Have you tried running the eksctl delete cluster command with the --wait flag?
Without that flag, it reports that the cluster is deleted while deletion is still going on in the background.

Getting an error while creating an EKS cluster with the same name

I created an EKS cluster named "prod", worked on it, and then deleted it. I deleted all of its associated VPCs, network interfaces, security groups, everything. But when I try to create an EKS cluster with the same name "prod", I get the error below. Can you please help me with this issue?
[centos#ip-172-31-23-128 ~]$ eksctl create cluster --name prod --region us-east-2
[ℹ] eksctl version 0.13.0
[ℹ] using region us-east-2
[ℹ] setting availability zones to [us-east-2b us-east-2c us-east-2a]
[ℹ] subnets for us-east-2b - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ] subnets for us-east-2c - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ] subnets for us-east-2a - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ] nodegroup "ng-1902b9c1" will use "ami-080fbb09ee2d4d3fa" [AmazonLinux2/1.14]
[ℹ] using Kubernetes version 1.14
[ℹ] creating EKS cluster "prod" in "us-east-2" region with un-managed nodes
[ℹ] will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-2 --cluster=prod'
[ℹ] CloudWatch logging will not be enabled for cluster "prod" in "us-east-2"
[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=us-east-2 --cluster=prod'
[ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "prod" in "us-east-2"
[ℹ] 2 sequential tasks: { create cluster control plane "prod", create nodegroup "ng-1902b9c1" }
[ℹ] building cluster stack "eksctl-prod-cluster"
[ℹ] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-east-2 --name=prod'
[✖] creating CloudFormation stack "eksctl-prod-cluster": AlreadyExistsException: Stack [eksctl-prod-cluster] already exists status code: 400, request id: 49258141-e03a-42af-ba8a-3fef9176063e
Error: failed to create cluster "prod"
There are two things to consider here.
The delete command does not wait for all the resources to actually be gone. You should add the --wait flag to let it finish. It usually takes around 10-15 minutes.
If that is still not enough, make sure that you delete the CloudFormation stack. It would look something like this (adjust the naming):
# Delete the cluster:
# 1. Find and delete the leftover CloudFormation stack(s)
aws cloudformation list-stacks --query 'StackSummaries[].StackName'
aws cloudformation delete-stack --stack-name worker-node-stack
# 2. Delete the EKS control plane
aws eks delete-cluster --name EKStestcluster
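To find which stacks are left over for a given cluster name, a Python sketch could filter the JSON from aws cloudformation list-stacks --output json. It assumes eksctl's naming as seen in the logs in this thread (eksctl-<cluster>-cluster and eksctl-<cluster>-nodegroup-<name>); eksctl may create other stacks that this does not catch, and the helper name live_eksctl_stacks is hypothetical. list-stacks also reports recently deleted stacks, so entries with status DELETE_COMPLETE are skipped, and node group stacks are listed before the cluster stack, matching the order in which eksctl deletes them.

```python
from typing import List


def live_eksctl_stacks(list_stacks_output: dict, cluster_name: str) -> List[str]:
    """Return eksctl stack names for cluster_name that still exist.

    list_stacks_output is the parsed JSON from
    `aws cloudformation list-stacks --output json`.
    Assumes eksctl's naming: eksctl-<cluster>-cluster and
    eksctl-<cluster>-nodegroup-<name>.
    """
    cluster_stack = f"eksctl-{cluster_name}-cluster"
    nodegroup_prefix = f"eksctl-{cluster_name}-nodegroup-"
    names = []
    for summary in list_stacks_output.get("StackSummaries", []):
        # list-stacks includes recently deleted stacks; ignore those.
        if summary.get("StackStatus") == "DELETE_COMPLETE":
            continue
        name = summary.get("StackName", "")
        if name == cluster_stack or name.startswith(nodegroup_prefix):
            names.append(name)
    # Stable sort: node group stacks (key False) before the cluster stack (key True).
    return sorted(names, key=lambda n: n == cluster_stack)
```

Each name returned can be passed to aws cloudformation delete-stack --stack-name <name>; once eksctl-prod-cluster is gone, creating a cluster named "prod" should no longer hit AlreadyExistsException.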
Please let me know if that helped.
I was struggling with this error while running EKS via Terraform. I'll share my solution; hopefully it will save others some valuable time.
I tried following the references below, but got the same result.
I also tried setting different timeouts for delete and create, but that still didn't help.
Finally, I was able to resolve this by changing the create_before_destroy value inside the lifecycle block to false:
lifecycle {
  create_before_destroy = false
}
(*) Note: pods keep running on the cluster during the update.
References:
Non-default node_group name breaks node group version upgrade
Changing tags causes node groups to be replaced