Getting an error while creating an EKS cluster with the same name

I have created an EKS cluster named "prod". I worked on this "prod" cluster and then deleted it, along with all of its associated VPC, network interfaces, security groups, everything. But if I try to create an EKS cluster with the same name "prod", I get the error below. Can you please help me with this issue?
[centos@ip-172-31-23-128 ~]$ eksctl create cluster --name prod --region us-east-2
[ℹ] eksctl version 0.13.0
[ℹ] using region us-east-2
[ℹ] setting availability zones to [us-east-2b us-east-2c us-east-2a]
[ℹ] subnets for us-east-2b - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ] subnets for us-east-2c - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ] subnets for us-east-2a - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ] nodegroup "ng-1902b9c1" will use "ami-080fbb09ee2d4d3fa" [AmazonLinux2/1.14]
[ℹ] using Kubernetes version 1.14
[ℹ] creating EKS cluster "prod" in "us-east-2" region with un-managed nodes
[ℹ] will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-2 --cluster=prod'
[ℹ] CloudWatch logging will not be enabled for cluster "prod" in "us-east-2"
[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=us-east-2 --cluster=prod'
[ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "prod" in "us-east-2"
[ℹ] 2 sequential tasks: { create cluster control plane "prod", create nodegroup "ng-1902b9c1" }
[ℹ] building cluster stack "eksctl-prod-cluster"
[ℹ] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-east-2 --name=prod'
[✖] creating CloudFormation stack "eksctl-prod-cluster": AlreadyExistsException: Stack [eksctl-prod-cluster] already exists
    status code: 400, request id: 49258141-e03a-42af-ba8a-3fef9176063e
Error: failed to create cluster "prod"

There are two things to consider here.
The delete command does not wait for all the resources to actually be gone. You should add the --wait flag to let it finish; it usually takes around 10-15 minutes.
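For example, a minimal sketch using the cluster name and region from the question above:
# wait until eksctl has finished tearing down all resources before re-creating the cluster
eksctl delete cluster --name prod --region us-east-2 --wait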
If that is still not enough, make sure that you also delete the leftover CloudFormation stacks. It would look something like this (adjust the names):
# delete the leftover CloudFormation stacks, then the cluster itself
aws cloudformation list-stacks --query StackSummaries[].StackName
aws cloudformation delete-stack --stack-name worker-node-stack
aws eks delete-cluster --name EKStestcluster
Please let me know if that helped.

I was struggling with this error while running EKS via Terraform. I'll share my solution; hopefully it will save others some valuable time.
I tried to follow the references below, but got the same result.
I also tried to set up different timeouts for delete and create, which still didn't help.
Finally I was able to resolve this when I changed the create_before_destroy value inside the lifecycle block to false:
lifecycle {
create_before_destroy = false
}
(*) Note - pods are still running on the cluster during the update.
References:
Non-default node_group name breaks node group version upgrade
Changing tags causes node groups to be replaced

Related

add karpenter to eksctl config file, but no upgrade

I have created the EKS cluster. Then I followed the document (https://eksctl.io/usage/eksctl-karpenter/) to add Karpenter support:
metadata:
  name: eks-dev
  region: ap-southeast-2
  version: "1.22"
+  tags:
+    karpenter.sh/discovery: eks-dev
+iam:
+  withOIDC: true # required
+karpenter:
+  version: '0.9.0'
managedNodeGroups:
  - name: spot
But when I upgrade it, nothing happens.
$ eksctl upgrade cluster -f eks-dev.yaml --approve
2022-06-07 21:08:25 [!] NOTE: cluster VPC (subnets, routing & NAT Gateway) configuration changes are not yet implemented
2022-06-07 21:08:25 [ℹ] no cluster version update required
2022-06-07 21:08:26 [ℹ] re-building cluster stack "eksctl-eks-dev-cluster"
2022-06-07 21:08:26 [✔] all resources in cluster stack "eksctl-eks-dev-cluster" are up-to-date
2022-06-07 21:08:26 [ℹ] checking security group configuration for all nodegroups
2022-06-07 21:08:26 [ℹ] all nodegroups have up-to-date cloudformation templates
$
The note says that VPC configuration changes are ignored, but the Karpenter change is not related to the VPC.
So how can I fix this issue?
eksctl's Karpenter support only applies to new clusters; it has no effect on an existing cluster. You can manually install Karpenter on an existing cluster by following the Karpenter installation guide.
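A rough sketch of such a manual install via Helm is below. The chart location and values are assumptions that vary by Karpenter version (the legacy charts.karpenter.sh repo was used by pre-v0.17 releases), and the IAM role, instance profile and service-account annotation that Karpenter needs are not shown, so check the Karpenter docs for the exact steps for 0.9.x:
# assumed legacy chart repo for older Karpenter releases
helm repo add karpenter https://charts.karpenter.sh
helm repo update
# cluster name and endpoint taken from the question; IRSA role and instance profile settings omitted
helm upgrade --install karpenter karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --version 0.9.0 \
  --set clusterName=eks-dev \
  --set clusterEndpoint=$(aws eks describe-cluster --name eks-dev --query "cluster.endpoint" --output text)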

eksctl create cluster failed

I'm trying to create an EKS cluster following the workshop tutorial (https://www.eksworkshop.com/030_eksctl/launcheks/), yet eksctl create cluster failed. Here's the log shown in the terminal:
ec2-user:~/environment $ eksctl create cluster -f eksworkshop.yaml
2022-04-26 05:08:30 [!] SSM is now enabled by default; `ssh.enableSSM` is deprecated and will be removed in a future release
2022-04-26 05:08:30 [ℹ] eksctl version 0.94.0
2022-04-26 05:08:30 [ℹ] using region us-west-1
2022-04-26 05:08:30 [ℹ] subnets for us-west-1b - public:192.168.0.0/19 private:192.168.96.0/19
2022-04-26 05:08:30 [ℹ] subnets for us-west-1c - public:192.168.32.0/19 private:192.168.128.0/19
2022-04-26 05:08:30 [ℹ] subnets for - public:192.168.64.0/19 private:192.168.160.0/19
2022-04-26 05:08:30 [ℹ] nodegroup "nodegroup" will use "" [AmazonLinux2/1.19]
2022-04-26 05:08:30 [ℹ] using Kubernetes version 1.19
2022-04-26 05:08:30 [ℹ] creating EKS cluster "eksworkshop-eksctl" in "us-west-1" region with managed nodes
2022-04-26 05:08:30 [ℹ] 1 nodegroup (nodegroup) was included (based on the include/exclude rules)
2022-04-26 05:08:30 [ℹ] will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s)
2022-04-26 05:08:30 [ℹ] will create a CloudFormation stack for cluster itself and 1 managed nodegroup stack(s)
2022-04-26 05:08:30 [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-1 --cluster=eksworkshop-eksctl'
2022-04-26 05:08:30 [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "eksworkshop-eksctl" in "us-west-1"
2022-04-26 05:08:30 [ℹ] CloudWatch logging will not be enabled for cluster "eksworkshop-eksctl" in "us-west-1"
2022-04-26 05:08:30 [ℹ] you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-west-1 --cluster=eksworkshop-eksctl'
2022-04-26 05:08:30 [ℹ]
2 sequential tasks: { create cluster control plane "eksworkshop-eksctl",
2 sequential sub-tasks: {
wait for control plane to become ready,
create managed nodegroup "nodegroup",
}
}
2022-04-26 05:08:30 [ℹ] building cluster stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:08:30 [ℹ] deploying stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:09:00 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:09:30 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:10:30 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:11:30 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:12:30 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:13:30 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:14:30 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:15:30 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:16:31 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:17:31 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:18:31 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:19:31 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:20:31 [ℹ] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-cluster"
2022-04-26 05:20:31 [!] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2022-04-26 05:20:31 [ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-west-1 --name=eksworkshop-eksctl'
2022-04-26 05:20:31 [✖] getting stack "eksctl-eksworkshop-eksctl-cluster" outputs: couldn't import subnet subnet-06ea5af280253e579: subnet ID "subnet-0068d4ea9652c80bc" is not the same as "subnet-06ea5af280253e579"
Error: failed to create cluster "eksworkshop-eksctl"
What is the potential issue here? The IAM role is valid:
ec2-user:~/environment $ aws sts get-caller-identity --query Arn | grep eksworkshop-admin -q && echo "IAM role valid" || echo "IAM role NOT valid"
IAM role valid
My YAML file was attached as an image (not reproduced here).
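As a hedged diagnostic (not from the original thread), one way to narrow down the "subnet ID ... is not the same as ..." error is to compare the subnet IDs referenced in the config and stack with what actually exists in the account:
# check whether the two subnet IDs from the error exist, and which VPC/AZ they belong to
# (a NotFound error from this call is itself a useful signal)
aws ec2 describe-subnets \
  --subnet-ids subnet-06ea5af280253e579 subnet-0068d4ea9652c80bc \
  --query "Subnets[].{Id:SubnetId,Vpc:VpcId,Az:AvailabilityZone,Cidr:CidrBlock}" \
  --output table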

unable to get nodegroup info using eksctl

Total noob here, and I have a runaway EKS cluster adding up $$ on AWS.
I'm having a tough time scaling down my cluster and not sure what to do. I'm following the recommendations in "How to stop AWS EKS Worker Instances" (reference below).
If I run:
"eksctl get cluster", I get the following:
NAME REGION EKSCTL CREATED
my-cluster us-west-2 True
unique-outfit-1636757727 us-west-2 True
I then try the next line "eksctl get nodegroup --cluster my-cluster" and get:
2021-11-15 15:31:14 [ℹ] eksctl version 0.73.0
2021-11-15 15:31:14 [ℹ] using region us-west-2
Error: No nodegroups found
I'm desperate to scale down the cluster, but I'm stuck at the above command.
It seems everything installed and is running as intended, but the management part is failing! What am I doing wrong? Thanks in advance!
Reference --
eksctl get cluster
eksctl get nodegroup --cluster CLUSTERNAME
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes NEWSIZE
To completely scale down the nodes to zero use this (max=0 threw errors):
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes 0 --nodes-max 1 --nodes-min 0
You don't have a managed node group, therefore eksctl does not return any node group result. The same applies to the aws eks CLI.
...scaling down my cluster...
You can log on to the console, go to EC2 -> Auto Scaling Groups, locate the Auto Scaling Group and scale it by updating the "Group details". Depending on how your cluster was created, you can look for the launch template tag kubernetes.io/cluster/<your cluster name> to find the correct group.
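If you prefer the CLI, a rough equivalent sketch (the tag key comes from the note above; replace the group name placeholder with whatever the first command returns):
# find the Auto Scaling Group(s) tagged for the cluster
aws autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[?Tags[?Key=='kubernetes.io/cluster/my-cluster']].AutoScalingGroupName" \
  --output text
# scale the matching group down to zero nodes
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name <asg-name-from-previous-command> \
  --min-size 0 --max-size 0 --desired-capacity 0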

Deleting EKS Cluster with eksctl not working properly, requires manual deletion of resources such as ManagedNodeGroups

I'm running a cluster on EKS, and following the tutorial to deploy one using the command eksctl create cluster --name prod --version 1.17 --region eu-west-1 --nodegroup-name standard-workers --node-type t3.medium --nodes 3 --nodes-min 1 --nodes-max 4 --ssh-access --ssh-public-key public-key.pub --managed.
Once I'm done with my tests (mainly installing and then uninstalling Helm charts) and I have a clean cluster with no jobs running, I then try to delete it with eksctl delete cluster --name prod, which causes these errors:
[ℹ] eksctl version 0.25.0
[ℹ] using region eu-west-1
[ℹ] deleting EKS cluster "test"
[ℹ] deleted 0 Fargate profile(s)
[✔] kubeconfig has been updated
[ℹ] cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
[ℹ] 2 sequential tasks: { delete nodegroup "standard-workers", delete cluster control plane "test" [async] }
[ℹ] will delete stack "eksctl-test-nodegroup-standard-workers"
[ℹ] waiting for stack "eksctl-test-nodegroup-standard-workers" to get deleted
[✖] unexpected status "DELETE_FAILED" while waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::CloudFormation::Stack/eksctl-test-nodegroup-standard-workers: DELETE_FAILED – "The following resource(s) failed to delete: [ManagedNodeGroup]. "
[✖] AWS::EKS::Nodegroup/ManagedNodeGroup: DELETE_FAILED – "Nodegroup standard-workers failed to stabilize: [{Code: Ec2SecurityGroupDeletionFailure,Message: DependencyViolation - resource has a dependent object,ResourceIds: [[REDACTED]]}]"
[ℹ] 1 error(s) occurred while deleting cluster with nodegroup(s)
[✖] waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers": ResourceNotReady: failed waiting for successful resource state
To fix them I had to manually delete the AWS VPCs and then the ManagedNodeGroups, and only then delete everything again.
I tried again with the steps above (creating and deleting with the commands provided in the official getting started documentation), but I get the same errors upon deleting.
It seems extremely weird that I have to manually delete resources when doing something like this. Is there a fix for this problem, am I doing something wrong, or is this standard procedure?
All commands are run through the official eksctl CLI, and I'm following the official eksctl deployment guide.
If we try to delete the Security Group that the Node Group's EC2 instances are attached to, we will find the root cause: mostly it will say there is a Network Interface attached.
So the solution is to delete that linked Network Interface manually. After that, the Node Group will be deleted without any error.
If you are using Managed Node Groups and public subnets, be sure that you update your subnet settings to map public IPs on launch before April 22 (previously April 20). You can follow the progress of the updates to managed node groups on our GitHub roadmap.
If you want to learn more about networking configurations and IP assignment for EKS clusters, check the blog post on cluster networking for worker nodes.
Also, you can try the following (a CLI sketch follows these steps):
Go to EC2 > Network Interfaces
Sort by VPC, find the interfaces assigned to your VPC
The interface to delete should be the only one in the "available" state; it should also be the only one attached to the problematic remote-access security group. If more than one interface matches this description, delete them all.
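The same lookup can be scripted with the AWS CLI, roughly as follows (the VPC ID is a placeholder for your cluster's VPC):
# list network interfaces in the cluster VPC that are stuck in the "available" state
aws ec2 describe-network-interfaces \
  --filters Name=vpc-id,Values=<your-vpc-id> Name=status,Values=available \
  --query "NetworkInterfaces[].NetworkInterfaceId" --output text
# delete each leftover interface reported above
aws ec2 delete-network-interface --network-interface-id <eni-id-from-previous-command>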
Take a look: eks-managed-node-groups, eksctl-node-group.
Have you tried running the eksctl delete cluster command with the --wait flag?
Without that flag it will output a message that it is deleted but deletion activities are still going on in the background.

ecs-cli up does not create ec2 instance

I'm trying to launch an ECS cluster using the CLI, but get stuck on EC2 instances not being created.
I've configured my ECS credentials, adding all the missing permissions extracted from the CloudFormation errors; at least I don't see any additional errors now. I've configured a simple cluster configuration.
~/.ecs/config
clusters:
  mycluster:
    cluster: mycluster
    region: eu-north-1
    default_launch_type: EC2
And this is the CLI command I run:
ecs-cli up --keypair myKeyPair --capability-iam \
--size 1 --instance-type t2.micro \
--cluster-config mycluster --cluster mycluster \
--launch-type EC2 --force --verbose
I get no error messages, the cluster is created, but I see no instances connected to it, and especially no instances in EC2.
This is the output from the CLI command:
INFO[0000] Using recommended Amazon Linux 2 AMI with ECS Agent 1.29.1 and Docker version 18.06.1-ce
INFO[0000] Created cluster cluster=mycluster region=eu-north-1
INFO[0000] Waiting for your CloudFormation stack resources to be deleted...
INFO[0000] Cloudformation stack status stackStatus=DELETE_IN_PROGRESS
DEBU[0030] Cloudformation stack status stackStatus=DELETE_IN_PROGRESS
DEBU[0061] Cloudformation create stack call succeeded stackId=0xc00043ab11
INFO[0061] Waiting for your cluster resources to be created...
DEBU[0061] parsing event eventStatus=CREATE_IN_PROGRESS resource="arn:aws:cloudformation:eu-north-1:999987631111:stack/amazon-ecs-cli-setup-mycluster/11111111-aba2-11e9-ac3c-0e40cf291592"
INFO[0061] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
DEBU[0091] parsing event eventStatus=CREATE_IN_PROGRESS resource=subnet-0cc4a3aa110555d42
DEBU[0091] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
DEBU[0121] parsing event eventStatus=CREATE_IN_PROGRESS resource=rtbassoc-05c185a5aa11ca22e
INFO[0121] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
DEBU[0151] parsing event eventStatus=CREATE_COMPLETE resource=rtbassoc-05c185a5aa11ca22e
DEBU[0151] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
DEBU[0181] parsing event eventStatus=CREATE_COMPLETE resource=rtbassoc-05c185a5aa11ca22e
INFO[0181] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
DEBU[0212] parsing event eventStatus=CREATE_COMPLETE resource=amazon-ecs-cli-setup-mycluster-EcsInstanceProfile-1KS4Q3W9HAAAA
DEBU[0212] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
DEBU[0242] parsing event eventStatus=CREATE_COMPLETE resource="arn:aws:cloudformation:eu-north-1:999987631111:stack/amazon-ecs-cli-setup-mycluster/11111111-aba2-11e9-ac3c-0e40cf291592"
VPC created: vpc-033f7a6fedfee256d
Security Group created: sg-0e4461f781bad6681
Subnet created: subnet-0cc4a3aa110555d42
Subnet created: subnet-0a4797072dc9641d2
Cluster creation succeeded.
Running describe-clusters after a couple of hours:
aws ecs describe-clusters --clusters mycluster --region eu-north-1
gives the following output:
{
    "clusters": [
        {
            "status": "ACTIVE",
            "statistics": [],
            "tags": [],
            "clusterName": "mycluster",
            "registeredContainerInstancesCount": 0,
            "pendingTasksCount": 0,
            "runningTasksCount": 0,
            "activeServicesCount": 0,
            "clusterArn": "arn:aws:ecs:eu-north-1:999987631111:cluster/mycluster"
        }
    ],
    "failures": []
}
Does anyone know what I might be missing? I've not hit any limits, since I've only got one other running instance (in a different region).
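For anyone debugging the same symptom, a hedged sketch of how to inspect what the ecs-cli CloudFormation stack actually created and whether its Auto Scaling Group launched anything (stack and cluster names are the ones from the log above; this is a diagnostic, not an answer from the thread):
# list the resources the ecs-cli setup stack created (look for an Auto Scaling Group and Launch Configuration)
aws cloudformation describe-stack-resources \
  --stack-name amazon-ecs-cli-setup-mycluster --region eu-north-1 \
  --query "StackResources[].{Type:ResourceType,Id:PhysicalResourceId,Status:ResourceStatus}" \
  --output table
# check whether any Auto Scaling Group actually has a desired capacity and running instances
aws autoscaling describe-auto-scaling-groups --region eu-north-1 \
  --query "AutoScalingGroups[].{Name:AutoScalingGroupName,Desired:DesiredCapacity,Running:length(Instances)}" \
  --output table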