Why did eksctl create iamserviceaccount fail? waiter state transitioned to Failure - amazon-web-services

I am running this command:
eksctl create iamserviceaccount --name efs-csi-controller-sa --namespace kube-system --cluster mmpana --attach-policy-arn arn:aws:iam::12345678:policy/EKS_EFS_CSI_Driver_Policy --approve --override-existing-serviceaccounts --region us-east-1
I got this error:
2023-02-07 13:36:36 [ℹ] 1 error(s) occurred and IAM Role stacks haven't been created properly, you may wish to check CloudFormation console
2023-02-07 13:36:36 [✖] waiter state transitioned to Failure
Then I checked the CloudFormation stacks.
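One way to pull the failure reason from the CLI (a sketch; the stack name follows eksctl's eksctl-<cluster>-addon-iamserviceaccount-<namespace>-<name> pattern and is assumed here):
# list the IAM role stacks eksctl created for this cluster and their status
aws cloudformation describe-stacks --region us-east-1 --query "Stacks[?contains(StackName, 'eksctl-mmpana-addon-iamserviceaccount')].[StackName,StackStatus]" --output table
# show the events that caused the failure (stack name assumed from the pattern above)
aws cloudformation describe-stack-events --region us-east-1 --stack-name eksctl-mmpana-addon-iamserviceaccount-kube-system-efs-csi-controller-sa --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[LogicalResourceId,ResourceStatusReason]" --output table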
I upgraded eksctl yesterday
eksctl version
0.128.0
I am now looking at my policy.
How to fix this?

Related

How to check why my service account was not created? 1 existing iamserviceaccount

My command to create it:
eksctl create iamserviceaccount --name efs-csi-controller-sa --namespace kube-system --cluster ciga --attach-policy-arn arn:aws:iam::$xxxxxx:policy/EKS_EFS_CSI_Driver_Policy --approve --override-existing-serviceaccounts --region us-east-1
I got
2023-02-04 18:12:40 [ℹ] 1 existing iamserviceaccount(s) (kube-system/efs-csi-controller-sa) will be excluded
2023-02-04 18:12:40 [ℹ] 1 iamserviceaccount (kube-system/efs-csi-controller-sa) was excluded (based on the include/exclude rules)
2023-02-04 18:12:40 [!] metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
2023-02-04 18:12:40 [ℹ] no tasks
I cannot find it in kube-system:
kubectl get serviceaccount efs-csi-controller-sa -n kube-system -o yaml
Error from server (NotFound): serviceaccounts "efs-csi-controller-sa" not found
What is wrong with my eksctl create command?
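A way to see what eksctl is still tracking, and to clear the stale record before re-running create (a sketch using the cluster name from the question):
# list the iamserviceaccounts eksctl is tracking for the cluster
eksctl get iamserviceaccount --cluster ciga --region us-east-1
# remove the stale entry so the next create is not excluded, then re-run the create command
eksctl delete iamserviceaccount --name efs-csi-controller-sa --namespace kube-system --cluster ciga --region us-east-1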

Container Insights on Amazon EKS AccessDeniedException

I'm trying to add Container Insights to my EKS cluster but I'm running into a bit of an issue when deploying. According to my logs, I'm getting the following:
[error] [output:cloudwatch_logs:cloudwatch_logs.2] CreateLogGroup API responded with error='AccessDeniedException'
[error] [output:cloudwatch_logs:cloudwatch_logs.2] Failed to create log group
The strange part is that the role it seems to be assuming is the same role attached to my EC2 worker nodes rather than the role for the service account I created. I'm creating the service account, and I can see it within AWS successfully, using the following command:
eksctl create iamserviceaccount --region ${env:AWS_DEFAULT_REGION} --name cloudwatch-agent --namespace amazon-cloudwatch --cluster ${env:CLUSTER_NAME} --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy --override-existing-serviceaccounts --approve
Despite the service account being created successfully, I continue to get the AccessDeniedException.
One thing I found was that the logs work fine when I manually add the CloudWatchAgentServerPolicy to my worker nodes. However, this is not the implementation I would like; I would rather have an automated way of adding the service account and not touch the worker nodes directly if possible. The steps I followed can be found at the bottom of this documentation.
Thanks so much!
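One way to confirm whether a pod is actually picking up the IRSA role rather than the node role (a sketch; the pod name is a placeholder):
# the service account should carry the eks.amazonaws.com/role-arn annotation
kubectl describe serviceaccount cloudwatch-agent -n amazon-cloudwatch
# pods using IRSA get these variables injected; they only appear if the pod started after the annotated service account existed
kubectl exec -n amazon-cloudwatch <POD_NAME> -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'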
For anyone running into this issue: within the quickstart yaml, there is a fluent-bit service account that must be removed from that file and created manually. I created it using the following command:
eksctl create iamserviceaccount --region ${env:AWS_DEFAULT_REGION} --name fluent-bit --namespace amazon-cloudwatch --cluster ${env:CLUSTER_NAME} --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy --override-existing-serviceaccounts --approve
Upon running this command and removing the fluent-bit service account from the yaml, delete and reapply all your amazon-cloudwatch namespace items and it should be working.
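A sketch of that delete-and-reapply step (the manifest filename is a placeholder for whichever quickstart yaml was downloaded and edited):
# after removing the fluent-bit ServiceAccount block from the file:
kubectl delete -f cwagent-fluent-bit-quickstart.yaml
kubectl apply -f cwagent-fluent-bit-quickstart.yaml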

unable to get nodegroup info using eksctl

Total noob here, and I have a runaway EKS cluster adding up $$ on AWS.
I'm having a tough time scaling down my cluster and am not sure what to do. I'm following the recommendations here: How to stop AWS EKS Worker Instances (reference below).
If I run eksctl get cluster, I get the following:
NAME REGION EKSCTL CREATED
my-cluster us-west-2 True
unique-outfit-1636757727 us-west-2 True
I then try the next line "eksctl get nodegroup --cluster my-cluster" and get:
2021-11-15 15:31:14 [ℹ] eksctl version 0.73.0
2021-11-15 15:31:14 [ℹ] using region us-west-2
Error: No nodegroups found
I'm desperate to scale down the cluster, but I'm stuck at the above command.
It seems everything installed and is running as intended, but the management part is failing! What am I doing wrong? Thanks in advance!
Reference --
eksctl get cluster
eksctl get nodegroup --cluster CLUSTERNAME
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes NEWSIZE
To completely scale down the nodes to zero use this (max=0 threw errors):
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes 0 --nodes-max 1 --nodes-min 0
You don't have a managed node group, therefore eksctl does not return any node group results. The same applies to the aws eks CLI.
...scaling down my cluster...
You can log on to the console, go to EC2 -> Auto Scaling Groups, locate the launch template and scale by updating the "Group details". Depending on how your cluster was created, you can look for the launch template tag kubernetes.io/cluster/<your cluster name> to find the correct template.
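The same thing can be done from the CLI (a sketch; the cluster name is taken from the question, and this targets the self-managed node Auto Scaling Group directly):
# find the ASG tagged for the cluster
aws autoscaling describe-auto-scaling-groups --region us-west-2 --query "AutoScalingGroups[?Tags[?Key=='kubernetes.io/cluster/my-cluster']].AutoScalingGroupName" --output text
# scale it down to zero
aws autoscaling update-auto-scaling-group --region us-west-2 --auto-scaling-group-name <ASG_NAME> --min-size 0 --desired-capacity 0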

Where can I view service account created by `eksctl`?

I created an EKS cluster in AWS and used this command to create a service account: eksctl create iamserviceaccount --name alb-ingress-controller --cluster $componentName --attach-policy-arn $serviceRoleArn --approve --override-existing-serviceaccounts.
The output of the command is:
[ℹ] using region ap-southeast-2
[ℹ] 1 existing iamserviceaccount(s) (default/alb-ingress-controller) will be excluded
[ℹ] 1 iamserviceaccount (default/alb-ingress-controller) was excluded (based on the include/exclude rules)
[!] metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
[ℹ] no tasks
I am not sure whether it was created successfully or not.
I used the command eksctl get iamserviceaccount to verify the result, but got an error response:
Error: getting iamserviceaccounts: no output "Role1" in stack "eksctl-monitor-addon-iamserviceaccount-default-alb-ingress-controller"
I also tried to run kubectl get serviceaccount but I got the error: Error from server (NotFound): serviceaccounts "alb-ingress-controller" not found.
Does this mean the service account failed to be created? Where can I view the service account in the AWS console? Or where can I view the error?
As per the error, it means the serviceaccount already exists.
To get the service account, use kubectl:
kubectl get serviceaccount <SERVICE_ACCOUNT_NAME> -n kube-system -o yaml
The order is: create the IAM role first, and after that the RBAC Role and binding.
Below is the command in case you want to override the existing serviceaccount:
eksctl --profile <PROFILE_NAME> \
--region=ap-northeast-2 \
create iamserviceaccount \
--name alb-ingress-controller \
--namespace kube-system \
--override-existing-serviceaccounts \
--approve --cluster <CLUSTER_NAME> \
--attach-policy-arn \
arn:aws:iam::ACCOUNT_ID:policy/ALBIngressControllerIAMPolicy
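For the RBAC Role and binding mentioned above, a minimal sketch (the manifest filename is a placeholder for the RBAC manifest that ships with the ALB ingress controller docs):
# apply the RBAC manifest for the controller after the IAM role/serviceaccount exists
kubectl apply -f rbac-role.yaml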
I found this workshop Amazon EKS Workshop very helpful during my venture into EKS.
More information pertaining to ALB can be found here
EDIT
from this error
[ℹ] 1 existing iamserviceaccount(s) (default/alb-ingress-controller) will be excluded
It seems like the service account was created inside the default namespace.
So the command to check the serviceaccount will be:
kubectl get serviceaccount <SERVICE_ACCOUNT_NAME> -n default -o yaml
eksctl uses CloudFormation to create the resources, so you will probably find the cause of the error there.
Go to the CloudFormation console in AWS.
Find the stack with the name eksctl-[CLUSTER NAME]-addon-iamserviceaccount-default-[SERVICE ACCOUNT NAME]; it should have the ROLLBACK_COMPLETE status.
Select the "Events" tab and scroll to the first error.
In my case, the cause was a missing policy that I was attaching to the role.
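A stack left in ROLLBACK_COMPLETE cannot be updated in place; a sketch of clearing it from the CLI before retrying (adjust the stack name as above):
# delete the rolled-back stack, then re-run eksctl create iamserviceaccount
aws cloudformation delete-stack --stack-name eksctl-<CLUSTER_NAME>-addon-iamserviceaccount-default-<SERVICE_ACCOUNT_NAME>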
Works as expected, thanks @samtoddler! 😎
1 Create the IAM policy for the IAM Role 👏
curl -o iam_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.3.0/docs/install/iam_policy.json
aws-vault exec Spryker-Humanetic-POC -- aws iam create-policy \
--policy-name AWSLoadBalancerControllerIAMPolicy \
--policy-document file://iam_policy.json
2 Create the IAM role and attach it to the newly created ServiceAccount 👏
eksctl create iamserviceaccount \
--cluster education-eks-7yby62S7 \
--namespace kube-system \
--name aws-load-balancer-controller \
--attach-policy-arn arn:aws:iam::ACCOUNT_ID:policy/AWSLoadBalancerControllerIAMPolicy \
--approve
3.1 Verify # 1 that ServiceAccount lives in --namespace kube-system 👏
kubectl get sa aws-load-balancer-controller -n kube-system -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/eksctl-education-eks-7yby62S7-addon-iamservi-Role1-126OXTBKF3WBM
  creationTimestamp: "2021-12-12T17:38:48Z"
  labels:
    app.kubernetes.io/managed-by: eksctl
  name: aws-load-balancer-controller
  namespace: kube-system
  resourceVersion: "686442"
  uid: 895f6f34-ab04-4bca-aeac-1b6b75766546
secrets:
- name: aws-load-balancer-controller-token-gcd5c
3.2 Verify #2 👏
kubectl get sa aws-load-balancer-controller -n kube-system
NAME SECRETS AGE
aws-load-balancer-controller 1 123m
Hope it will help! 🧗🏼‍♀️

Getting an error while creating an EKS cluster with the same name

I have created an EKS cluster with the name "prod". I worked on this "prod" cluster, and after that I deleted it. I deleted all of its associated VPC resources, interfaces, security groups, everything. But if I try to create an EKS cluster with the same name "prod", I get the error below. Can you please help me with this issue?
[centos@ip-172-31-23-128 ~]$ eksctl create cluster --name prod --region us-east-2
[ℹ] eksctl version 0.13.0
[ℹ] using region us-east-2
[ℹ] setting availability zones to [us-east-2b us-east-2c us-east-2a]
[ℹ] subnets for us-east-2b - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ] subnets for us-east-2c - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ] subnets for us-east-2a - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ] nodegroup "ng-1902b9c1" will use "ami-080fbb09ee2d4d3fa" [AmazonLinux2/1.14]
[ℹ] using Kubernetes version 1.14
[ℹ] creating EKS cluster "prod" in "us-east-2" region with un-managed nodes
[ℹ] will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-2 --cluster=prod'
[ℹ] CloudWatch logging will not be enabled for cluster "prod" in "us-east-2"
[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=us-east-2 --cluster=prod'
[ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "prod" in "us-east-2"
[ℹ] 2 sequential tasks: { create cluster control plane "prod", create nodegroup "ng-1902b9c1" }
[ℹ] building cluster stack "eksctl-prod-cluster"
[ℹ] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-east-2 --name=prod'
[✖] creating CloudFormation stack "eksctl-prod-cluster": AlreadyExistsException: Stack [eksctl-prod-cluster] already exists
    status code: 400, request id: 49258141-e03a-42af-ba8a-3fef9176063e
Error: failed to create cluster "prod"
There are two things to consider here.
The delete command does not wait for all the resources to actually be gone. You should add the --wait flag in order to let it finish; it usually takes around 10-15 minutes.
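A minimal sketch of a blocking delete (cluster name and region taken from the question):
eksctl delete cluster --name prod --region us-east-2 --wait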
If that is still not enough, you should make sure that you delete the CloudFormation stacks. It would look something like this (adjust the naming):
# delete cluster:
# delete the CloudFormation stacks
aws cloudformation list-stacks --query StackSummaries[].StackName
aws cloudformation delete-stack --stack-name worker-node-stack
aws eks delete-cluster --name EKStestcluster
Please let me know if that helped.
I was struggling with this error while running EKS via Terraform. I'll share my solution; hopefully it will save others some valuable time.
I tried to follow the references below, but got the same result.
I also tried to set up different timeouts for delete and create; that still didn't help.
Finally, I was able to resolve this by changing the create_before_destroy value inside the lifecycle block to false:
lifecycle {
  create_before_destroy = false
}
(*) Notice: pods are still running on the cluster during the update.
References:
Non-default node_group name breaks node group version upgrade
Changing tags causes node groups to be replaced